public inbox for linux-kernel@vger.kernel.org
* [RFC,PATCH] Use RCU to protect tasklist for unicast signals
@ 2005-08-10 17:11 Paul E. McKenney
  2005-08-11  9:56 ` Ingo Molnar
  2005-08-11 17:14 ` Christoph Hellwig
  0 siblings, 2 replies; 28+ messages in thread
From: Paul E. McKenney @ 2005-08-10 17:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, dipankar, rusty, bmark

Hello!

This patch is an experiment in use of RCU for individual code paths that
read-acquire the tasklist lock, in this case, unicast signal delivery.
It passes five kernbenches on 4-CPU x86, but obviously needs much more
testing before it is considered for serious use, let alone inclusion.

My main question is whether I have the POSIX semantics covered.  I believe
that I do, but thought I should check with people who are more familiar
with POSIX than am I.

For the record, some shortcomings of this patch:

o	Needs lots more testing on more architectures.

o	Needs performance and stress testing.

o	Needs testing in Ingo's PREEMPT_RT environment.

o	Uses cmpxchg(), which is currently architecture dependent.
	This can be fixed, for example, by using the hashed locks
	proposed in an earlier patch from Dipankar:

	http://marc.theaimsgroup.com/?l=linux-kernel&m=111875978502912&w=2
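
For reference, the conditional acquisition in the patch's get_task_struct_rcu()
is the classic "increment the reference count only if it is still nonzero"
cmpxchg loop.  A minimal userspace sketch of the idiom (using the GCC
__sync builtin rather than the kernel's atomic_t/cmpxchg API, so the names
here are illustrative only) might look like this:

```c
/* Userspace sketch of the "get a reference only if still live" idiom:
 * atomically bump *count unless it has already reached zero, in which
 * case the object may be concurrently freed and must not be touched.
 * Returns 1 on success, 0 if the last reference was already dropped. */
static int get_if_nonzero(int *count)
{
	int old;

	do {
		old = *count;
		if (old == 0)
			return 0;	/* too late: object is being freed */
	} while (__sync_val_compare_and_swap(count, old, old + 1) != old);
	return 1;
}
```

Under RCU, the task_struct itself cannot be freed while the reader is in its
read-side critical section, so a reader that loses this race simply sees the
count at zero and can back out (returning -ESRCH) instead of touching a
dying task.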

Thoughts?

						Thanx, Paul

---

Not-signed-off-by: paulmck@us.ibm.com

 include/linux/sched.h |   27 +++++++++++++++++++++++++--
 kernel/sched.c        |    5 +++++
 kernel/signal.c       |    8 ++++++--
 3 files changed, 36 insertions(+), 4 deletions(-)

diff -urpN -X dontdiff linux-2.6.13-rc6/include/linux/sched.h linux-2.6.13-rc6-tasklistRCU/include/linux/sched.h
--- linux-2.6.13-rc6/include/linux/sched.h	2005-08-08 19:59:23.000000000 -0700
+++ linux-2.6.13-rc6-tasklistRCU/include/linux/sched.h	2005-08-09 15:44:48.000000000 -0700
@@ -34,6 +34,7 @@
 #include <linux/percpu.h>
 #include <linux/topology.h>
 #include <linux/seccomp.h>
+#include <linux/rcupdate.h>
 
 struct exec_domain;
 
@@ -770,6 +771,7 @@ struct task_struct {
 	int cpuset_mems_generation;
 #endif
 	atomic_t fs_excl;	/* holding fs exclusive resources */
+	struct rcu_head rcu;
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
@@ -793,8 +795,29 @@ static inline int pid_alive(struct task_
 extern void free_task(struct task_struct *tsk);
 extern void __put_task_struct(struct task_struct *tsk);
 #define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0)
-#define put_task_struct(tsk) \
-do { if (atomic_dec_and_test(&(tsk)->usage)) __put_task_struct(tsk); } while(0)
+
+static inline int get_task_struct_rcu(struct task_struct *t)
+{
+	int oldusage;
+
+	do {
+		oldusage = atomic_read(&t->usage);
+		if (oldusage == 0) {
+			return 0;
+		}
+	} while (cmpxchg(&t->usage.counter,
+		 oldusage, oldusage + 1) != oldusage);
+	return 1;
+}
+
+extern void __put_task_struct_cb(struct rcu_head *rhp);
+
+static inline void put_task_struct(struct task_struct *t)
+{
+	if (atomic_dec_and_test(&t->usage)) {
+		call_rcu(&t->rcu, __put_task_struct_cb);
+	}
+}
 
 /*
  * Per process flags
diff -urpN -X dontdiff linux-2.6.13-rc6/kernel/sched.c linux-2.6.13-rc6-tasklistRCU/kernel/sched.c
--- linux-2.6.13-rc6/kernel/sched.c	2005-08-08 19:59:24.000000000 -0700
+++ linux-2.6.13-rc6-tasklistRCU/kernel/sched.c	2005-08-09 12:27:34.000000000 -0700
@@ -176,6 +176,11 @@ static unsigned int task_timeslice(task_
 #define task_hot(p, now, sd) ((long long) ((now) - (p)->last_ran)	\
 				< (long long) (sd)->cache_hot_time)
 
+void __put_task_struct_cb(struct rcu_head *rhp)
+{
+	__put_task_struct(container_of(rhp, struct task_struct, rcu));
+}
+
 /*
  * These are the runqueue data structures:
  */
diff -urpN -X dontdiff linux-2.6.13-rc6/kernel/signal.c linux-2.6.13-rc6-tasklistRCU/kernel/signal.c
--- linux-2.6.13-rc6/kernel/signal.c	2005-08-08 19:59:24.000000000 -0700
+++ linux-2.6.13-rc6-tasklistRCU/kernel/signal.c	2005-08-10 08:20:25.000000000 -0700
@@ -1151,9 +1151,13 @@ int group_send_sig_info(int sig, struct 
 
 	ret = check_kill_permission(sig, info, p);
 	if (!ret && sig && p->sighand) {
+		if (!get_task_struct_rcu(p)) {
+			return -ESRCH;
+		}
 		spin_lock_irqsave(&p->sighand->siglock, flags);
 		ret = __group_send_sig_info(sig, info, p);
 		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+		put_task_struct(p);
 	}
 
 	return ret;
@@ -1200,12 +1204,12 @@ kill_proc_info(int sig, struct siginfo *
 	int error;
 	struct task_struct *p;
 
-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	p = find_task_by_pid(pid);
 	error = -ESRCH;
 	if (p)
 		error = group_send_sig_info(sig, info, p);
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 	return error;
 }
 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-10 17:11 [RFC,PATCH] Use RCU to protect tasklist for unicast signals Paul E. McKenney
@ 2005-08-11  9:56 ` Ingo Molnar
  2005-08-11 14:14   ` Paul E. McKenney
  2005-08-12  2:00   ` Lee Revell
  2005-08-11 17:14 ` Christoph Hellwig
  1 sibling, 2 replies; 28+ messages in thread
From: Ingo Molnar @ 2005-08-11  9:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, dipankar, rusty, bmark, Thomas Gleixner,
	Andrew Morton


* Paul E. McKenney <paulmck@us.ibm.com> wrote:

> Hello!
> 
> This patch is an experiment in use of RCU for individual code paths 
> that read-acquire the tasklist lock, in this case, unicast signal 
> delivery. It passes five kernbenches on 4-CPU x86, but obviously needs 
> much more testing before it is considered for serious use, let alone 
> inclusion.
> 
> My main question is whether I have the POSIX semantics covered.  I 
> believe that I do, but thought I should check with people who are more 
> familiar with POSIX than am I.
> 
> For the record, some shortcomings of this patch:
> 
> o	Needs lots more testing on more architectures.
> 
> o	Needs performance and stress testing.
> 
> o	Needs testing in Ingo's PREEMPT_RT environment.

cool patch! I have integrated it into my PREEMPT_RT tree, and all it
needed to boot was the patch below (doesn't affect the upstream kernel).
Using the raw IRQ flag isn't an issue in the RCU code; all the affected
codepaths are small and deterministic.

(without this patch it locked up after detecting IRQ7 - not sure why.)

kernel still works fine after some (mostly light) testing.

	Ingo

Index: linux/kernel/rcupdate.c
===================================================================
--- linux.orig/kernel/rcupdate.c
+++ linux/kernel/rcupdate.c
@@ -134,11 +134,11 @@ void fastcall call_rcu(struct rcu_head *
 
 	head->func = func;
 	head->next = NULL;
-	local_irq_save(flags);
+	raw_local_irq_save(flags);
 	rdp = &__get_cpu_var(rcu_data);
 	*rdp->nxttail = head;
 	rdp->nxttail = &head->next;
-	local_irq_restore(flags);
+	raw_local_irq_restore(flags);
 }
 
 /**
@@ -165,11 +165,11 @@ void fastcall call_rcu_bh(struct rcu_hea
 
 	head->func = func;
 	head->next = NULL;
-	local_irq_save(flags);
+	raw_local_irq_save(flags);
 	rdp = &__get_cpu_var(rcu_bh_data);
 	*rdp->nxttail = head;
 	rdp->nxttail = &head->next;
-	local_irq_restore(flags);
+	raw_local_irq_restore(flags);
 }
 
 /*
@@ -305,11 +305,11 @@ static void rcu_check_quiescent_state(st
 static void rcu_move_batch(struct rcu_data *this_rdp, struct rcu_head *list,
 				struct rcu_head **tail)
 {
-	local_irq_disable();
+	raw_local_irq_disable();
 	*this_rdp->nxttail = list;
 	if (list)
 		this_rdp->nxttail = tail;
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 static void __rcu_offline_cpu(struct rcu_data *this_rdp,
@@ -362,13 +362,13 @@ static void __rcu_process_callbacks(stru
 		rdp->curtail = &rdp->curlist;
 	}
 
-	local_irq_disable();
+	raw_local_irq_disable();
 	if (rdp->nxtlist && !rdp->curlist) {
 		rdp->curlist = rdp->nxtlist;
 		rdp->curtail = rdp->nxttail;
 		rdp->nxtlist = NULL;
 		rdp->nxttail = &rdp->nxtlist;
-		local_irq_enable();
+		raw_local_irq_enable();
 
 		/*
 		 * start the next batch of callbacks
@@ -388,7 +388,7 @@ static void __rcu_process_callbacks(stru
 			spin_unlock(&rsp->lock);
 		}
 	} else {
-		local_irq_enable();
+		raw_local_irq_enable();
 	}
 	rcu_check_quiescent_state(rcp, rsp, rdp);
 	if (rdp->donelist)

* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
@ 2005-08-11 12:16 Oleg Nesterov
  2005-08-11 15:20 ` Paul E. McKenney
  2005-08-12  1:56 ` Paul E. McKenney
  0 siblings, 2 replies; 28+ messages in thread
From: Oleg Nesterov @ 2005-08-11 12:16 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

Paul E. McKenney wrote:
>
> --- linux-2.6.13-rc6/kernel/signal.c	2005-08-08 19:59:24.000000000 -0700
> +++ linux-2.6.13-rc6-tasklistRCU/kernel/signal.c	2005-08-10 08:20:25.000000000 -0700
> @@ -1151,9 +1151,13 @@ int group_send_sig_info(int sig, struct 
>
>  	ret = check_kill_permission(sig, info, p);
>  	if (!ret && sig && p->sighand) {
> +		if (!get_task_struct_rcu(p)) {
> +			return -ESRCH;
> +		}
>  		spin_lock_irqsave(&p->sighand->siglock, flags);
                                      ^^^^^^^
Is it correct?

The caller (kill_proc_info) does not take tasklist_lock anymore.
If p does exec() at this time it can change/free its ->sighand.

fs/exec.c:de_thread()
   773                  write_lock_irq(&tasklist_lock);
   774                  spin_lock(&oldsighand->siglock);
   775                  spin_lock(&newsighand->siglock);
   776
   777                  current->sighand = newsighand;
   778                  recalc_sigpending();
   779
   780                  spin_unlock(&newsighand->siglock);
   781                  spin_unlock(&oldsighand->siglock);
   782                  write_unlock_irq(&tasklist_lock);
   783
   784                  if (atomic_dec_and_test(&oldsighand->count))
   785                          kmem_cache_free(sighand_cachep, oldsighand);

Oleg.

* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-11  9:56 ` Ingo Molnar
@ 2005-08-11 14:14   ` Paul E. McKenney
  2005-08-12  2:00   ` Lee Revell
  1 sibling, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2005-08-11 14:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, dipankar, rusty, bmark, Thomas Gleixner,
	Andrew Morton

On Thu, Aug 11, 2005 at 11:56:34AM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@us.ibm.com> wrote:
> 
> > Hello!
> > 
> > This patch is an experiment in use of RCU for individual code paths 
> > that read-acquire the tasklist lock, in this case, unicast signal 
> > delivery. It passes five kernbenches on 4-CPU x86, but obviously needs 
> > much more testing before it is considered for serious use, let alone 
> > inclusion.
> > 
> > My main question is whether I have the POSIX semantics covered.  I 
> > believe that I do, but thought I should check with people who are more 
> > familiar with POSIX than am I.
> > 
> > For the record, some shortcomings of this patch:
> > 
> > o	Needs lots more testing on more architectures.
> > 
> > o	Needs performance and stress testing.
> > 
> > o	Needs testing in Ingo's PREEMPT_RT environment.
> 
> cool patch! I have integrated it into my PREEMPT_RT tree, and all it
> needed to boot was the patch below (doesn't affect the upstream kernel).
> Using the raw IRQ flag isn't an issue in the RCU code; all the affected
> codepaths are small and deterministic.
> 
> (without this patch it locked up after detecting IRQ7 - not sure why.)

Without this patch on an older version of PREEMPT_RT (V0.7.52-12),
it would boot, pass kernbench, but fail LTP.  Passed both on a stock
kernel.

Will re-run with your patch.  ;-)

> kernel still works fine after some (mostly light) testing.

Cool!  Next step for me is to run some focussed stress tests.

						Thanx, Paul

> 	Ingo
> 
> Index: linux/kernel/rcupdate.c
> ===================================================================
> --- linux.orig/kernel/rcupdate.c
> +++ linux/kernel/rcupdate.c
> @@ -134,11 +134,11 @@ void fastcall call_rcu(struct rcu_head *
>  
>  	head->func = func;
>  	head->next = NULL;
> -	local_irq_save(flags);
> +	raw_local_irq_save(flags);
>  	rdp = &__get_cpu_var(rcu_data);
>  	*rdp->nxttail = head;
>  	rdp->nxttail = &head->next;
> -	local_irq_restore(flags);
> +	raw_local_irq_restore(flags);
>  }
>  
>  /**
> @@ -165,11 +165,11 @@ void fastcall call_rcu_bh(struct rcu_hea
>  
>  	head->func = func;
>  	head->next = NULL;
> -	local_irq_save(flags);
> +	raw_local_irq_save(flags);
>  	rdp = &__get_cpu_var(rcu_bh_data);
>  	*rdp->nxttail = head;
>  	rdp->nxttail = &head->next;
> -	local_irq_restore(flags);
> +	raw_local_irq_restore(flags);
>  }
>  
>  /*
> @@ -305,11 +305,11 @@ static void rcu_check_quiescent_state(st
>  static void rcu_move_batch(struct rcu_data *this_rdp, struct rcu_head *list,
>  				struct rcu_head **tail)
>  {
> -	local_irq_disable();
> +	raw_local_irq_disable();
>  	*this_rdp->nxttail = list;
>  	if (list)
>  		this_rdp->nxttail = tail;
> -	local_irq_enable();
> +	raw_local_irq_enable();
>  }
>  
>  static void __rcu_offline_cpu(struct rcu_data *this_rdp,
> @@ -362,13 +362,13 @@ static void __rcu_process_callbacks(stru
>  		rdp->curtail = &rdp->curlist;
>  	}
>  
> -	local_irq_disable();
> +	raw_local_irq_disable();
>  	if (rdp->nxtlist && !rdp->curlist) {
>  		rdp->curlist = rdp->nxtlist;
>  		rdp->curtail = rdp->nxttail;
>  		rdp->nxtlist = NULL;
>  		rdp->nxttail = &rdp->nxtlist;
> -		local_irq_enable();
> +		raw_local_irq_enable();
>  
>  		/*
>  		 * start the next batch of callbacks
> @@ -388,7 +388,7 @@ static void __rcu_process_callbacks(stru
>  			spin_unlock(&rsp->lock);
>  		}
>  	} else {
> -		local_irq_enable();
> +		raw_local_irq_enable();
>  	}
>  	rcu_check_quiescent_state(rcp, rsp, rdp);
>  	if (rdp->donelist)
> 

* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-11 12:16 Oleg Nesterov
@ 2005-08-11 15:20 ` Paul E. McKenney
  2005-08-12  1:56 ` Paul E. McKenney
  1 sibling, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2005-08-11 15:20 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

On Thu, Aug 11, 2005 at 04:16:53PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > --- linux-2.6.13-rc6/kernel/signal.c	2005-08-08 19:59:24.000000000 -0700
> > +++ linux-2.6.13-rc6-tasklistRCU/kernel/signal.c	2005-08-10 08:20:25.000000000 -0700
> > @@ -1151,9 +1151,13 @@ int group_send_sig_info(int sig, struct 
> >
> >  	ret = check_kill_permission(sig, info, p);
> >  	if (!ret && sig && p->sighand) {
> > +		if (!get_task_struct_rcu(p)) {
> > +			return -ESRCH;
> > +		}
> >  		spin_lock_irqsave(&p->sighand->siglock, flags);
>                                       ^^^^^^^
> Is it correct?
> 
> The caller (kill_proc_info) does not take tasklist_lock anymore.
> If p does exec() at this time it can change/free its ->sighand.
> 
> fs/exec.c:de_thread()
>    773                  write_lock_irq(&tasklist_lock);
>    774                  spin_lock(&oldsighand->siglock);
>    775                  spin_lock(&newsighand->siglock);
>    776
>    777                  current->sighand = newsighand;
>    778                  recalc_sigpending();
>    779
>    780                  spin_unlock(&newsighand->siglock);
>    781                  spin_unlock(&oldsighand->siglock);
>    782                  write_unlock_irq(&tasklist_lock);
>    783
>    784                  if (atomic_dec_and_test(&oldsighand->count))
>    785                          kmem_cache_free(sighand_cachep, oldsighand);

Looks suspicious to me!  ;-)  Will look into this one, thank you for
pointing it out!

						Thanx, Paul

* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-10 17:11 [RFC,PATCH] Use RCU to protect tasklist for unicast signals Paul E. McKenney
  2005-08-11  9:56 ` Ingo Molnar
@ 2005-08-11 17:14 ` Christoph Hellwig
  2005-08-11 17:56   ` Paul E. McKenney
  2005-08-11 18:00   ` Dipankar Sarma
  1 sibling, 2 replies; 28+ messages in thread
From: Christoph Hellwig @ 2005-08-11 17:14 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, mingo, dipankar, rusty, bmark

On Wed, Aug 10, 2005 at 10:11:45AM -0700, Paul E. McKenney wrote:
> Hello!
> 
> This patch is an experiment in use of RCU for individual code paths that
> read-acquire the tasklist lock, in this case, unicast signal delivery.
> It passes five kernbenches on 4-CPU x86, but obviously needs much more
> testing before it is considered for serious use, let alone inclusion.

I think we should switch over tasklist_lock to RCU completely instead of
adding suck hacks.  I've started lots of preparation work to get rid of
tasklist_lock users outside of kernel/, especially getting rid of any
use in modules.


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-11 17:14 ` Christoph Hellwig
@ 2005-08-11 17:56   ` Paul E. McKenney
  2005-08-11 18:00   ` Dipankar Sarma
  1 sibling, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2005-08-11 17:56 UTC (permalink / raw)
  To: Christoph Hellwig, linux-kernel, mingo, dipankar, rusty, bmark

On Thu, Aug 11, 2005 at 06:14:51PM +0100, Christoph Hellwig wrote:
> On Wed, Aug 10, 2005 at 10:11:45AM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > This patch is an experiment in use of RCU for individual code paths that
> > read-acquire the tasklist lock, in this case, unicast signal delivery.
> > It passes five kernbenches on 4-CPU x86, but obviously needs much more
> > testing before it is considered for serious use, let alone inclusion.
> 
> I think we should switch over tasklist_lock to RCU completely instead of
> adding suck hacks.  I've started lots of preparation work to get rid of
> tasklist_lock users outside of kernel/, especially getting rid of any
> use in modules.

I agree fully with your end goal -- I would certainly look funny
disagreeing, right?  ;-)  And Oleg pointed out an area I missed in my
patch, which I am fixing, but in the meantime it most certainly does
qualify as a suck hack.  :-/  Working on it...

But one of the cool things about RCU is that we can make this change
incrementally, with a series of small patches.  We can have some of
the read paths protected by RCU and others continuing to be protected
by read_lock(), so that we can avoid the need for a "flag day" that
hits all 300+ uses of tasklist_lock.

FWIW, the approach that I am taking to fix the bug Oleg and Christoph
spotted is roughly as follows:

o	Add "struct rcu_head rcu", "struct sighand_struct *successor",
	and "int deleted" to struct sighand_struct.  This allows
	reliable signal delivery in face of sighand_struct replacements.
	If an RCU reader finds (deleted && !successor), that reader
	is trying to signal a dying process, so gives up.  If an
	RCU reader instead finds (deleted && successor), the reader
	traverses the successor pointer to find the current version.

o	Apply call_rcu() to deletion of struct sighand_struct, with
	the exception of a couple of failure paths in exec.c, where
	the sighand_struct was never exposed to RCU readers.
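
Stripped of the siglock acquisitions, the lookup described above reduces to a
pointer walk along the successor chain.  A userspace model (field names follow
the description above; this is a sketch of the proposed semantics, not the
actual kernel code, and omits all locking) would be:

```c
#include <stddef.h>

/* Userspace model of the proposed sighand_struct replacement scheme:
 * a deleted entry either points at its successor (the process re-execed
 * and installed a new sighand) or at nothing (the process is dying). */
struct sighand_model {
	int deleted;
	struct sighand_model *successor;
};

/* Follow successor pointers until a live sighand is found.  Returns
 * NULL when the chain ends at a deleted entry with no successor,
 * i.e. the reader is trying to signal a dying process and gives up. */
static struct sighand_model *resolve_sighand(struct sighand_model *sp)
{
	while (sp && sp->deleted)
		sp = sp->successor;
	return sp;
}
```

In the real patch each step of the walk would be done under the
corresponding siglock, so a reader never examines deleted/successor
without holding the lock that de_thread() takes when updating them.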

Thoughts?

						Thanx, Paul

* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-11 17:14 ` Christoph Hellwig
  2005-08-11 17:56   ` Paul E. McKenney
@ 2005-08-11 18:00   ` Dipankar Sarma
  2005-08-11 18:12     ` Dipankar Sarma
  1 sibling, 1 reply; 28+ messages in thread
From: Dipankar Sarma @ 2005-08-11 18:00 UTC (permalink / raw)
  To: Christoph Hellwig, Paul E. McKenney, linux-kernel, mingo, rusty,
	bmark

On Thu, Aug 11, 2005 at 06:14:51PM +0100, Christoph Hellwig wrote:
> On Wed, Aug 10, 2005 at 10:11:45AM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > This patch is an experiment in use of RCU for individual code paths that
> > read-acquire the tasklist lock, in this case, unicast signal delivery.
> > It passes five kernbenches on 4-CPU x86, but obviously needs much more
> > testing before it is considered for serious use, let alone inclusion.
> 
> I think we should switch over tasklist_lock to RCU completely instead of
> adding suck hacks.  I've started lots of preparation work to get rid of
> tasklist_lock users outside of kernel/, especially getting rid of any
> use in modules.

That would be really helpful, especially the ones in drivers/char/tty*.c :)

When I worked on this last (a year or so ago), it seemed that I would
need to put a number of additional structures under RCU control.
It would be better to gradually move it towards RCU rather than
trying to make all the readers lock-free.

Thanks
Dipankar

* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-11 18:00   ` Dipankar Sarma
@ 2005-08-11 18:12     ` Dipankar Sarma
  0 siblings, 0 replies; 28+ messages in thread
From: Dipankar Sarma @ 2005-08-11 18:12 UTC (permalink / raw)
  To: Christoph Hellwig, Paul E. McKenney, linux-kernel, mingo, rusty,
	bmark

On Thu, Aug 11, 2005 at 11:30:44PM +0530, Dipankar Sarma wrote:
> When I worked on this last (a year or so ago), it seemed that I would
> need to put a number of additional structures under RCU control.
> It would be better to gradually move it towards RCU rather than
> trying to make all the readers lock-free.

Just for reference, this was my tasks-rcu patch. 2.6.0-test2, no less :)
I was interested in get_pid_list() at that time. IIRC, I tested it with
many instances of top running along with other tests.

Thanks
Dipankar


Incremental patch to do lock-free traversal of the task list using
RCU. For now it just does one of the costliest ones in /proc.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>


 fs/exec.c             |    1 +
 fs/proc/base.c        |    5 +++--
 include/linux/sched.h |   21 ++++++++++++++++-----
 3 files changed, 20 insertions(+), 7 deletions(-)

diff -puN fs/exec.c~tasks-rcu fs/exec.c
--- linux-2.6.0-test2-ds/fs/exec.c~tasks-rcu	2003-08-04 21:48:38.000000000 +0530
+++ linux-2.6.0-test2-ds-dipankar/fs/exec.c	2003-08-04 21:49:26.000000000 +0530
@@ -676,6 +676,7 @@ static inline int de_thread(struct task_
 
 		list_del(&current->tasks);
 		list_add_tail(&current->tasks, &init_task.tasks);
+		list_add_tail_rcu(&current->tasks, &init_task.tasks);
 		current->exit_signal = SIGCHLD;
 		state = leader->state;
 
diff -puN fs/proc/base.c~tasks-rcu fs/proc/base.c
--- linux-2.6.0-test2-ds/fs/proc/base.c~tasks-rcu	2003-08-04 21:48:38.000000000 +0530
+++ linux-2.6.0-test2-ds-dipankar/fs/proc/base.c	2003-08-04 21:49:59.000000000 +0530
@@ -32,6 +32,7 @@
 #include <linux/mount.h>
 #include <linux/security.h>
 #include <linux/ptrace.h>
+#include <linux/rcupdate.h>
 
 /*
  * For hysterical raisins we keep the same inumbers as in the old procfs.
@@ -1403,7 +1404,7 @@ static int get_pid_list(int index, unsig
 	int nr_pids = 0;
 
 	index--;
-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	for_each_process(p) {
 		int pid = p->pid;
 		if (!pid_alive(p))
@@ -1415,7 +1416,7 @@ static int get_pid_list(int index, unsig
 		if (nr_pids >= PROC_MAXPIDS)
 			break;
 	}
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 	return nr_pids;
 }
 
diff -puN include/linux/sched.h~tasks-rcu include/linux/sched.h
--- linux-2.6.0-test2-ds/include/linux/sched.h~tasks-rcu	2003-08-04 21:48:38.000000000 +0530
+++ linux-2.6.0-test2-ds-dipankar/include/linux/sched.h	2003-08-04 21:54:58.000000000 +0530
@@ -28,6 +28,7 @@
 #include <linux/completion.h>
 #include <linux/pid.h>
 #include <linux/percpu.h>
+#include <linux/rcupdate.h>
 
 struct exec_domain;
 
@@ -456,13 +457,23 @@ struct task_struct {
 	struct io_context *io_context;
 
 	unsigned long ptrace_message;
+	struct rcu_head rcu;
 	siginfo_t *last_siginfo; /* For ptrace use.  */
 };
 
 extern void __put_task_struct(struct task_struct *tsk);
 #define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0)
-#define put_task_struct(tsk) \
-do { if (atomic_dec_and_test(&(tsk)->usage)) __put_task_struct(tsk); } while(0)
+static void put_task_struct_rcu(struct rcu_head *rcu)
+{
+	struct task_struct *tsk = container_of(rcu, struct task_struct, rcu);
+	__put_task_struct(tsk);
+}
+static inline void put_task_struct(struct task_struct *tsk)
+{
+	if (atomic_dec_and_test(&tsk->usage))
+		call_rcu(&tsk->rcu, put_task_struct_rcu);
+}
+
 
 /*
  * Per process flags
@@ -675,13 +686,13 @@ extern void wait_task_inactive(task_t * 
 
 #define REMOVE_LINKS(p) do {					\
 	if (thread_group_leader(p))				\
-		list_del_init(&(p)->tasks);			\
+		list_del_rcu(&(p)->tasks);			\
 	remove_parent(p);					\
 	} while (0)
 
 #define SET_LINKS(p) do {					\
 	if (thread_group_leader(p))				\
-		list_add_tail(&(p)->tasks,&init_task.tasks);	\
+		list_add_tail_rcu(&(p)->tasks,&init_task.tasks);	\
 	add_parent(p, (p)->parent);				\
 	} while (0)
 
@@ -689,7 +700,7 @@ extern void wait_task_inactive(task_t * 
 #define prev_task(p)	list_entry((p)->tasks.prev, struct task_struct, tasks)
 
 #define for_each_process(p) \
-	for (p = &init_task ; (p = next_task(p)) != &init_task ; )
+	for (p = &init_task ; (p = next_task(p)),({ read_barrier_depends(); 0;}),p != &init_task ; )
 
 /*
  * Careful: do_each_thread/while_each_thread is a double loop so

_

* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-11 12:16 Oleg Nesterov
  2005-08-11 15:20 ` Paul E. McKenney
@ 2005-08-12  1:56 ` Paul E. McKenney
  2005-08-12  8:51   ` Oleg Nesterov
  1 sibling, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2005-08-12  1:56 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

On Thu, Aug 11, 2005 at 04:16:53PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > --- linux-2.6.13-rc6/kernel/signal.c	2005-08-08 19:59:24.000000000 -0700
> > +++ linux-2.6.13-rc6-tasklistRCU/kernel/signal.c	2005-08-10 08:20:25.000000000 -0700
> > @@ -1151,9 +1151,13 @@ int group_send_sig_info(int sig, struct 
> >
> >  	ret = check_kill_permission(sig, info, p);
> >  	if (!ret && sig && p->sighand) {
> > +		if (!get_task_struct_rcu(p)) {
> > +			return -ESRCH;
> > +		}
> >  		spin_lock_irqsave(&p->sighand->siglock, flags);
>                                       ^^^^^^^
> Is it correct?

Most definitely not!  Thank you again for catching this one, would have
taken some serious test-and-debug time to root it out the hard way.

Fix provided as a patch against V0.7.53-01; it probably still has some
bugs, as it has not yet been thoroughly tested.  The general approach is
to RCU-protect the sighand pointer from task_struct to sighand_struct.

Will be testing more thoroughly; in the meantime, thoughts?

							Thanx, Paul

 fs/exec.c             |    6 +++++-
 include/linux/sched.h |   10 ++++++++++
 kernel/fork.c         |   11 +++++++++++
 kernel/signal.c       |   16 +++++++++++++---
 4 files changed, 39 insertions(+), 4 deletions(-)

diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c	2005-08-11 11:44:55.000000000 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c	2005-08-11 12:26:45.000000000 -0700
@@ -773,6 +773,8 @@ no_thread_group:
 		 */
 		spin_lock_init(&newsighand->siglock);
 		atomic_set(&newsighand->count, 1);
+		newsighand->deleted = 0;
+		newsighand->successor = NULL;
 		memcpy(newsighand->action, oldsighand->action,
 		       sizeof(newsighand->action));
 
@@ -785,12 +787,14 @@ no_thread_group:
 		recalc_sigpending();
 
 		task_unlock(current);
+		oldsighand->deleted = 1;
+		oldsighand->successor = newsighand;
 		spin_unlock(&newsighand->siglock);
 		spin_unlock(&oldsighand->siglock);
 		write_unlock_irq(&tasklist_lock);
 
 		if (atomic_dec_and_test(&oldsighand->count))
-			kmem_cache_free(sighand_cachep, oldsighand);
+			sighand_free(oldsighand);
 	}
 
 	BUG_ON(!thread_group_empty(current));
diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h	2005-08-11 11:44:57.000000000 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h	2005-08-11 12:17:01.000000000 -0700
@@ -450,8 +450,18 @@ struct sighand_struct {
 	atomic_t		count;
 	struct k_sigaction	action[_NSIG];
 	spinlock_t		siglock;
+	int			deleted;
+	struct sighand_struct	*successor;
+	struct rcu_head		rcu;
 };
 
+static inline void sighand_free(struct sighand_struct *sp)
+{
+	extern void sighand_free_cb(struct rcu_head *rhp);
+
+	call_rcu(&sp->rcu, sighand_free_cb);
+}
+
 /*
  * NOTE! "signal_struct" does not have it's own
  * locking, because a shared signal_struct always
diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c	2005-08-11 11:44:57.000000000 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c	2005-08-11 13:05:17.000000000 -0700
@@ -43,6 +43,7 @@
 #include <linux/acct.h>
 #include <linux/kthread.h>
 #include <linux/notifier.h>
+#include <linux/rcupdate.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -769,6 +770,14 @@ int unshare_files(void)
 
 EXPORT_SYMBOL(unshare_files);
 
+void sighand_free_cb(struct rcu_head *rhp)
+{
+	struct sighand_struct *sp =
+		container_of(rhp, struct sighand_struct, rcu);
+
+	kmem_cache_free(sighand_cachep, sp);
+}
+
 static inline int copy_sighand(unsigned long clone_flags, struct task_struct * tsk)
 {
 	struct sighand_struct *sig;
@@ -783,6 +792,8 @@ static inline int copy_sighand(unsigned 
 		return -ENOMEM;
 	spin_lock_init(&sig->siglock);
 	atomic_set(&sig->count, 1);
+	sig->deleted = 0;
+	sig->successor = 0;
 	memcpy(sig->action, current->sighand->action, sizeof(sig->action));
 	return 0;
 }
diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/signal.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/signal.c	2005-08-11 11:44:57.000000000 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c	2005-08-11 17:37:55.000000000 -0700
@@ -1150,16 +1150,26 @@ void zap_other_threads(struct task_struc
 int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
 {
 	unsigned long flags;
+	struct sighand_struct *sp;
 	int ret;
 
 	ret = check_kill_permission(sig, info, p);
-	if (!ret && sig && p->sighand) {
+	if (!ret && sig && (sp = p->sighand)) {
 		if (!get_task_struct_rcu(p)) {
 			return -ESRCH;
 		}
-		spin_lock_irqsave(&p->sighand->siglock, flags);
+		spin_lock_irqsave(&sp->siglock, flags);
+		while (sp->deleted) {
+			spin_unlock_irqrestore(&sp->siglock, flags);
+			sp = sp->successor;
+			if (sp == NULL) {
+				put_task_struct(p);
+				return -ESRCH;
+			}
+			spin_lock_irqsave(&sp->siglock, flags);
+		}
 		ret = __group_send_sig_info(sig, info, p);
-		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+		spin_unlock_irqrestore(&sp->siglock, flags);
 		put_task_struct(p);
 	}
 

* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-11  9:56 ` Ingo Molnar
  2005-08-11 14:14   ` Paul E. McKenney
@ 2005-08-12  2:00   ` Lee Revell
  2005-08-12  6:36     ` Ingo Molnar
  1 sibling, 1 reply; 28+ messages in thread
From: Lee Revell @ 2005-08-12  2:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Paul E. McKenney, linux-kernel, dipankar, rusty, bmark,
	Thomas Gleixner, Andrew Morton

On Thu, 2005-08-11 at 11:56 +0200, Ingo Molnar wrote:
> > For the record, some shortcomings of this patch:
> > 
> > o	Needs lots more testing on more architectures.
> > 
> > o	Needs performance and stress testing.
> > 
> > o	Needs testing in Ingo's PREEMPT_RT environment.
> 
> cool patch! I have integrated it into my PREEMPT_RT tree, and all it 
> needed to boot was the patch below (doesn't affect the upstream kernel).  
> Using the raw IRQ flag isn't an issue in the RCU code, all the affected 
> codepaths are small and deterministic.
> 

Ingo,

Doesn't this fix the longest latency we were seeing with
PREEMPT_DESKTOP?  I don't have a trace handy, but the upshot was "signal
delivery must remain atomic on !PREEMPT_RT".

Lee



* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-12  2:00   ` Lee Revell
@ 2005-08-12  6:36     ` Ingo Molnar
  2005-08-12 20:57       ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2005-08-12  6:36 UTC (permalink / raw)
  To: Lee Revell
  Cc: Paul E. McKenney, linux-kernel, dipankar, rusty, bmark,
	Thomas Gleixner, Andrew Morton


* Lee Revell <rlrevell@joe-job.com> wrote:

> Doesn't this fix the longest latency we were seeing with 
> PREEMPT_DESKTOP?  I don't have a trace handy, but the upshot was "signal 
> delivery must remain atomic on !PREEMPT_RT".

yes - although Paul's patch converts only a portion of the signal code 
to RCU-read-lock, so i'd expect there to be other latencies too. Might 
be worth a test (once we've sorted out the HRT build bugs).

	Ingo


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-12  1:56 ` Paul E. McKenney
@ 2005-08-12  8:51   ` Oleg Nesterov
  2005-08-12 15:42     ` Paul E. McKenney
  2005-08-15 17:44     ` Paul E. McKenney
  0 siblings, 2 replies; 28+ messages in thread
From: Oleg Nesterov @ 2005-08-12  8:51 UTC (permalink / raw)
  To: paulmck; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

Paul E. McKenney wrote:
>
> --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c	2005-08-11 11:44:55.000000000 -0700
> +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c	2005-08-11 12:26:45.000000000 -0700
> [ ... snip ... ]
> @@ -785,11 +787,13 @@ no_thread_group:
>  		recalc_sigpending();
> 
> +		oldsighand->deleted = 1;
> +		oldsighand->successor = newsighand;

I don't think this is correct.

This ->oldsighand can be shared with another CLONE_SIGHAND
process and will not be deleted, just unshared.

When the signal is sent to that process we must use ->oldsighand
for locking, not the oldsighand->successor == newsighand.

Oleg.


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-12  8:51   ` Oleg Nesterov
@ 2005-08-12 15:42     ` Paul E. McKenney
  2005-08-15 17:44     ` Paul E. McKenney
  1 sibling, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2005-08-12 15:42 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

On Fri, Aug 12, 2005 at 12:51:17PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c	2005-08-11 11:44:55.000000000 -0700
> > +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c	2005-08-11 12:26:45.000000000 -0700
> > [ ... snip ... ]
> > @@ -785,11 +787,13 @@ no_thread_group:
> >  		recalc_sigpending();
> > 
> > +		oldsighand->deleted = 1;
> > +		oldsighand->successor = newsighand;
> 
> I don't think this is correct.
> 
> This ->oldsighand can be shared with another CLONE_SIGHAND
> process and will not be deleted, just unshared.
> 
> When the signal is sent to that process we must use ->oldsighand
> for locking, not the oldsighand->successor == newsighand.

Hmmm...  Sounds like I mostly managed to dig myself in deeper here.

Back to the drawing board...

						Thanx, Paul


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-12  6:36     ` Ingo Molnar
@ 2005-08-12 20:57       ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2005-08-12 20:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Lee Revell, linux-kernel, dipankar, rusty, bmark, Thomas Gleixner,
	Andrew Morton

On Fri, Aug 12, 2005 at 08:36:00AM +0200, Ingo Molnar wrote:
> 
> * Lee Revell <rlrevell@joe-job.com> wrote:
> 
> > Doesn't this fix the longest latency we were seeing with 
> > PREEMPT_DESKTOP?  I don't have a trace handy, but the upshot was "signal 
> > delivery must remain atomic on !PREEMPT_RT".
> 
> yes - although Paul's patch converts only a portion of the signal code 
> to RCU-read-lock, so i'd expect there to be other latencies too. Might 
> be worth a test (once we've sorted out the HRT build bugs).

And there remains the question of how much of the benefit remains after
I handle the corner cases...  :-/

						Thanx, Paul


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-12  8:51   ` Oleg Nesterov
  2005-08-12 15:42     ` Paul E. McKenney
@ 2005-08-15 17:44     ` Paul E. McKenney
  2005-08-16  8:14       ` Ingo Molnar
  2005-08-16 11:56       ` Oleg Nesterov
  1 sibling, 2 replies; 28+ messages in thread
From: Paul E. McKenney @ 2005-08-15 17:44 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

On Fri, Aug 12, 2005 at 12:51:17PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c	2005-08-11 11:44:55.000000000 -0700
> > +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c	2005-08-11 12:26:45.000000000 -0700
> > [ ... snip ... ]
> > @@ -785,11 +787,13 @@ no_thread_group:
> >  		recalc_sigpending();
> > 
> > +		oldsighand->deleted = 1;
> > +		oldsighand->successor = newsighand;
> 
> I don't think this is correct.
> 
> This ->oldsighand can be shared with another CLONE_SIGHAND
> process and will not be deleted, just unshared.
> 
> When the signal is sent to that process we must use ->oldsighand
> for locking, not the oldsighand->successor == newsighand.

OK, the attached instead revalidates that the task struct still references
the sighand_struct after obtaining the lock (and passes kernbench and
LTP, which tells me I need to get better tests!).  However, your quick
responses motivated me to read more of the code (yes, I know, I should
have done -that- beforehand!!!), and there are some remaining problems
that I need to sort out, including:

o	Fatal signals affect all threads in a process, and the
	existing code looks like it assumes that tasklist_lock
	is held during delivery of such signals.  In this patch,
	that assumption does not hold.

o	Some of the functions invoked by __group_send_sig_info(),
	including handle_stop_signal(), momentarily drop ->siglock.

I will be looking at a few approaches to handle these situations, but,
in the meantime, any other combinations that I missed?

							Thanx, Paul


diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c	2005-08-11 11:44:55.000000000 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c	2005-08-12 21:18:33.000000000 -0700
@@ -790,7 +790,7 @@ no_thread_group:
 		write_unlock_irq(&tasklist_lock);
 
 		if (atomic_dec_and_test(&oldsighand->count))
-			kmem_cache_free(sighand_cachep, oldsighand);
+			sighand_free(oldsighand);
 	}
 
 	BUG_ON(!thread_group_empty(current));
diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h	2005-08-11 11:44:57.000000000 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h	2005-08-12 21:19:34.000000000 -0700
@@ -450,8 +450,16 @@ struct sighand_struct {
 	atomic_t		count;
 	struct k_sigaction	action[_NSIG];
 	spinlock_t		siglock;
+	struct rcu_head		rcu;
 };
 
+static inline void sighand_free(struct sighand_struct *sp)
+{
+	extern void sighand_free_cb(struct rcu_head *rhp);
+
+	call_rcu(&sp->rcu, sighand_free_cb);
+}
+
 /*
  * NOTE! "signal_struct" does not have it's own
  * locking, because a shared signal_struct always
diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c	2005-08-11 11:44:57.000000000 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c	2005-08-12 21:24:00.000000000 -0700
@@ -43,6 +43,7 @@
 #include <linux/acct.h>
 #include <linux/kthread.h>
 #include <linux/notifier.h>
+#include <linux/rcupdate.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -769,6 +770,14 @@ int unshare_files(void)
 
 EXPORT_SYMBOL(unshare_files);
 
+void sighand_free_cb(struct rcu_head *rhp)
+{
+	struct sighand_struct *sp =
+		container_of(rhp, struct sighand_struct, rcu);
+
+	kmem_cache_free(sighand_cachep, sp);
+}
+
 static inline int copy_sighand(unsigned long clone_flags, struct task_struct * tsk)
 {
 	struct sighand_struct *sig;
diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/signal.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/signal.c	2005-08-11 11:44:57.000000000 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c	2005-08-14 19:53:28.000000000 -0700
@@ -328,9 +328,11 @@ void __exit_sighand(struct task_struct *
 	struct sighand_struct * sighand = tsk->sighand;
 
 	/* Ok, we're done with the signal handlers */
+	spin_lock(&sighand->siglock);
 	tsk->sighand = NULL;
 	if (atomic_dec_and_test(&sighand->count))
-		kmem_cache_free(sighand_cachep, sighand);
+		sighand_free(sighand);
+	spin_unlock(&sighand->siglock);
 }
 
 void exit_sighand(struct task_struct *tsk)
@@ -1150,16 +1152,23 @@ void zap_other_threads(struct task_struc
 int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
 {
 	unsigned long flags;
+	struct sighand_struct *sp;
 	int ret;
 
+retry:
 	ret = check_kill_permission(sig, info, p);
-	if (!ret && sig && p->sighand) {
+	if (!ret && sig && (sp = p->sighand)) {
 		if (!get_task_struct_rcu(p)) {
 			return -ESRCH;
 		}
-		spin_lock_irqsave(&p->sighand->siglock, flags);
+		spin_lock_irqsave(&sp->siglock, flags);
+		if (p->sighand != sp) {
+			spin_unlock_irqrestore(&sp->siglock, flags);
+			put_task_struct(p);
+			goto retry;
+		}
 		ret = __group_send_sig_info(sig, info, p);
-		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+		spin_unlock_irqrestore(&sp->siglock, flags);
 		put_task_struct(p);
 	}
 


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-15 17:44     ` Paul E. McKenney
@ 2005-08-16  8:14       ` Ingo Molnar
  2005-08-16 11:56       ` Oleg Nesterov
  1 sibling, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2005-08-16  8:14 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Oleg Nesterov, Dipankar Sarma, linux-kernel


* Paul E. McKenney <paulmck@us.ibm.com> wrote:

> OK, the attached instead revalidates that the task struct still 
> references the sighand_struct after obtaining the lock (and passes 
> kernbench and LTP, which tells me I need to get better tests!).

i've applied this to the -RT tree, and it's looking good so far from a 
basic stability POV.

	Ingo


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-15 17:44     ` Paul E. McKenney
  2005-08-16  8:14       ` Ingo Molnar
@ 2005-08-16 11:56       ` Oleg Nesterov
  2005-08-16 17:07         ` Paul E. McKenney
  1 sibling, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2005-08-16 11:56 UTC (permalink / raw)
  To: paulmck; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

Paul E. McKenney wrote:
>
> OK, the attached instead revalidates that the task struct still references
> the sighand_struct after obtaining the lock

Personally I think this is a way to go. A nitpick suggestion,
could you make a separate function (say, lock_task_sighand)
which does all this job?

> > and there are some remaining problems
> > that I need to sort out, including:
> ...
>
> o	Some of the functions invoked by __group_send_sig_info(),
> 	including handle_stop_signal(), momentarily drop ->siglock.

Just to be sure that one point doesn't escape your attention, this:

> +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c	2005-08-14 19:53:28.000000000 -0700
> @@ -328,9 +328,11 @@ void __exit_sighand(struct task_struct *
>  	struct sighand_struct * sighand = tsk->sighand;
>  
>  	/* Ok, we're done with the signal handlers */
> +	spin_lock(&sighand->siglock);
>  	tsk->sighand = NULL;
>  	if (atomic_dec_and_test(&sighand->count))
> -		kmem_cache_free(sighand_cachep, sighand);
> +		sighand_free(sighand);
> +	spin_unlock(&sighand->siglock);

is not enough (and unneeded). Unless I missed something, we have
a race:

release_task:

	__exit_signal:
		spin_lock(sighand);
		spin_unlock(sighand);
		flush_sigqueue(&sig->shared_pending);
		kmem_cache_free(tsk->signal);
							// here comes group_send_sig_info(), locks ->sighand,
							// delivers the signal to the ->shared_pending.
							// siginfo leaked, or crash.
	__exit_sighand:
		spin_lock(sighand);
		tsk->sighand = NULL;
		// too late !!!!

I think that release_task() should not use __exit_sighand()
at all. Instead, __exit_signal() should set tsk->sighand = NULL
under ->sighand->lock.

>  int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
>  {
>  	unsigned long flags;
> +	struct sighand_struct *sp;
>  	int ret;
>
> +retry:
>  	ret = check_kill_permission(sig, info, p);
> -	if (!ret && sig && p->sighand) {
> +	if (!ret && sig && (sp = p->sighand)) {
>  		if (!get_task_struct_rcu(p)) {
>  			return -ESRCH;
>  		}
> -		spin_lock_irqsave(&p->sighand->siglock, flags);
> +		spin_lock_irqsave(&sp->siglock, flags);
> +		if (p->sighand != sp) {
> +			spin_unlock_irqrestore(&sp->siglock, flags);
> +			put_task_struct(p);
> +			goto retry;
> +		}
>  		ret = __group_send_sig_info(sig, info, p);
> -		spin_unlock_irqrestore(&p->sighand->siglock, flags);
> +		spin_unlock_irqrestore(&sp->siglock, flags);
>  		put_task_struct(p);

Do we really need get_task_struct_rcu/put_task_struct here?

The task_struct can't go away under us, it is rcu protected.
When ->sighand is locked, and it is still the same after
the re-check, it means that 'p' has not done __exit_signal()
yet, so it is safe to send the signal.

And if the task has ->usage == 0, it means that it also has
->sighand == NULL, and your code will notice that.

No?

Oleg.


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-16 11:56       ` Oleg Nesterov
@ 2005-08-16 17:07         ` Paul E. McKenney
  2005-08-17  1:48           ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2005-08-16 17:07 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

On Tue, Aug 16, 2005 at 03:56:05PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > OK, the attached instead revalidates that the task struct still references
> > the sighand_struct after obtaining the lock
> 
> Personally I think this is a way to go. A nitpick suggestion,
> could you make a separate function (say, lock_task_sighand)
> which does all this job?

Sounds good, will do!

The other thing that jumped out at me is that signals are very different
animals from a locking viewpoint depending on whether they are:

1.	ignored,

2.	caught by a single thread,

3.	fatal to multiple threads/processes (though I don't know
	of anything that shares sighand_struct between separate
	processes), or

4.	otherwise global to multiple threads/processes (such as
	SIGSTOP and SIGCONT).

And there are probably other distinctions that I have not yet caught
on to.

One way to approach this would be to make your suggested lock_task_sighand()
look at the signal and acquire the appropriate locks.  If, having acquired
a given set of locks, it found that the needed set had changed (e.g., due
to racing exec() or sigaction()), then it drops the locks and retries.

Does this make sense?

This approach assumes that realtime latency (of the kill() operation
itself) is critical only in cases #1 and #2 above.  This makes sense to me,
but some of you might know of situations where #3 and #4 are important.
But I am hoping not.  ;-)

> > > and there are some remaining problems
> > > that I need to sort out, including:
> > ...
> >
> > o	Some of the functions invoked by __group_send_sig_info(),
> > 	including handle_stop_signal(), momentarily drop ->siglock.
> 
> Just to be sure that one point doesn't escape your attention, this:
> 
> > +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c	2005-08-14 19:53:28.000000000 -0700
> > @@ -328,9 +328,11 @@ void __exit_sighand(struct task_struct *
> >  	struct sighand_struct * sighand = tsk->sighand;
> >  
> >  	/* Ok, we're done with the signal handlers */
> > +	spin_lock(&sighand->siglock);
> >  	tsk->sighand = NULL;
> >  	if (atomic_dec_and_test(&sighand->count))
> > -		kmem_cache_free(sighand_cachep, sighand);
> > +		sighand_free(sighand);
> > +	spin_unlock(&sighand->siglock);
> 
> is not enough (and unneeded). Unless I missed something, we have
> a race:
> 
> release_task:
> 
> 	__exit_signal:
> 		spin_lock(sighand);
> 		spin_unlock(sighand);
> 		flush_sigqueue(&sig->shared_pending);
> 		kmem_cache_free(tsk->signal);
> 							// here comes group_send_sig_info(), locks ->sighand,
> 							// delivers the signal to the ->shared_pending.
> 							// siginfo leaked, or crash.
> 	__exit_sighand:
> 		spin_lock(sighand);
> 		tsk->sighand = NULL;
> 		// too late !!!!
> 
> I think that release_task() should not use __exit_sighand()
> at all. Instead, __exit_signal() should set tsk->sighand = NULL
> under ->sighand->lock.

Will look into this -- I was inserting the locking to handle a race with
my revalidation.  It looks like I also need to pay some more attention
to the race with exiting tasks, good catch!  Your suggestion of invoking
__exit_sighand() from under siglock within __exit_signal() sounds good
at first glance, will think it through.

> >  int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
> >  {
> >  	unsigned long flags;
> > +	struct sighand_struct *sp;
> >  	int ret;
> >
> > +retry:
> >  	ret = check_kill_permission(sig, info, p);
> > -	if (!ret && sig && p->sighand) {
> > +	if (!ret && sig && (sp = p->sighand)) {
> >  		if (!get_task_struct_rcu(p)) {
> >  			return -ESRCH;
> >  		}
> > -		spin_lock_irqsave(&p->sighand->siglock, flags);
> > +		spin_lock_irqsave(&sp->siglock, flags);
> > +		if (p->sighand != sp) {
> > +			spin_unlock_irqrestore(&sp->siglock, flags);
> > +			put_task_struct(p);
> > +			goto retry;
> > +		}
> >  		ret = __group_send_sig_info(sig, info, p);
> > -		spin_unlock_irqrestore(&p->sighand->siglock, flags);
> > +		spin_unlock_irqrestore(&sp->siglock, flags);
> >  		put_task_struct(p);
> 
> Do we really need get_task_struct_rcu/put_task_struct here?
> 
> The task_struct can't go away under us, it is rcu protected.
> When ->sighand is locked, and it is still the same after
> the re-check, it means that 'p' has not done __exit_signal()
> yet, so it is safe to send the signal.
> 
> And if the task has ->usage == 0, it means that it also has
> ->sighand == NULL, and your code will notice that.
> 
> No?

Seems plausible.  I got paranoid after seeing the lock dropped in
handle_stop_signal(), though.

							Thanx, Paul


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-16 17:07         ` Paul E. McKenney
@ 2005-08-17  1:48           ` Paul E. McKenney
  2005-08-17  6:35             ` Ingo Molnar
  2005-08-17 14:35             ` Oleg Nesterov
  0 siblings, 2 replies; 28+ messages in thread
From: Paul E. McKenney @ 2005-08-17  1:48 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

On Tue, Aug 16, 2005 at 10:07:14AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 16, 2005 at 03:56:05PM +0400, Oleg Nesterov wrote:
> > Paul E. McKenney wrote:
> > >
> > > OK, the attached instead revalidates that the task struct still references
> > > the sighand_struct after obtaining the lock
> > 
> > Personally I think this is a way to go. A nitpick suggestion,
> > could you make a separate function (say, lock_task_sighand)
> > which does all this job?
> 
> Sounds good, will do!
> 
> The other thing that jumped out at me is that signals are very different
> animals from a locking viewpoint depending on whether they are:
> 
> 1.	ignored,
> 
> 2.	caught by a single thread,
> 
> 3.	fatal to multiple threads/processes (though I don't know
> 	of anything that shares sighand_struct between separate
> 	processes), or
> 
> 4.	otherwise global to multiple threads/processes (such as
> 	SIGSTOP and SIGCONT).
> 
> And there are probably other distinctions that I have not yet caught
> on to.
> 
> One way to approach this would be to make your suggested lock_task_sighand()
> look at the signal and acquire the appropriate locks.  If, having acquired
> a given set of locks, it found that the needed set had changed (e.g., due
> to racing exec() or sigaction()), then it drops the locks and retries.
> 
> Does this make sense?
> 
> This approach assumes that realtime latency (of the kill() operation
> itself) is critical only in cases #1 and #2 above.  This makes sense to me,
> but some of you might know of situations where #3 and #4 are important.
> But I am hoping not.  ;-)

OK, for this sort of approach to work, lock_task_sighand() must take and
return some sort of mask indicating what locks are held.  The mask returned
by lock_task_sighand() would then be passed to an unlock_task_sighand().

Another alternative would be to make a macro that allowed an arbitrary
statement to be wrapped up in it.

Neither of these is as appealing as I would like -- other ideas?

I have not yet attacked this in the attached patch.

> > > > and there are some remaining problems
> > > > that I need to sort out, including:
> > > ...
> > >
> > > o	Some of the functions invoked by __group_send_sig_info(),
> > > 	including handle_stop_signal(), momentarily drop ->siglock.
> > 
> > Just to be sure that one point doesn't escape your attention, this:
> > 
> > > +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c	2005-08-14 19:53:28.000000000 -0700
> > > @@ -328,9 +328,11 @@ void __exit_sighand(struct task_struct *
> > >  	struct sighand_struct * sighand = tsk->sighand;
> > >  
> > >  	/* Ok, we're done with the signal handlers */
> > > +	spin_lock(&sighand->siglock);
> > >  	tsk->sighand = NULL;
> > >  	if (atomic_dec_and_test(&sighand->count))
> > > -		kmem_cache_free(sighand_cachep, sighand);
> > > +		sighand_free(sighand);
> > > +	spin_unlock(&sighand->siglock);
> > 
> > is not enough (and unneeded). Unless I missed something, we have
> > a race:
> > 
> > release_task:
> > 
> > 	__exit_signal:
> > 		spin_lock(sighand);
> > 		spin_unlock(sighand);
> > 		flush_sigqueue(&sig->shared_pending);
> > 		kmem_cache_free(tsk->signal);
> > 							// here comes group_send_sig_info(), locks ->sighand,
> > 							// delivers the signal to the ->shared_pending.
> > 							// siginfo leaked, or crash.
> > 	__exit_sighand:
> > 		spin_lock(sighand);
> > 		tsk->sighand = NULL;
> > 		// too late !!!!
> > 
> > I think that release_task() should not use __exit_sighand()
> > at all. Instead, __exit_signal() should set tsk->sighand = NULL
> > under ->sighand->lock.
> 
> Will look into this -- I was inserting the locking to handle a race with
> my revalidation.  It looks like I also need to pay some more attention
> to the race with exiting tasks, good catch!  Your suggestion of invoking
> __exit_sighand() from under siglock within __exit_signal() sounds good
> at first glance, will think it through.

This turned out to be easy, the attached patch covers it.

> > >  int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
> > >  {
> > >  	unsigned long flags;
> > > +	struct sighand_struct *sp;
> > >  	int ret;
> > >
> > > +retry:
> > >  	ret = check_kill_permission(sig, info, p);
> > > -	if (!ret && sig && p->sighand) {
> > > +	if (!ret && sig && (sp = p->sighand)) {
> > >  		if (!get_task_struct_rcu(p)) {
> > >  			return -ESRCH;
> > >  		}
> > > -		spin_lock_irqsave(&p->sighand->siglock, flags);
> > > +		spin_lock_irqsave(&sp->siglock, flags);
> > > +		if (p->sighand != sp) {
> > > +			spin_unlock_irqrestore(&sp->siglock, flags);
> > > +			put_task_struct(p);
> > > +			goto retry;
> > > +		}
> > >  		ret = __group_send_sig_info(sig, info, p);
> > > -		spin_unlock_irqrestore(&p->sighand->siglock, flags);
> > > +		spin_unlock_irqrestore(&sp->siglock, flags);
> > >  		put_task_struct(p);
> > 
> > Do we really need get_task_struct_rcu/put_task_struct here?
> > 
> > The task_struct can't go away under us, it is rcu protected.
> > When ->sighand is locked, and it is still the same after
> > the re-check, it means that 'p' has not done __exit_signal()
> > yet, so it is safe to send the signal.
> > 
> > And if the task has ->usage == 0, it means that it also has
> > ->sighand == NULL, and your code will notice that.
> > 
> > No?
> 
> Seems plausible.  I got paranoid after seeing the lock dropped in
> handle_stop_signal(), though.

Still paranoid, but on the list for eventual cleanup after the rest of
the interactions are covered.

My tests are not finding even glaring races, so time to go and create
some torture tests before getting too much more elaborate.  10,000 eyes
are nice (and Oleg's eyes do seem to be working especially well), but
a good software-test sledgehammer has its uses as well.

						Thanx, Paul

diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c	2005-08-11 11:44:55.000000000 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c	2005-08-12 21:18:33.000000000 -0700
@@ -790,7 +790,7 @@ no_thread_group:
 		write_unlock_irq(&tasklist_lock);
 
 		if (atomic_dec_and_test(&oldsighand->count))
-			kmem_cache_free(sighand_cachep, oldsighand);
+			sighand_free(oldsighand);
 	}
 
 	BUG_ON(!thread_group_empty(current));
diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h	2005-08-11 11:44:57.000000000 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h	2005-08-12 21:19:34.000000000 -0700
@@ -450,8 +450,16 @@ struct sighand_struct {
 	atomic_t		count;
 	struct k_sigaction	action[_NSIG];
 	spinlock_t		siglock;
+	struct rcu_head		rcu;
 };
 
+static inline void sighand_free(struct sighand_struct *sp)
+{
+	extern void sighand_free_cb(struct rcu_head *rhp);
+
+	call_rcu(&sp->rcu, sighand_free_cb);
+}
+
 /*
  * NOTE! "signal_struct" does not have it's own
  * locking, because a shared signal_struct always
diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/exit.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/exit.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/exit.c	2005-08-11 11:44:57.000000000 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/exit.c	2005-08-16 11:37:44.000000000 -0700
@@ -74,7 +74,6 @@ repeat: 
 		__ptrace_unlink(p);
 	BUG_ON(!list_empty(&p->ptrace_list) || !list_empty(&p->ptrace_children));
 	__exit_signal(p);
-	__exit_sighand(p);
 	/*
 	 * Note that the fastpath in sys_times depends on __exit_signal having
 	 * updated the counters before a task is removed from the tasklist of
diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c	2005-08-11 11:44:57.000000000 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c	2005-08-12 21:24:00.000000000 -0700
@@ -43,6 +43,7 @@
 #include <linux/acct.h>
 #include <linux/kthread.h>
 #include <linux/notifier.h>
+#include <linux/rcupdate.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -769,6 +770,14 @@ int unshare_files(void)
 
 EXPORT_SYMBOL(unshare_files);
 
+void sighand_free_cb(struct rcu_head *rhp)
+{
+	struct sighand_struct *sp =
+		container_of(rhp, struct sighand_struct, rcu);
+
+	kmem_cache_free(sighand_cachep, sp);
+}
+
 static inline int copy_sighand(unsigned long clone_flags, struct task_struct * tsk)
 {
 	struct sighand_struct *sig;
diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/signal.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/signal.c	2005-08-11 11:44:57.000000000 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c	2005-08-16 16:18:22.000000000 -0700
@@ -330,13 +330,17 @@ void __exit_sighand(struct task_struct *
 	/* Ok, we're done with the signal handlers */
 	tsk->sighand = NULL;
 	if (atomic_dec_and_test(&sighand->count))
-		kmem_cache_free(sighand_cachep, sighand);
+		sighand_free(sighand);
 }
 
 void exit_sighand(struct task_struct *tsk)
 {
 	write_lock_irq(&tasklist_lock);
-	__exit_sighand(tsk);
+	spin_lock(&tsk->sighand->siglock);
+	if (tsk->sighand != NULL) {
+		__exit_sighand(tsk);
+	}
+	spin_unlock(&tsk->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
 }
 
@@ -359,6 +363,7 @@ void __exit_signal(struct task_struct *t
 		if (tsk == sig->curr_target)
 			sig->curr_target = next_thread(tsk);
 		tsk->signal = NULL;
+		__exit_sighand(tsk);
 		spin_unlock(&sighand->siglock);
 		flush_sigqueue(&sig->shared_pending);
 	} else {
@@ -390,6 +395,7 @@ void __exit_signal(struct task_struct *t
 		sig->nvcsw += tsk->nvcsw;
 		sig->nivcsw += tsk->nivcsw;
 		sig->sched_time += tsk->sched_time;
+		__exit_sighand(tsk);
 		spin_unlock(&sighand->siglock);
 		sig = NULL;	/* Marker for below.  */
 	}
@@ -1150,16 +1156,23 @@ void zap_other_threads(struct task_struc
 int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
 {
 	unsigned long flags;
+	struct sighand_struct *sp;
 	int ret;
 
+retry:
 	ret = check_kill_permission(sig, info, p);
-	if (!ret && sig && p->sighand) {
+	if (!ret && sig && (sp = p->sighand)) {
 		if (!get_task_struct_rcu(p)) {
 			return -ESRCH;
 		}
-		spin_lock_irqsave(&p->sighand->siglock, flags);
+		spin_lock_irqsave(&sp->siglock, flags);
+		if (p->sighand != sp) {
+			spin_unlock_irqrestore(&sp->siglock, flags);
+			put_task_struct(p);
+			goto retry;
+		}
 		ret = __group_send_sig_info(sig, info, p);
-		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+		spin_unlock_irqrestore(&sp->siglock, flags);
 		put_task_struct(p);
 	}
 


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-17  1:48           ` Paul E. McKenney
@ 2005-08-17  6:35             ` Ingo Molnar
  2005-08-17 14:35             ` Oleg Nesterov
  1 sibling, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2005-08-17  6:35 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Oleg Nesterov, Dipankar Sarma, linux-kernel


* Paul E. McKenney <paulmck@us.ibm.com> wrote:

> My tests are not finding even glaring races, so time to go and create 
> some torture tests before getting too much more elaborate.  10,000 
> eyes are nice (and Oleg's eyes do seem to be working especially well), 
> but a good software-test sledgehammer has its uses as well.

i've merged this to the -rt tree, and find below a delta patch relative 
to the previous patch.

	Ingo

Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -74,7 +74,6 @@ repeat: 
 		__ptrace_unlink(p);
 	BUG_ON(!list_empty(&p->ptrace_list) || !list_empty(&p->ptrace_children));
 	__exit_signal(p);
-	__exit_sighand(p);
 	/*
 	 * Note that the fastpath in sys_times depends on __exit_signal having
 	 * updated the counters before a task is removed from the tasklist of
Index: linux/kernel/signal.c
===================================================================
--- linux.orig/kernel/signal.c
+++ linux/kernel/signal.c
@@ -328,17 +328,19 @@ void __exit_sighand(struct task_struct *
 	struct sighand_struct * sighand = tsk->sighand;
 
 	/* Ok, we're done with the signal handlers */
-	spin_lock(&sighand->siglock);
 	tsk->sighand = NULL;
 	if (atomic_dec_and_test(&sighand->count))
 		sighand_free(sighand);
-	spin_unlock(&sighand->siglock);
 }
 
 void exit_sighand(struct task_struct *tsk)
 {
 	write_lock_irq(&tasklist_lock);
-	__exit_sighand(tsk);
+	spin_lock(&tsk->sighand->siglock);
+	if (tsk->sighand != NULL) {
+		__exit_sighand(tsk);
+	}
+	spin_unlock(&tsk->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
 }
 
@@ -361,6 +363,7 @@ void __exit_signal(struct task_struct *t
 		if (tsk == sig->curr_target)
 			sig->curr_target = next_thread(tsk);
 		tsk->signal = NULL;
+		__exit_sighand(tsk);
 		spin_unlock(&sighand->siglock);
 		flush_sigqueue(&sig->shared_pending);
 	} else {
@@ -392,6 +395,7 @@ void __exit_signal(struct task_struct *t
 		sig->nvcsw += tsk->nvcsw;
 		sig->nivcsw += tsk->nivcsw;
 		sig->sched_time += tsk->sched_time;
+		__exit_sighand(tsk);
 		spin_unlock(&sighand->siglock);
 		sig = NULL;	/* Marker for below.  */
 	}


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-17  1:48           ` Paul E. McKenney
  2005-08-17  6:35             ` Ingo Molnar
@ 2005-08-17 14:35             ` Oleg Nesterov
  2005-08-17 21:19               ` Paul E. McKenney
  1 sibling, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2005-08-17 14:35 UTC (permalink / raw)
  To: paulmck; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

Paul E. McKenney wrote:
>
> > The other thing that jumped out at me is that signals are very different
> > animals from a locking viewpoint depending on whether they are:
> >
> > 1.	ignored,
> >
> > 2.	caught by a single thread,
> >
> > 3.	fatal to multiple threads/processes (though I don't know
> > 	of anything that shares sighand_struct between separate
> > 	processes), or
> >
> > 4.	otherwise global to multiple threads/processes (such as
> > 	SIGSTOP and SIGCONT).
> >
> > And there are probably other distinctions that I have not yet caught
> > on to.
> >
> > One way to approach this would be to make your suggested lock_task_sighand()
> > look at the signal and acquire the appropriate locks.  If, having acquired
> > a given set of locks, it found that the needed set had changed (e.g., due
> > to racing exec() or sigaction()), then it drops the locks and retries.
>
> OK, for this sort of approach to work, lock_task_sighand() must take and
> return some sort of mask indicating what locks are held.  The mask returned
> by lock_task_sighand() would then be passed to an unlock_task_sighand().

Sorry, I don't understand you. CLONE_THREAD implies CLONE_SIGHAND,
so we always need to lock one ->sighand. Could you please clarify?

> > > +	if (!ret && sig && (sp = p->sighand)) {
> > >  		if (!get_task_struct_rcu(p)) {
> > >  			return -ESRCH;
> > >  		}
> > > -		spin_lock_irqsave(&p->sighand->siglock, flags);
> > > +		spin_lock_irqsave(&sp->siglock, flags);
> > > +		if (p->sighand != sp) {
> > > +			spin_unlock_irqrestore(&sp->siglock, flags);
> > > +			put_task_struct(p);
> > > +			goto retry;
> > > +		}
> > >  		ret = __group_send_sig_info(sig, info, p);
> > > -		spin_unlock_irqrestore(&p->sighand->siglock, flags);
> > > +		spin_unlock_irqrestore(&sp->siglock, flags);
> > >  		put_task_struct(p);
> >
> > Do we really need get_task_struct_rcu/put_task_struct here?
> >
> > The task_struct can't go away under us, it is rcu protected.
> > When ->sighand is locked, and it is still the same after
> > the re-check, it means that 'p' has not done __exit_signal()
> > yet, so it is safe to send the signal.
> >
> > And if the task has ->usage == 0, it means that it also has
> > ->sighand == NULL, and your code will notice that.
> >
> > No?
>
> Seems plausible.  I got paranoid after seeing the lock dropped in
> handle_stop_signal(), though.

Yes, this is bad and should be fixed, I agree.

But why do you think we need to bump task->usage? It can't make any
difference, afaics. The task_struct can't disappear: the caller was
either converted to use rcu_read_lock() or it holds tasklist_lock.

Nonzero task_struct->usage can't stop do_exit or sys_wait4, it will
only postpone call_rcu(__put_task_struct_cb).

And after we locked ->sighand we have sufficient memory barrier, so
if we read the stale value into 'sp' we will notice that (if you were
worried about this issue).

Am I missing something?

>  void exit_sighand(struct task_struct *tsk)
>  {
>  	write_lock_irq(&tasklist_lock);
> -	__exit_sighand(tsk);
> +	spin_lock(&tsk->sighand->siglock);
> +	if (tsk->sighand != NULL) {
> +		__exit_sighand(tsk);
> +	}
> +	spin_unlock(&tsk->sighand->siglock);
>  	write_unlock_irq(&tasklist_lock);
>  }

Very strange code. Why this check? And what happens with

	spin_unlock(&tsk->sighand->siglock);

when tsk->sighand == NULL ?

Oleg.


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-17 14:35             ` Oleg Nesterov
@ 2005-08-17 21:19               ` Paul E. McKenney
  2005-08-18 11:48                 ` Oleg Nesterov
  2005-08-18 12:24                 ` Oleg Nesterov
  0 siblings, 2 replies; 28+ messages in thread
From: Paul E. McKenney @ 2005-08-17 21:19 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

On Wed, Aug 17, 2005 at 06:35:03PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > > The other thing that jumped out at me is that signals are very different
> > > animals from a locking viewpoint depending on whether they are:
> > >
> > > 1.	ignored,
> > >
> > > 2.	caught by a single thread,
> > >
> > > 3.	fatal to multiple threads/processes (though I don't know
> > > 	of anything that shares sighand_struct between separate
> > > 	processes), or
> > >
> > > 4.	otherwise global to multiple threads/processes (such as
> > > 	SIGSTOP and SIGCONT).
> > >
> > > And there are probably other distinctions that I have not yet caught
> > > on to.
> > >
> > > One way to approach this would be to make your suggested lock_task_sighand()
> > > look at the signal and acquire the appropriate locks.  If, having acquired
> > > a given set of locks, it found that the needed set had changed (e.g., due
> > > to racing exec() or sigaction()), then it drops the locks and retries.
> >
> > OK, for this sort of approach to work, lock_task_sighand() must take and
> > return some sort of mask indicating what locks are held.  The mask returned
> > by lock_task_sighand() would then be passed to an unlock_task_sighand().
> 
> Sorry, I don't understand you. CLONE_THREAD implies CLONE_SIGHAND,
> so we always need to lock one ->sighand. Could you please clarify?

On the #3 and #4 code paths, the code assumes that the task-list lock
is held.  So I was thinking of something (very roughly) as follows:

#define SIGLOCK_HOLD_RCU      (1 << 0)
#define SIGLOCK_HOLD_TASKLIST (1 << 1)
#define SIGLOCK_HOLD_SIGLOCK  (1 << 2)

int lock_task_sighand(int sig, struct task_struct *p, int locksheld, struct sighand_struct **spp, int *flags)
{
	int locksret = 0;
	struct sighand_struct *sp;

retry:
	if (!(locksheld & SIGLOCK_HOLD_RCU)) {
		locksret |= SIGLOCK_HOLD_RCU;
		rcu_read_lock();
	}
	sp = rcu_dereference(p->sighand);
	if (sp == NULL) {
		unlock_task_sighand(NULL, locksret, *flags);
		*spp = NULL;
		return 0;
	}
	if (!(locksheld & SIGLOCK_HOLD_TASKLIST)) {
		/* Complain if siglock held. */
		if (sig_kernel_stop(sig) /* expand for other conditions */) {
			locksret |= SIGLOCK_HOLD_TASKLIST;
			read_lock(&tasklist_lock);
		}
	}
	if (locksheld & SIGLOCK_HOLD_SIGLOCK) {
		/* caller already holds siglock; leave *flags alone */
	} else {
		locksret |= SIGLOCK_HOLD_SIGLOCK;
		spin_lock_irqsave(&sp->siglock, *flags);
		if (p->sighand != sp) {
			unlock_task_sighand(sp, locksret, *flags);
			goto retry;
		}
	}
	*spp = sp;
	return locksret;
}

void unlock_task_sighand(struct sighand_struct *sp, int lockstodrop, int flags)
{
	/* lockstodrop had better be non-negative! */

	if (lockstodrop & SIGLOCK_HOLD_SIGLOCK) {
		spin_unlock_irqrestore(&sp->siglock, flags);
	}
	if (lockstodrop & SIGLOCK_HOLD_TASKLIST) {
		read_unlock(&tasklist_lock);
	}
	if (lockstodrop & SIGLOCK_HOLD_RCU) {
		rcu_read_unlock();
	}
}

The "expand for other conditions" must also cover signals that cause
coredumps, that kill all threads in a process, or that otherwise affect
more than one thread (e.g., kill with a negative signal number).  I am
assuming that your arguments about not needing the get_task_struct_rcu()
will eventually work their way through my skull.  ;-)

Of course, the "expand for other conditions" requires reference to the
sighand_struct.  And for that, you really need to be holding siglock.
But you have to drop siglock to acquire tasklist_lock.  So we will end up
relying on RCU some more to safely peek into sighand_struct -before-
acquiring siglock -- which is why I snapshot p->sighand so early, it
will need to be referenced when checking "other conditions".

I am not exactly happy with the above pair of functions, but I suspect
that they might -really- simplify things a bit.

The usage would be as follows:

	if ((lockstodrop = lock_task_sighand(sig, tsk, 0, &sp, &flags)) < 0)
		return lockstodrop;  /* error code */
	if (sp != NULL) {
		/* invoke the function to send the signal */
	}
	unlock_task_sighand(sp, lockstodrop, flags);

Thoughts?  Hey, you asked for this!!!  ;-)

> > > > +	if (!ret && sig && (sp = p->sighand)) {
> > > >  		if (!get_task_struct_rcu(p)) {
> > > >  			return -ESRCH;
> > > >  		}
> > > > -		spin_lock_irqsave(&p->sighand->siglock, flags);
> > > > +		spin_lock_irqsave(&sp->siglock, flags);
> > > > +		if (p->sighand != sp) {
> > > > +			spin_unlock_irqrestore(&sp->siglock, flags);
> > > > +			put_task_struct(p);
> > > > +			goto retry;
> > > > +		}
> > > >  		ret = __group_send_sig_info(sig, info, p);
> > > > -		spin_unlock_irqrestore(&p->sighand->siglock, flags);
> > > > +		spin_unlock_irqrestore(&sp->siglock, flags);
> > > >  		put_task_struct(p);
> > >
> > > Do we really need get_task_struct_rcu/put_task_struct here?
> > >
> > > The task_struct can't go away under us, it is rcu protected.
> > > When ->sighand is locked, and it is still the same after
> > > the re-check, it means that 'p' has not done __exit_signal()
> > > yet, so it is safe to send the signal.
> > >
> > > And if the task has ->usage == 0, it means that it also has
> > > ->sighand == NULL, and your code will notice that.
> > >
> > > No?
> >
> > Seems plausible.  I got paranoid after seeing the lock dropped in
> > handle_stop_signal(), though.
> 
> Yes, this is bad and should be fixed, I agree.
> 
> But why do you think we need to bump task->usage? It can't make any
> difference, afaics. The task_struct can't disappear: the caller was
> either converted to use rcu_read_lock() or it holds tasklist_lock.
> 
> Nonzero task_struct->usage can't stop do_exit or sys_wait4, it will
> only postpone call_rcu(__put_task_struct_cb).
> 
> And after we locked ->sighand we have sufficient memory barrier, so
> if we read the stale value into 'sp' we will notice that (if you were
> worried about this issue).
> 
> Am I missing something?

I doubt if you are missing anything.  I was just being paranoid.
Seeing locks momentarily get dropped in the middle of functions is
a -really- good way to get me into that state!  But since that code
path can be called with tasklist_lock held, it should not be sleeping,
so I believe that you are correct.

Hence my switching to rcu_read_lock() in lock_task_sighand() above.

> >  void exit_sighand(struct task_struct *tsk)
> >  {
> >  	write_lock_irq(&tasklist_lock);
> > -	__exit_sighand(tsk);
> > +	spin_lock(&tsk->sighand->siglock);
> > +	if (tsk->sighand != NULL) {
> > +		__exit_sighand(tsk);
> > +	}
> > +	spin_unlock(&tsk->sighand->siglock);
> >  	write_unlock_irq(&tasklist_lock);
> >  }
> 
> Very strange code. Why this check? And what happens with
> 
> 	spin_unlock(&tsk->sighand->siglock);
> 
> when tsk->sighand == NULL ?

My bad.  Ingo beat you to it though.  ;-)

							Thanx, Paul


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-17 21:19               ` Paul E. McKenney
@ 2005-08-18 11:48                 ` Oleg Nesterov
  2005-08-19  1:29                   ` Paul E. McKenney
  2005-08-18 12:24                 ` Oleg Nesterov
  1 sibling, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2005-08-18 11:48 UTC (permalink / raw)
  To: paulmck; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

Paul E. McKenney wrote:
>
> On Wed, Aug 17, 2005 at 06:35:03PM +0400, Oleg Nesterov wrote:
> >
> > Sorry, I don't understand you. CLONE_THREAD implies CLONE_SIGHAND,
> > so we always need to lock one ->sighand. Could you please clarify?
>
> On the #3 and #4 code paths, the code assumes that the task-list lock
> is held.  So I was thinking of something (very roughly) as follows:
>
> #define SIGLOCK_HOLD_RCU      (1 << 0)
> #define SIGLOCK_HOLD_TASKLIST (1 << 1)
> #define SIGLOCK_HOLD_SIGLOCK  (1 << 2)

Oh, no, sorry for confusion.

I meant this function should only lock ->sighand, nothing more, something
like this:

// must be called with preemption disabled !!!
struct sighand_struct *lock_task_sighand(struct task_struct *tsk, unsigned long *flags)
{
	struct sighand_struct *sighand;

//	sighand = NULL;
//	if (!get_task_struct_rcu(tsk))
//		goto out;

	for (;;) {
		sighand = tsk->sighand;
		if (unlikely(sighand == NULL))
			break;

		spin_lock_irqsave(&sighand->siglock, *flags);

		if (likely(sighand == tsk->sighand))
			goto out;

		spin_unlock_irqrestore(&sighand->siglock, *flags);
	}

//	put_task_struct(tsk);
out:
	return sighand;
}

static inline void unlock_task_sighand(struct task_struct *tsk, unsigned long *flags)
{
	spin_unlock_irqrestore(&tsk->sighand->siglock, *flags);
//	put_task_struct(tsk);
}

int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
{
	unsigned long flags;
	int ret;

	ret = check_kill_permission(sig, info, p);
	if (!ret && sig) {
		ret = -ESRCH;
		if (lock_task_sighand(p, &flags)) {
			ret = __group_send_sig_info(sig, info, p);
			unlock_task_sighand(p, &flags);
		}
	}

	return ret;
}

Currently the only user of it will be group_send_sig_info(), but I hope
you have devil plans to kill the "tasklist_lock guards the very rare
->sighand change" finally :)

Oleg.


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-17 21:19               ` Paul E. McKenney
  2005-08-18 11:48                 ` Oleg Nesterov
@ 2005-08-18 12:24                 ` Oleg Nesterov
  1 sibling, 0 replies; 28+ messages in thread
From: Oleg Nesterov @ 2005-08-18 12:24 UTC (permalink / raw)
  To: paulmck, Thomas Gleixner; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

(Replying to wrong message, sorry).

Thomas Gleixner wrote:
>
> --- linux-2.6.13-rc6-rt8/kernel/fork.c	2005-08-17 12:57:08.000000000 +0200
> +++ linux-2.6.13-rc6-rt/kernel/fork.c	2005-08-17 11:17:46.000000000 +0200
> @@ -1198,7 +1198,8 @@ bad_fork_cleanup_mm:
>  bad_fork_cleanup_signal:
>  	exit_signal(p);
>  bad_fork_cleanup_sighand:
> -	exit_sighand(p);
> +	if (p->sighand) /* exit_signal() could have freed p->sighand */
> +		exit_sighand(p);


Looks like this is now the only user of exit_sighand(), and I think
we can kill this function and just do:

bad_fork_cleanup_sighand:

	if (p->sighand) {

		// p->sighand can't change here, we don't need tasklist lock

		if (atomic_dec_and_test(&p->sighand->count))

			// If we get here we are not sharing ->sighand with anybody else.
			// It means, in particular, that p had no CLONE_THREAD flag.
			// Nobody can see this process yet, we didn't call attach_pid(),
			// otherwise ->sighand was freed from __exit_signal. Thus nobody
			// can see this sighand.

			sighand_free(p->sighand);
	}

It is not an optimization, this path is rare, just to make things
more clear and to reduce "false positives" from grep tasklist_lock.

And I think it makes sense to

#define	put_sighand(sig)	\
	do { if (atomic_dec_and_test(&(sig)->count)) sighand_free(sig); } while (0)

Oleg.


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-18 11:48                 ` Oleg Nesterov
@ 2005-08-19  1:29                   ` Paul E. McKenney
  2005-08-19 13:27                     ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2005-08-19  1:29 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

On Thu, Aug 18, 2005 at 03:48:00PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > On Wed, Aug 17, 2005 at 06:35:03PM +0400, Oleg Nesterov wrote:
> > >
> > > Sorry, I don't understand you. CLONE_THREAD implies CLONE_SIGHAND,
> > > so we always need to lock one ->sighand. Could you please clarify?
> >
> > On the #3 and #4 code paths, the code assumes that the task-list lock
> > is held.  So I was thinking of something (very roughly) as follows:
> >
> > #define SIGLOCK_HOLD_RCU      (1 << 0)
> > #define SIGLOCK_HOLD_TASKLIST (1 << 1)
> > #define SIGLOCK_HOLD_SIGLOCK  (1 << 2)
> 
> Oh, no, sorry for confusion.
> 
> I meant this function should only lock ->sighand, nothing more, something
> like this:
> 
> // must be called with preemption disabled !!!
> struct sighand_struct *lock_task_sighand(struct task_struct *tsk, unsigned long *flags)
> {
> 	struct sighand_struct *sighand;
> 
> //	sighand = NULL;
> //	if (!get_task_struct_rcu(tsk))
> //		goto out;
> 
> 	for (;;) {
> 		sighand = tsk->sighand;
> 		if (unlikely(sighand == NULL))
> 			break;
> 
> 		spin_lock_irqsave(&sighand->siglock, *flags);
> 
> 		if (likely(sighand == tsk->sighand))
> 			goto out;
> 
> 		spin_unlock_irqrestore(&sighand->siglock, *flags);
> 	}
> 
> //	put_task_struct(tsk);
> out:
> 	return sighand;
> }
> 
> static inline void unlock_task_sighand(struct task_struct *tsk, unsigned long *flags)
> {
> 	spin_unlock_irqrestore(&tsk->sighand->siglock, *flags);
> //	put_task_struct(tsk);
> }
> 
> int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
> {
> 	unsigned long flags;
> 	int ret;
> 
> 	ret = check_kill_permission(sig, info, p);
> 	if (!ret && sig) {
> 		ret = -ESRCH;
> 		if (lock_task_sighand(p, &flags)) {
> 			ret = __group_send_sig_info(sig, info, p);
> 			unlock_task_sighand(p, &flags);
> 		}
> 	}
> 
> 	return ret;
> }
> 
> Currently the only user of it will be group_send_sig_info(),

Well, that is one reason that I didn't think that you were talking
about lock_task_sighand() working only on siglock...

>                                                              but I hope
> you have devil plans to kill the "tasklist_lock guards the very rare
> ->sighand change" finally :)

;-)

I have indeed been thinking along these lines, but all of the devil plans
that I have come up thus far are quite intrusive, and might have some
performance problems in some situations.  So it seems best to remove
tasklist_lock in steps:

1.	Single-recipient catch and ignore cases.

2.	Single-recipient stop/continue cases.

3.	Single-recipient fatal cases.

4.	Single-process multi-threaded stop/continue cases.

5.	Single-process multi-threaded fatal cases.

6.	And on to process-group cases.

There are a number of ways of handling this, and they all have some
scary aspects (and here you thought the current patches were scary!!!):

1.	Provide a single signal-reception data structure for each
	multi-recipient case: process for multithreaded processes,
	process-group leader for process-group signals, controlling
	terminal for tty-based signals.  The task delivering the
	signal records the signal (perhaps queuing it) on the
	signal-reception data structure, and sends an IPI to each
	CPU currently running one of the affected tasks.

	What about the tasks not currently running?  There are a
	number of devil-planlet alternatives possible here:

	a.	Have a daemon responsible for waking up any signaled
		tasks (assuming that they are blocked or sleeping
		interruptibly).

	b.	Have the tasks running on the CPUs each wake up
		one sleeping task, which in turn wakes up another
		once it receives the signal, and so on.

	c.	Have tasks check their signal-reception data structures
		(might have up to three: process, process group,
		controlling terminal).  Hopefully can get by with
		a very few such checks, but...

	What to do if several signals arrive at about the same time?
	What about consistent ordering of signals to different groups
	with partially overlapping task membership?  Are we allowed
	to see different signal orderings for different tasks sharing
	two different groupings?  Devil-planlet alternatives:

	a.	Serialize on controlling terminal (but this might as
		well be a global lock on single-user systems).

	b.	Allow signals to arrive in different orders at different
		threads of a process, if the different signals are
		targeted at different groups (e.g., process vs.
		process group vs. controlling terminal).

		Is POSIX OK with this?  I certainly hope so!!!  :-/

	c.	Have a signal-delivery kernel task to do the
		actual delivery.  This seems like overkill to me.

2.	Have a signal-delivery daemon do the delivery, so that
	the sender need only record the presence of the signal.
	This option would at least keep the context-switch code
	fully fit and exercised.  Other than that, it seems not
	to be all that good.

There are probably others.  But all are quite intrusive, and I think it
would be better to get a partial step into mainline sooner, and follow
up with improvements later.

So, what am I missing?  ;-)

							Thanx, Paul


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-19  1:29                   ` Paul E. McKenney
@ 2005-08-19 13:27                     ` Oleg Nesterov
  2005-08-19 18:34                       ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2005-08-19 13:27 UTC (permalink / raw)
  To: paulmck; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

Paul E. McKenney wrote:
>
> I have indeed been thinking along these lines, but all of the devil plans
> that I have come up thus far are quite intrusive, and might have some
> performance problems in some situations.  So it seems best to remove
> tasklist_lock in steps:
>
> 1.	Single-recipient catch and ignore cases.
>
> 2.	Single-recipient stop/continue cases.
>
> 3.	Single-recipient fatal cases.
>
> 4.	Single-process multi-threaded stop/continue cases.
>
> 5.	Single-process multi-threaded fatal cases.
>
> 6.	And on to process-group cases.

Paul, I am not an expert at all, but honestly I don't see how
this could be achieved. This lock is heavily overloaded for
quite different purposes. I think that maybe it makes sense
to try other steps, for example (random order):

  1. Tasklist protects ->sighand changing (de_thread) - rework
     sighand access/locking.

  2. Tasklist protects reparenting - fix this.

  .......

  N. PTRACE!!! Well, I close my eyes immediately when I see this
     word in the sources.

Only then can we eliminate tasklist locking from the signal-sending
path. But I don't see an easy way to solve any of these 1 - N
problems.

Currently I don't see how your patch could be "fixed" for SIGCONT
case, except very ugly:

	kill_proc_info(sig)
	{
		p = find_task_by_pid(pid);
		if (sig == SIGCONT)
			read_lock(&tasklist_lock);

		error = group_send_sig_info(...);
		...
	}

But there are other problems too.

Look at __group_complete_signal(), it scans p->pids[PIDTYPE_TGID].pid_list
list to find a suitable thread. What if 'p' does clone(CLONE_THREAD) now?
Let's look at copy_process(), it does attach_pid(p, PIDTYPE_TGID, p->tgid)
under the lovely tasklist_lock again.

So, I don't believe we can solve even the simplest case (single-recipient,
non-fatal, non-stop/cont) without significant locking rework.

I hope that your patch will stimulate this work.

Oleg.


* Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
  2005-08-19 13:27                     ` Oleg Nesterov
@ 2005-08-19 18:34                       ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2005-08-19 18:34 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Dipankar Sarma, linux-kernel

On Fri, Aug 19, 2005 at 05:27:24PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > I have indeed been thinking along these lines, but all of the devil plans
> > that I have come up thus far are quite intrusive, and might have some
> > performance problems in some situations.  So it seems best to remove
> > tasklist_lock in steps:
> >
> > 1.	Single-recipient catch and ignore cases.
> >
> > 2.	Single-recipient stop/continue cases.
> >
> > 3.	Single-recipient fatal cases.
> >
> > 4.	Single-process multi-threaded stop/continue cases.
> >
> > 5.	Single-process multi-threaded fatal cases.
> >
> > 6.	And on to process-group cases.
> 
> Paul, I am not an expert at all, but honestly I don't see how
> this could be achieved. This lock is heavily overloaded for
> quite different purposes. I think that may be it makes sense
> to try other steps, for example (random order):
> 
>   1. Tasklist protects ->sighand changing (de_thread) - rework
>      sighand access/locking.

We have a start on this, and this is the first area I am making tests
for.

>   2. Tasklist protects reparenting - fix this.

Good point...  Will require thought.  And tests that target this race.
The tests are what I am working on now.

>   .......
> 
>   N. PTRACE!!! Well, I close my eyes immediately when I see this
>      word in the sources.

This one as well.  Hmmm....  The tests could be fun -- race gdb
grabbing a process with sending that process a signal.  Might
need to instead make a small program that pretends to be gdb,
will play with it.

Some others that come to mind:

N+1.	Process-group leader dies while process-group signal is being
	received.

N+2.	Process-group members are wildly creating new processes while
	a fatal signal is being received (which should result in -all-
	processes in the group dying).

N+3.	Threads are being wildly created in a multithreaded process while
	a fatal signal is being received...

N+4.	The tsk->signal->curr_target in a multithreaded process changes
	while a signal is being received.

N+5.	The handler state (default/ignored/caught) changes while a
	signal is being received.  But siglock should cover this.

N+6.	Different signal reception orders for different tasks in
	a given set of groups (e.g., fatal signal to process group
	at same time as fatal signal to multithreaded process within
	that group).  Don't believe that this one matters, but thought
	I should call it out anyway.

	o	SIGHUP/SIGCONT from tty going away vs. SIGSTOP
		from elsewhere.

	o	tty signal (ctrl-Z, ctrl-C, ...) vs. signal to
		specific process in the tty's process group.

And there are probably more, but this should keep me busy for a bit.

> Only then we can eliminate tasklist locking from signal sending
> path. But I don't see the easy way to solve any of these 1 - N
> problems.

If it was easy, someone would likely already have done it...

> Currently I don't see how your patch could be "fixed" for SIGCONT
> case, except very ugly:
> 
> 	kill_proc_info(sig)
> 	{
> 		p = find_task_by_pid(pid);
> 		if (sig == SIGCONT)
> 			read_lock(&tasklist_lock);
> 
> 		error = group_send_sig_info(...);
> 		...
> 	}

I was indeed thinking along these lines.

> But there are other problems too.
> 
> Look at __group_complete_signal(), it scans p->pids[PIDTYPE_TGID].pid_list
> list to find a suitable thread. What if 'p' does clone(CLONE_THREAD) now?
> Let's look at copy_process(), it does attach_pid(p, PIDTYPE_TGID, p->tgid)
> under the lovely tasklist_lock again.

Another approach that I am considering for the class of problem is to
have the code paths that create or change tasks momentarily acquire
locks corresponding to the groups of tasks that can receive signals.
There are a number of problems with this, for example, the logical place
for the lock corresponding to a process group is the process-group leader.
There has to be a way to handling the case where the process-group leader
dies before the rest of the processes do.  And so on, for similar changes.

> So, I don't believe we can solve even the simplest case (single-recipient,
> non-fatal, non-stop/cont) without significant locking rework.

Agreed, hence the "Not-signed-off-by:" on my original patch.

> I hope that your patch will stimulate this work.

Me too, will continue working it.  Your comments have been very helpful!

If nothing else, this thread has shown that there are some worthwhile
benefits from getting rid of tasklist_lock...

							Thanx, Paul


end of thread, other threads:[~2005-08-19 18:33 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-08-10 17:11 [RFC,PATCH] Use RCU to protect tasklist for unicast signals Paul E. McKenney
2005-08-11  9:56 ` Ingo Molnar
2005-08-11 14:14   ` Paul E. McKenney
2005-08-12  2:00   ` Lee Revell
2005-08-12  6:36     ` Ingo Molnar
2005-08-12 20:57       ` Paul E. McKenney
2005-08-11 17:14 ` Christoph Hellwig
2005-08-11 17:56   ` Paul E. McKenney
2005-08-11 18:00   ` Dipankar Sarma
2005-08-11 18:12     ` Dipankar Sarma
  -- strict thread matches above, loose matches on Subject: below --
2005-08-11 12:16 Oleg Nesterov
2005-08-11 15:20 ` Paul E. McKenney
2005-08-12  1:56 ` Paul E. McKenney
2005-08-12  8:51   ` Oleg Nesterov
2005-08-12 15:42     ` Paul E. McKenney
2005-08-15 17:44     ` Paul E. McKenney
2005-08-16  8:14       ` Ingo Molnar
2005-08-16 11:56       ` Oleg Nesterov
2005-08-16 17:07         ` Paul E. McKenney
2005-08-17  1:48           ` Paul E. McKenney
2005-08-17  6:35             ` Ingo Molnar
2005-08-17 14:35             ` Oleg Nesterov
2005-08-17 21:19               ` Paul E. McKenney
2005-08-18 11:48                 ` Oleg Nesterov
2005-08-19  1:29                   ` Paul E. McKenney
2005-08-19 13:27                     ` Oleg Nesterov
2005-08-19 18:34                       ` Paul E. McKenney
2005-08-18 12:24                 ` Oleg Nesterov
