All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ravikiran G Thirumalai <kiran@scalex86.org>
To: Christoph Lameter <clameter@engr.sgi.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>,
	Shai Fultheim <shai@scalex86.org>,
	Nippun Goel <nippung@calsoftinc.com>,
	linux-kernel@vger.kernel.org, Andrew Morton <akpm@osdl.org>
Subject: Re: [rfc][patch] Avoid taking global tasklist_lock for single threadedprocess at getrusage()
Date: Fri, 6 Jan 2006 11:46:23 -0800	[thread overview]
Message-ID: <20060106194623.GA4078@localhost.localdomain> (raw)
In-Reply-To: <Pine.LNX.4.62.0601060921530.17444@schroedinger.engr.sgi.com>

On Fri, Jan 06, 2006 at 09:23:30AM -0800, Christoph Lameter wrote:
> On Fri, 6 Jan 2006, Ravikiran G Thirumalai wrote:
> 
> > +	need_lock = !(p == current && thread_group_empty(p));
> 
> Isnt 
> 
> need_lock = (p != current || !thread_group_empty(b))
> 
> clearer?

All the same I felt, and the comments were bold and clear.  Also, the above was
c translation of what was said in the comment  :)...I am OK either ways.
So Here goes...

Following patch avoids taking the global tasklist_lock when possible,
if a process is single threaded during getrusage().  Any avoidance of
tasklist_lock is good for NUMA boxes (and possibly for large SMPs).
Thanks to Oleg Nesterov for review and suggestions.

Signed-off-by: Nippun Goel <nippung@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>

Index: linux-2.6.15-delme/kernel/sys.c
===================================================================
--- linux-2.6.15-delme.orig/kernel/sys.c	2006-01-05 15:40:38.000000000 -0800
+++ linux-2.6.15-delme/kernel/sys.c	2006-01-06 00:43:08.000000000 -0800
@@ -1664,9 +1664,6 @@ asmlinkage long sys_setrlimit(unsigned i
  * a lot simpler!  (Which we're not doing right now because we're not
  * measuring them yet).
  *
- * This expects to be called with tasklist_lock read-locked or better,
- * and the siglock not locked.  It may momentarily take the siglock.
- *
  * When sampling multiple threads for RUSAGE_SELF, under SMP we might have
  * races with threads incrementing their own counters.  But since word
  * reads are atomic, we either get new values or old values and we don't
@@ -1674,6 +1671,25 @@ asmlinkage long sys_setrlimit(unsigned i
  * the c* fields from p->signal from races with exit.c updating those
  * fields when reaping, so a sample either gets all the additions of a
  * given child after it's reaped, or none so this sample is before reaping.
+ *
+ * tasklist_lock locking optimisation:
+ * If we are current and single threaded, we do not need to take the tasklist
+ * lock or the siglock.  No one else can take our signal_struct away, 
+ * no one else can reap the children to update signal->c* counters, and
+ * no one else can race with the signal-> fields.
+ * If we do not take the tasklist_lock, the signal-> fields could be read
+ * out of order while another thread was just exiting. So we place a 
+ * read memory barrier when we avoid the lock.  On the writer side, 
+ * write memory barrier is implied in  __exit_signal as __exit_signal releases 
+ * the siglock spinlock after updating the signal-> fields.
+ *
+ * We don't really need the siglock when we access the non c* fields
+ * of the signal_struct (for RUSAGE_SELF) even in multithreaded
+ * case, since we take the tasklist lock for read and the non c* signal-> 
+ * fields are updated only in __exit_signal, which is called with 
+ * tasklist_lock taken for write, hence these two threads cannot execute
+ * concurrently.
+ *
  */
 
 static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
@@ -1681,14 +1697,22 @@ static void k_getrusage(struct task_stru
 	struct task_struct *t;
 	unsigned long flags;
 	cputime_t utime, stime;
+	int need_lock = 0;
 
 	memset((char *) r, 0, sizeof *r);
-
-	if (unlikely(!p->signal))
-		return;
-
 	utime = stime = cputime_zero;
 
+	need_lock = (p != current || !thread_group_empty(p));
+	if (need_lock) {
+		read_lock(&tasklist_lock);
+		if (unlikely(!p->signal)) {
+			read_unlock(&tasklist_lock);
+			return;
+		}
+	} else
+		/* See locking comments above */
+		smp_rmb();
+
 	switch (who) {
 		case RUSAGE_BOTH:
 		case RUSAGE_CHILDREN:
@@ -1727,6 +1751,8 @@ static void k_getrusage(struct task_stru
 			BUG();
 	}
 
+	if (need_lock)
+		read_unlock(&tasklist_lock);
 	cputime_to_timeval(utime, &r->ru_utime);
 	cputime_to_timeval(stime, &r->ru_stime);
 }
@@ -1734,9 +1760,7 @@ static void k_getrusage(struct task_stru
 int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
 {
 	struct rusage r;
-	read_lock(&tasklist_lock);
 	k_getrusage(p, who, &r);
-	read_unlock(&tasklist_lock);
 	return copy_to_user(ru, &r, sizeof(r)) ? -EFAULT : 0;
 }
 

  reply	other threads:[~2006-01-06 19:46 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-12-24 17:52 [rfc][patch] Avoid taking global tasklist_lock for single threaded process at getrusage() Oleg Nesterov
2005-12-27 20:21 ` Christoph Lameter
2005-12-28 12:38   ` [rfc][patch] Avoid taking global tasklist_lock for single threadedprocess " Oleg Nesterov
2005-12-28 18:33     ` Ravikiran G Thirumalai
2005-12-28 22:57       ` Ravikiran G Thirumalai
2005-12-30 17:57         ` Oleg Nesterov
2006-01-04 23:16           ` Ravikiran G Thirumalai
2006-01-05 19:17             ` Oleg Nesterov
2006-01-06  9:46               ` Ravikiran G Thirumalai
2006-01-06 17:23                 ` Christoph Lameter
2006-01-06 19:46                   ` Ravikiran G Thirumalai [this message]
2006-03-20 18:04                     ` Oleg Nesterov
2006-03-22 22:18                       ` Ravikiran G Thirumalai
2006-03-23 18:18                         ` Oleg Nesterov
2006-01-06 23:52                   ` Andrew Morton
2006-01-08 11:49                 ` Oleg Nesterov
2006-01-08 19:58                   ` Ravikiran G Thirumalai
2006-01-09 18:55                     ` Oleg Nesterov
2006-01-09 20:54                       ` Ravikiran G Thirumalai
2006-01-10 19:03                         ` Oleg Nesterov
2006-01-16 20:56                           ` Ravikiran G Thirumalai
2006-01-17 19:59                             ` Oleg Nesterov
2006-01-17 19:52                               ` Ravikiran G Thirumalai
2006-01-18  9:17                                 ` Oleg Nesterov
2006-01-03 18:18         ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060106194623.GA4078@localhost.localdomain \
    --to=kiran@scalex86.org \
    --cc=akpm@osdl.org \
    --cc=clameter@engr.sgi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nippung@calsoftinc.com \
    --cc=oleg@tv-sign.ru \
    --cc=shai@scalex86.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.