From: Ravikiran G Thirumalai <kiran@scalex86.org>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Christoph Lameter <clameter@engr.sgi.com>,
Shai Fultheim <shai@scalex86.org>,
Nippun Goel <nippung@calsoftinc.com>,
linux-kernel@vger.kernel.org, Andrew Morton <akpm@osdl.org>
Subject: Re: [rfc][patch] Avoid taking global tasklist_lock for single threadedprocess at getrusage()
Date: Wed, 4 Jan 2006 15:16:00 -0800 [thread overview]
Message-ID: <20060104231600.GA3664@localhost.localdomain> (raw)
In-Reply-To: <43B57515.967F53E3@tv-sign.ru>
On Fri, Dec 30, 2005 at 08:57:41PM +0300, Oleg Nesterov wrote:
> > - memset((char *) r, 0, sizeof *r);
> > + if (!thread_group_empty(p)) {
> > + read_lock(&tasklist_lock);
> > + if (unlikely(!p->signal)) {
> > + read_unlock(&tasklist_lock);
> > + goto ret;
>
> Is this possible? 'current' always has valid signal/sighand.
> Or better say, process can't call getrusage after exit_signal().
You are right, this check was unnecessary
>
> > + }
> > + spin_lock_irqsave(&p->sighand->siglock, flags);
> > + lockflag = 1;
> > + }
>
> What if another thread just exited? I think you need 'else smp_rmb()'.
> here. Otherwise cpu can read signal->c* out of order.
Yes, looks like we do. We probably need to do the same at sys_times too...
>
> > }
>
> Looks we can factor out some code.
>
> Actually I dont't understand why can't we move the locking into
> k_getrusage,
>
> k_getrusage()
>
> lock_flag = (p == current && thread_group_empty(p));
> if (lockflag) {
> read_lock(&tasklist_lock);
> spin_lock_irqsave(&p->sighand->siglock, flags);
> }
>
> and remove ->sighand locking under 'switch' statement.
>
> Isn't this enough to solve perfomance problems?
Hmm, just that getrusage_self takes the siglock in the multi threaded
case which is not needed I think. Howz this patch?
getrusage_tasklistlock_optimization-v4
Following patch avoids taking the global tasklist_lock when possible,
if a process is single threaded during getrusage(). Any avoidance of
tasklist_lock is good for NUMA boxes (and possibly for large SMPs).
Index: linux-2.6.15-rc6/kernel/sys.c
===================================================================
--- linux-2.6.15-rc6.orig/kernel/sys.c 2006-01-03 12:12:05.000000000 -0800
+++ linux-2.6.15-rc6/kernel/sys.c 2006-01-04 13:41:17.000000000 -0800
@@ -1664,9 +1664,6 @@ asmlinkage long sys_setrlimit(unsigned i
* a lot simpler! (Which we're not doing right now because we're not
* measuring them yet).
*
- * This expects to be called with tasklist_lock read-locked or better,
- * and the siglock not locked. It may momentarily take the siglock.
- *
* When sampling multiple threads for RUSAGE_SELF, under SMP we might have
* races with threads incrementing their own counters. But since word
* reads are atomic, we either get new values or old values and we don't
@@ -1674,6 +1671,25 @@ asmlinkage long sys_setrlimit(unsigned i
* the c* fields from p->signal from races with exit.c updating those
* fields when reaping, so a sample either gets all the additions of a
* given child after it's reaped, or none so this sample is before reaping.
+ *
+ * tasklist_lock locking optimisation:
+ * If we are current and single threaded, we do not need to take the tasklist
+ * lock or the siglock. No one else can take our signal_struct away,
+ * no one else can reap the children to update signal->c* counters, and
+ * no one else can race with the signal-> fields.
+ * If we do not take the tasklist_lock, the signal-> fields could be read
+ * out of order while another thread was just exiting. So we place a
+ * read memory barrier when we avoid the lock. On the writer side,
+ * write memory barrier is implied in __exit_signal as __exit_signal releases
+ * the siglock spinlock after updating the signal-> fields.
+ *
+ * We don't really need the siglock when we access the non c* fields
+ * of the signal_struct (for RUSAGE_SELF) even in multithreaded
+ * case, since we take the tasklist lock for read and the non c* signal->
+ * fields are updated only in __exit_signal, which is called with
+ * tasklist_lock taken for write, hence these two threads cannot execute
+ * concurrently.
+ *
*/
static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
@@ -1681,37 +1697,51 @@ static void k_getrusage(struct task_stru
struct task_struct *t;
unsigned long flags;
cputime_t utime, stime;
+ int need_lock = 0;
memset((char *) r, 0, sizeof *r);
- if (unlikely(!p->signal))
- return;
+ need_lock = (p == current && thread_group_empty(p));
+
+ if (need_lock) {
+ read_lock(&tasklist_lock);
+ if (unlikely(!p->signal)) {
+ read_unlock(&tasklist_lock);
+ return;
+ }
+ } else
+ /* See locking comments above */
+ smp_rmb();
switch (who) {
case RUSAGE_CHILDREN:
- spin_lock_irqsave(&p->sighand->siglock, flags);
+ if (need_lock)
+ spin_lock_irqsave(&p->sighand->siglock, flags);
utime = p->signal->cutime;
stime = p->signal->cstime;
r->ru_nvcsw = p->signal->cnvcsw;
r->ru_nivcsw = p->signal->cnivcsw;
r->ru_minflt = p->signal->cmin_flt;
r->ru_majflt = p->signal->cmaj_flt;
- spin_unlock_irqrestore(&p->sighand->siglock, flags);
+ if (need_lock)
+ spin_unlock_irqrestore(&p->sighand->siglock, flags);
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
break;
case RUSAGE_SELF:
- spin_lock_irqsave(&p->sighand->siglock, flags);
utime = stime = cputime_zero;
goto sum_group;
case RUSAGE_BOTH:
- spin_lock_irqsave(&p->sighand->siglock, flags);
+ if (need_lock)
+ spin_lock_irqsave(&p->sighand->siglock, flags);
utime = p->signal->cutime;
stime = p->signal->cstime;
r->ru_nvcsw = p->signal->cnvcsw;
r->ru_nivcsw = p->signal->cnivcsw;
r->ru_minflt = p->signal->cmin_flt;
r->ru_majflt = p->signal->cmaj_flt;
+ if (need_lock)
+ spin_unlock_irqrestore(&p->sighand->siglock, flags);
sum_group:
utime = cputime_add(utime, p->signal->utime);
stime = cputime_add(stime, p->signal->stime);
@@ -1729,21 +1759,21 @@ static void k_getrusage(struct task_stru
r->ru_majflt += t->maj_flt;
t = next_thread(t);
} while (t != p);
- spin_unlock_irqrestore(&p->sighand->siglock, flags);
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
break;
default:
BUG();
}
+
+ if (need_lock)
+ read_unlock(&tasklist_lock);
}
int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
{
struct rusage r;
- read_lock(&tasklist_lock);
k_getrusage(p, who, &r);
- read_unlock(&tasklist_lock);
return copy_to_user(ru, &r, sizeof(r)) ? -EFAULT : 0;
}
next prev parent reply other threads:[~2006-01-04 23:16 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-12-24 17:52 [rfc][patch] Avoid taking global tasklist_lock for single threaded process at getrusage() Oleg Nesterov
2005-12-27 20:21 ` Christoph Lameter
2005-12-28 12:38 ` [rfc][patch] Avoid taking global tasklist_lock for single threadedprocess " Oleg Nesterov
2005-12-28 18:33 ` Ravikiran G Thirumalai
2005-12-28 22:57 ` Ravikiran G Thirumalai
2005-12-30 17:57 ` Oleg Nesterov
2006-01-04 23:16 ` Ravikiran G Thirumalai [this message]
2006-01-05 19:17 ` Oleg Nesterov
2006-01-06 9:46 ` Ravikiran G Thirumalai
2006-01-06 17:23 ` Christoph Lameter
2006-01-06 19:46 ` Ravikiran G Thirumalai
2006-03-20 18:04 ` Oleg Nesterov
2006-03-22 22:18 ` Ravikiran G Thirumalai
2006-03-23 18:18 ` Oleg Nesterov
2006-01-06 23:52 ` Andrew Morton
2006-01-08 11:49 ` Oleg Nesterov
2006-01-08 19:58 ` Ravikiran G Thirumalai
2006-01-09 18:55 ` Oleg Nesterov
2006-01-09 20:54 ` Ravikiran G Thirumalai
2006-01-10 19:03 ` Oleg Nesterov
2006-01-16 20:56 ` Ravikiran G Thirumalai
2006-01-17 19:59 ` Oleg Nesterov
2006-01-17 19:52 ` Ravikiran G Thirumalai
2006-01-18 9:17 ` Oleg Nesterov
2006-01-03 18:18 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060104231600.GA3664@localhost.localdomain \
--to=kiran@scalex86.org \
--cc=akpm@osdl.org \
--cc=clameter@engr.sgi.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nippung@calsoftinc.com \
--cc=oleg@tv-sign.ru \
--cc=shai@scalex86.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.