All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>,
	Shailabh Nagar <nagar1234@in.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	John stultz <johnstul@us.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Roland McGrath <roland@redhat.com>,
	Valdis.Kletnieks@vt.edu
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org
Subject: [patch v2 3/4] taskstats: Introduce kernel.full_cdata sysctl
Date: Mon, 29 Nov 2010 17:42:40 +0100	[thread overview]
Message-ID: <20101129164435.557525646@linux.vnet.ibm.com> (raw)
In-Reply-To: 20101129164237.522034198@linux.vnet.ibm.com

[-- Attachment #1: 03-taskstats-top-improve-ctime-account-cdata_acct.patch --]
[-- Type: text/plain, Size: 4304 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

Version 2
---------
* Implement sysctl instead of adding parallel cdata_acct
* Call __account_cdata() in __exit_signal() without tsk siglock

Description
-----------
Currently the cumulative time accounting in Linux is not complete.
Due to POSIX POSIX.1-2001, the CPU time of processes is not accounted
to the cumulative time of the parents, if the parents ignore SIGCHLD
or have set SA_NOCLDWAIT. This behaviour has the major drawback that
it is not possible to calculate all consumed CPU time of a system by
looking at the current tasks. CPU time can be lost.

This patch adds a new sysctl "kernel.full_cdata" that allows to switch
between the POSIX behavior and complete cumulative accounting.
The default is the POSIX semantics.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 Documentation/sysctl/kernel.txt |   12 ++++++++++++
 kernel/exit.c                   |   13 +++++++++----
 kernel/sysctl.c                 |   10 ++++++++++
 3 files changed, 31 insertions(+), 4 deletions(-)

--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -247,6 +247,18 @@ see the hostname(1) man page.
 
 ==============================================================
 
+full_cdata:
+
+With "full_cdata = 0" (default) cumulative resource accounting
+under Linux is done according to POSIX.1-2001: The resource
+counters of processes are not accounted to the cumulative counters
+of their parents, if the parents ignore SIGCHLD or have set
+SA_NOCLDWAIT. With "full_cdata = 1" it is possible to enforce
+that all dead processes without exception are accounted to their
+parents.
+
+==============================================================
+
 hotplug:
 
 Path for the hotplug policy agent.
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -57,6 +57,8 @@
 #include <asm/pgtable.h>
 #include <asm/mmu_context.h>
 
+int full_cdata_enabled;
+
 static void exit_mm(struct task_struct * tsk);
 
 static void __unhash_process(struct task_struct *p, bool group_dead)
@@ -77,7 +79,7 @@ static void __unhash_process(struct task
 static void __account_cdata(struct task_struct *p)
 {
 	struct cdata *cd, *pcd, *tcd;
-	unsigned long maxrss;
+	unsigned long maxrss, flags;
 	cputime_t tgutime, tgstime;
 
 	/*
@@ -104,7 +106,7 @@ static void __account_cdata(struct task_
 	tcd = &p->signal->cdata_threads;
 	cd = &p->signal->cdata_wait;
 
-	spin_lock_irq(&p->real_parent->sighand->siglock);
+	spin_lock_irqsave(&p->real_parent->sighand->siglock, flags);
 	pcd->utime =
 		cputime_add(pcd->utime,
 		cputime_add(tgutime,
@@ -137,7 +139,7 @@ static void __account_cdata(struct task_
 		pcd->maxrss = maxrss;
 	task_io_accounting_add(&p->real_parent->signal->ioac, &p->ioac);
 	task_io_accounting_add(&p->real_parent->signal->ioac, &p->signal->ioac);
-	spin_unlock_irq(&p->real_parent->sighand->siglock);
+	spin_unlock_irqrestore(&p->real_parent->sighand->siglock, flags);
 }
 
 /*
@@ -150,6 +152,9 @@ static void __exit_signal(struct task_st
 	struct sighand_struct *sighand;
 	struct tty_struct *uninitialized_var(tty);
 
+	if (group_dead && full_cdata_enabled)
+		__account_cdata(tsk);
+
 	sighand = rcu_dereference_check(tsk->sighand,
 					rcu_read_lock_held() ||
 					lockdep_tasklist_lock_is_held());
@@ -1292,7 +1297,7 @@ static int wait_task_zombie(struct wait_
 	 * It can be ptraced but not reparented, check
 	 * !task_detached() to filter out sub-threads.
 	 */
-	if (likely(!traced) && likely(!task_detached(p)))
+	if (likely(!traced) && likely(!task_detached(p)) && !full_cdata_enabled)
 		__account_cdata(p);
 
 	/*
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -85,6 +85,7 @@
 #if defined(CONFIG_SYSCTL)
 
 /* External variables not in a header file. */
+extern int full_cdata_enabled;
 extern int sysctl_overcommit_memory;
 extern int sysctl_overcommit_ratio;
 extern int max_threads;
@@ -963,6 +964,15 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 #endif
+	{
+		.procname	= "full_cdata",
+		.data		= &full_cdata_enabled,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 /*
  * NOTE: do not add new entries to this table unless you have read
  * Documentation/sysctl/ctl_unnumbered.txt

  parent reply	other threads:[~2010-11-29 16:42 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-29 16:42 [patch v2 0/4] taskstats: Improve cumulative time accounting Michael Holzheu
2010-11-29 16:42 ` [patch v2 1/4] taskstats: Introduce "struct cdata" Michael Holzheu
2010-11-29 16:42 ` [patch v2 2/4] taskstats: Introduce __account_cdata() function Michael Holzheu
2010-11-29 16:42 ` Michael Holzheu [this message]
2010-11-29 16:42 ` [patch v2 4/4] taskstats: Export "cdata_wait" CPU times with taskstats Michael Holzheu
2010-12-01 18:51   ` Oleg Nesterov
2010-12-02 16:34     ` Michael Holzheu
2010-12-06  9:06       ` Balbir Singh
2010-12-08 20:23       ` Oleg Nesterov
2010-12-10 13:26         ` Michael Holzheu
     [not found]           ` <20101211173931.GA8084@redhat.com>
2010-12-13 13:05             ` Michael Holzheu
2010-12-13 13:20               ` Michael Holzheu
2010-12-13 14:33                 ` Oleg Nesterov
2010-12-13 16:42                   ` Michael Holzheu
2010-12-03  7:33     ` Balbir Singh
2010-12-06 12:37       ` Michael Holzheu
2010-12-06 15:15         ` Balbir Singh
2010-12-07 10:45     ` Michael Holzheu
2010-12-08 20:39       ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101129164435.557525646@linux.vnet.ibm.com \
    --to=holzheu@linux.vnet.ibm.com \
    --cc=Valdis.Kletnieks@vt.edu \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=johnstul@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=nagar1234@in.ibm.com \
    --cc=oleg@redhat.com \
    --cc=roland@redhat.com \
    --cc=schwidefsky@de.ibm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.