All of lore.kernel.org
 help / color / mirror / Atom feed
From: KaiGai Kohei <kaigai@ak.jp.nec.com>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Roland McGrath <roland@redhat.com>
Subject: Fix pacct bug in multithreading case.
Date: Tue, 28 Mar 2006 20:43:49 +0900	[thread overview]
Message-ID: <44292175.6030605@ak.jp.nec.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 3227 bytes --]

Hello,

I noticed a bug on the process accounting facility.
In multi-threading process, some data would be recorded incorrectly
when the group_leader dies earlier than one or more threads.
The attached patch fixes this problem.

See below. 'bugacct' is a test program that create a worker thread
after 4 seconds sleeping, then the group_leader dies soon.
The worker thread consume CPU/Memory for 6 seconds, then exit.
We can estimate 10 seconds as etime and 6 seconds as stime + utime.
This is a sample program which the group_leader dies earlier than
other threads.

The results of same binary execution on different kernel are below.
-- accounted records --------------------
         |   btime  | utime | stime | etime | minflt | majflt |   comm  |
original | 13:16:40 |  0.00 |  0.00 |  6.10 |    171 |      0 | bugacct |
 patched | 13:20:21 |  5.83 |  0.18 | 10.03 |  32776 |      0 | bugacct |
(*) bugacct allocates 128MB memory, thus 128MB / 4KB = 32768 of minflt is
    appropriate.

-- Test results in original kernel ------
$ date; time -p ./bugacct
Tue Mar 28 13:16:36 JST 2006  <- But pacct said btime is 13:16:40
real 10.11                    <- But pacct said etime is 6.10
user 5.96                     <- But pacct said utime is 0.00
sys 0.14                      <- But pacct said stime is 0.00
$
-- Test results in patched kernel -------
$ date; time -p ./bugacct
Tue Mar 28 13:20:21 JST 2006
real 10.04
user 5.83
sys 0.19
$

In the original 2.6.16 kernel, pacct records btime, utime, stime, etime and
minflt incorrectly. In my opinion, this problem is caused by an assumption
that group_leader dies last.

The following section calculates process running time for etime and btime.
But it means running time of the thread that dies last, not process.
The start_time of the first thread in the process (group_leader) should
be reduced from uptime to calculate etime and btime correctly.
---- do_acct_process() in kernel/acct.c:
/* calculate run_time in nsec*/
do_posix_clock_monotonic_gettime(&uptime);
run_time = (u64)uptime.tv_sec*NSEC_PER_SEC + uptime.tv_nsec;
run_time -= (u64)current->start_time.tv_sec*NSEC_PER_SEC
                                + current->start_time.tv_nsec;
----

The following section calculates stime and utime of the process.
But it might count the utime and stime of the group_leader duplicatly
and ignore the utime and stime of the thread dies last, when one or
more threads remain after group_leader dead.
The ac_utime should be calculated as the sum of the signal->utime
and utime of the thread dies last. The ac_stime should be done also.
---- do_acct_process() in kernel/acct.c:
jiffies = cputime_to_jiffies(cputime_add(current->group_leader->utime,
                                         current->signal->utime));
ac.ac_utime = encode_comp_t(jiffies_to_AHZ(jiffies));
jiffies = cputime_to_jiffies(cputime_add(current->group_leader->stime,
                                         current->signal->stime));
ac.ac_stime = encode_comp_t(jiffies_to_AHZ(jiffies));
----

The part of the minflt/majflt calculation has same problem.
This patch solves those problems, I think.

Any comments are welcome. Thanks.
-- 
Linux Promotion Center, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

[-- Attachment #2: fixbug_pacct_incorrect_records.patch --]
[-- Type: text/plain, Size: 1784 bytes --]

--- linux-2.6.16/kernel/acct.c	2006-03-20 14:53:29.000000000 +0900
+++ linux-2.6.16-kg/kernel/acct.c	2006-03-27 16:25:11.000000000 +0900
@@ -449,8 +449,8 @@ static void do_acct_process(long exitcod
 	/* calculate run_time in nsec*/
 	do_posix_clock_monotonic_gettime(&uptime);
 	run_time = (u64)uptime.tv_sec*NSEC_PER_SEC + uptime.tv_nsec;
-	run_time -= (u64)current->start_time.tv_sec*NSEC_PER_SEC
-					+ current->start_time.tv_nsec;
+	run_time -= (u64)current->group_leader->start_time.tv_sec * NSEC_PER_SEC
+		       + current->group_leader->start_time.tv_nsec;
 	/* convert nsec -> AHZ */
 	elapsed = nsec_to_AHZ(run_time);
 #if ACCT_VERSION==3
@@ -469,10 +469,10 @@ static void do_acct_process(long exitcod
 #endif
 	do_div(elapsed, AHZ);
 	ac.ac_btime = xtime.tv_sec - elapsed;
-	jiffies = cputime_to_jiffies(cputime_add(current->group_leader->utime,
+	jiffies = cputime_to_jiffies(cputime_add(current->utime,
 						 current->signal->utime));
 	ac.ac_utime = encode_comp_t(jiffies_to_AHZ(jiffies));
-	jiffies = cputime_to_jiffies(cputime_add(current->group_leader->stime,
+	jiffies = cputime_to_jiffies(cputime_add(current->stime,
 						 current->signal->stime));
 	ac.ac_stime = encode_comp_t(jiffies_to_AHZ(jiffies));
 	/* we really need to bite the bullet and change layout */
@@ -522,9 +522,9 @@ static void do_acct_process(long exitcod
 	ac.ac_io = encode_comp_t(0 /* current->io_usage */);	/* %% */
 	ac.ac_rw = encode_comp_t(ac.ac_io / 1024);
 	ac.ac_minflt = encode_comp_t(current->signal->min_flt +
-				     current->group_leader->min_flt);
+				     current->min_flt);
 	ac.ac_majflt = encode_comp_t(current->signal->maj_flt +
-				     current->group_leader->maj_flt);
+				     current->maj_flt);
 	ac.ac_swaps = encode_comp_t(0);
 	ac.ac_exitcode = exitcode;
 

[-- Attachment #3: bugacct.c --]
[-- Type: text/plain, Size: 743 bytes --]

#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <sys/time.h>

#define BUFFER_SIZE (128 * 1024 * 1024)  /* 128MB */

void *mychild(void *buffer) {
	struct timeval tv;
	u_int64_t t1, t2;

	gettimeofday(&tv, NULL);
	t1 = tv.tv_sec * 1000000 + tv.tv_usec;
	t2 = t1 + 6 * 1000000;
	/* heavy CPU/Mem job */
	srand(tv.tv_usec);
	while(t1 < t2) {
	    memset(buffer, rand(), BUFFER_SIZE);
            gettimeofday(&tv, NULL);
            t1 = tv.tv_sec * 1000000 + tv.tv_usec;
	}
	return NULL;
}

int main(int argc, char *argv[]) {
	pthread_t pthread;
	void *buffer = NULL;
	
	buffer = malloc(BUFFER_SIZE);
	if (!buffer)
	    return 1;

	sleep(4);
	pthread_create(&pthread, NULL, mychild, buffer);
	sleep(1);
	pthread_exit(0);
}

             reply	other threads:[~2006-03-28 11:44 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-03-28 11:43 KaiGai Kohei [this message]
2006-04-04  9:25 ` Fix pacct bug in multithreading case Roland McGrath

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=44292175.6030605@ak.jp.nec.com \
    --to=kaigai@ak.jp.nec.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=roland@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.