From: KaiGai Kohei <kaigai@ak.jp.nec.com>
To: Andrew Morton <akpm@osdl.org>
Cc: KaiGai Kohei <kaigai@ak.jp.nec.com>, linux-kernel@vger.kernel.org
Subject: Re: Add pacct_struct to fix some pacct bugs.
Date: Thu, 22 Jun 2006 12:07:19 +0900 [thread overview]
Message-ID: <449A0967.2020701@ak.jp.nec.com> (raw)
In-Reply-To: <4497A34C.2000104@ak.jp.nec.com>
The seriese of patches fixes some process accounting bugs.
[PATCH 1/3] two phase process accounting
[PATCH 2/3] avoidance to refer the last thread as a representation
of the process
[PATCH 3/3] none-delayed process accounting accumulation
* background of the patch.1
The pacct facility need an i/o operation when an accounting record
is generated. There is a possibility to wake OOM killer up.
If OOM killer is activated, it kills some processes to make them
release process memory regions.
But acct_process() is called in the killed processes context
before calling exit_mm(), so those processes cannot release
own memory. In the results, any processes stop in this point and
it finally cause a system stall.
---- in kernel/exit.c : do_exit() ------------
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
exit_itimers(tsk->signal);
acct_process(code);
}
:
- snip -
:
exit_mm(tsk);
----------------------------------------------
This patch separates generating an accounting record facility
into two-phase.
In the first one, acct_collect() calculate vitual memory size
of the process and stores it into pacct_struct before exit_mm().
Then, acct_process() generates an accounting record and write
it into medium.
* background of the patch.2
When pacct facility generate an 'ac_flag' field in accounting record,
it refers a task_struct of the thread which died last in the process.
But any other task_structs are ignored.
Therefore, pacct facility drops ASU flag even if root-privilege
operations are used by any other threads except the last one.
In addition, AFORK flag is always set when the thread of group-leader
didn't die last, although this process has called execve() after fork().
We have a same matter in ac_exitcode. The recorded ac_exitcode is
an exit code of the last thread in the process. There is a possibility
this exitcode is not the group leader's one.
---- in kernel/acct.c : do_acct_process() ----
ac.ac_flag = 0;
if (current->flags & PF_FORKNOEXEC)
ac.ac_flag |= AFORK;
if (current->flags & PF_SUPERPRIV)
ac.ac_flag |= ASU;
if (current->flags & PF_DUMPCORE)
ac.ac_flag |= ACORE;
if (current->flags & PF_SIGNALED)
ac.ac_flag |= AXSIG;
:
- snip -
:
ac.ac_exitcode = exitcode;
----------------------------------------------
This patch fixes those matters.
- The exit code of group leader is recorded as ac_exitcode.
- ASU, ACORE, AXSIG flag are marked if any task_struct satisfy
the conditions.
- AFORK flag is marked if only group leader thread satisfy
the condition.
* background of the patch.3
In current 2.6.17 implementation, signal_struct refered from task_struct
is used for per-process data structure. The pacct facility also uses it
as a per-process data structure to store stime, utime, minflt, majflt.
But those members are saved in __exit_signal(). It's too late.
For example, if some threads exits at same time, pacct facility has
a possibility to drop accountings for a part of those threads.
(see, the following 'The results of original 2.6.17 kernel')
I think accounting information should be completely collected into
the per-process data structure before writing out an accounting record.
This patch fixes this matter. Accumulation of stime, utime, minflt
and majflt are done before generating accounting record.
* accounting results
[in original 2.6.17 cases]
# accton acct.log
# time -p ./bugacct
real 10.07
user 5.96
sys 0.10
# time -p ./raceacct 4
real 6.92
user 27.22
sys 0.00
# time -p ./raceacct 4
real 7.71
user 30.14
sys 0.00
# time -p ./raceacct 4
real 6.94
user 27.21
sys 0.00
# time -p ./raceacct 4
real 6.25
user 24.42
sys 0.00
# time -p ./raceacct 4
real 6.92
user 27.22
sys 0.00
-- accounting results --------
FLAG BTIME ETIME UTIME STIME MEM MINFLT MAJFLT COMM
-P-- 13:41:16 5 0 0 3072 110 0 accton
F--- 13:41:35 1006 596 9 143232 8200 0 bugacct *
F--- 13:41:53 692 2032 0 28528 38 0 raceacct *
---- 13:42:10 771 3014 0 28528 170 0 raceacct
F--- 13:42:19 694 2027 0 28528 8 0 raceacct *
F--- 13:42:26 625 1832 0 28528 40 0 raceacct *
---- 13:45:40 692 2722 0 28528 171 0 raceacct
'P' means this process used root privilege operations.
'F' means this process didn't execve() after fork().
=> bugacct used root privilege operation, but pacct facility droped it.
=> In raceacct, some threads exit on same time. pacct facility often drops
a part of utime, stime, minflt and majflt.
=> When group leader thread didn't die last in raceacct, incorrent flag 'F'
is set.
[in patched 2.6.17-kg cases]
# touch acct.log
# accton acct.log
# time -p ./bugacct
real 10.07
user 5.97
sys 0.09
# time -p ./raceacct 4
real 7.11
user 27.76
sys 0.00
# time -p ./raceacct 4
real 6.93
user 27.18
sys 0.00
# time -p ./raceacct 4
real 7.11
user 27.76
sys 0.00
# time -p ./raceacct 4
real 7.12
user 27.77
sys 0.00
# time -p ./raceacct 4
real 6.92
user 27.17
sys 0.00
-- accounting results --------
FLAG BTIME ETIME UTIME STIME MEM MINFLT MAJFLT COMM
-P-- 13:24:01 0 0 0 3072 111 0 accton
-P-- 13:24:05 1007 597 8 143232 8360 0 bugacct
---- 13:24:35 711 2776 0 28528 171 0 raceacct
---- 13:24:44 693 2718 0 28528 172 0 raceacct
---- 13:24:51 711 2776 0 28528 172 0 raceacct
---- 13:25:05 712 2777 0 28528 174 0 raceacct
---- 13:25:14 692 2717 0 28528 171 0 raceacct
I hope your any comments. Thanks,
KaiGai Kohei wrote:
>>> Hi, I noticed three problems in pacct facility.
>>>
>>> 1. Pacct facility has a possibility to write incorrect ac_flag
>>> in multi-threading cases.
>>> 2. There is a possibility to be waken up OOM Killer from
>>> pacct facility. It will cause system stall.
>>> 3. If several threads are killed at same time, There is
>>> a possibility not to pick up a part of those accountings.
>>>
>>> The attached patch will resolve those matters.
>>> Any comments please. Thanks,
>>
>> Thanks, but you have three quite distinct bugs here, and three quite
>> distinct descriptions and, I think, three quite distinct fixes.
>>
>> Would it be possible for you to prepare three patches?
>
> It may be possible. Please wait for a while to separate it into
> three-part and to confirm its correct behavior.
>
> Thanks,
--
Open Source Software Promotion Center, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
next prev parent reply other threads:[~2006-06-22 3:08 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-06-20 6:24 Add pacct_struct to fix some pacct bugs KaiGai Kohei
2006-06-20 6:42 ` Andrew Morton
2006-06-20 7:27 ` KaiGai Kohei
2006-06-22 3:07 ` KaiGai Kohei [this message]
2006-06-22 3:08 ` KaiGai Kohei
2006-06-22 3:09 ` KaiGai Kohei
2006-06-22 3:09 ` KaiGai Kohei
2006-06-22 3:35 ` Andrew Morton
2006-06-22 4:05 ` KaiGai Kohei
2006-06-22 4:14 ` Randy.Dunlap
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=449A0967.2020701@ak.jp.nec.com \
--to=kaigai@ak.jp.nec.com \
--cc=akpm@osdl.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.