All of lore.kernel.org
 help / color / mirror / Atom feed
From: KaiGai Kohei <kaigai@ak.jp.nec.com>
To: Andrew Morton <akpm@osdl.org>
Cc: KaiGai Kohei <kaigai@ak.jp.nec.com>, linux-kernel@vger.kernel.org
Subject: Re: Add pacct_struct to fix some pacct bugs.
Date: Thu, 22 Jun 2006 12:07:19 +0900	[thread overview]
Message-ID: <449A0967.2020701@ak.jp.nec.com> (raw)
In-Reply-To: <4497A34C.2000104@ak.jp.nec.com>

The seriese of patches fixes some process accounting bugs.

[PATCH 1/3] two phase process accounting
[PATCH 2/3] avoidance to refer the last thread as a representation
             of the process
[PATCH 3/3] none-delayed process accounting accumulation

* background of the patch.1

     The pacct facility need an i/o operation when an accounting record
     is generated. There is a possibility to wake OOM killer up.
     If OOM killer is activated, it kills some processes to make them
     release process memory regions.
     But acct_process() is called in the killed processes context
     before calling exit_mm(), so those processes cannot release
     own memory. In the results, any processes stop in this point and
     it finally cause a system stall.

     ---- in kernel/exit.c : do_exit() ------------
             group_dead = atomic_dec_and_test(&tsk->signal->live);
             if (group_dead) {
                     hrtimer_cancel(&tsk->signal->real_timer);
                     exit_itimers(tsk->signal);
                     acct_process(code);
             }
                :
            - snip -
                :
             exit_mm(tsk);
     ----------------------------------------------

     This patch separates generating an accounting record facility
     into two-phase.
     In the first one, acct_collect() calculate vitual memory size
     of the process and stores it into pacct_struct before exit_mm().
     Then, acct_process() generates an accounting record and write
     it into medium.


* background of the patch.2

     When pacct facility generate an 'ac_flag' field in accounting record,
     it refers a task_struct of the thread which died last in the process.
     But any other task_structs are ignored.

     Therefore, pacct facility drops ASU flag even if root-privilege
     operations are used by any other threads except the last one.
     In addition, AFORK flag is always set when the thread of group-leader
     didn't die last, although this process has called execve() after fork().

     We have a same matter in ac_exitcode. The recorded ac_exitcode is
     an exit code of the last thread in the process. There is a possibility
     this exitcode is not the group leader's one.

     ---- in kernel/acct.c : do_acct_process() ----
             ac.ac_flag = 0;
             if (current->flags & PF_FORKNOEXEC)
                     ac.ac_flag |= AFORK;
             if (current->flags & PF_SUPERPRIV)
                     ac.ac_flag |= ASU;
             if (current->flags & PF_DUMPCORE)
                     ac.ac_flag |= ACORE;
             if (current->flags & PF_SIGNALED)
                     ac.ac_flag |= AXSIG;
                       :
                    - snip -
                       :
             ac.ac_exitcode = exitcode;
     ----------------------------------------------

     This patch fixes those matters.
     - The exit code of group leader is recorded as ac_exitcode.
     - ASU, ACORE, AXSIG flag are marked if any task_struct satisfy
       the conditions.
     - AFORK flag is marked if only group leader thread satisfy
       the condition.


* background of the patch.3

     In current 2.6.17 implementation, signal_struct refered from task_struct
     is used for per-process data structure. The pacct facility also uses it
     as a per-process data structure to store stime, utime, minflt, majflt.
     But those members are saved in __exit_signal(). It's too late.

     For example, if some threads exits at same time, pacct facility has
     a possibility to drop accountings for a part of those threads.
     (see, the following 'The results of original 2.6.17 kernel')
     I think accounting information should be completely collected into
     the per-process data structure before writing out an accounting record.

     This patch fixes this matter. Accumulation of stime, utime, minflt
     and majflt are done before generating accounting record.


* accounting results

[in original 2.6.17 cases]

# accton acct.log
# time -p ./bugacct
real 10.07
user 5.96
sys 0.10
# time -p ./raceacct 4
real 6.92
user 27.22
sys 0.00
# time -p ./raceacct 4
real 7.71
user 30.14
sys 0.00
# time -p ./raceacct 4
real 6.94
user 27.21
sys 0.00
# time -p ./raceacct 4
real 6.25
user 24.42
sys 0.00
# time -p ./raceacct 4
real 6.92
user 27.22
sys 0.00

-- accounting results --------
FLAG    BTIME  ETIME  UTIME  STIME     MEM  MINFLT MAJFLT      COMM
-P-- 13:41:16      5       0     0    3072     110      0    accton
F--- 13:41:35   1006     596     9  143232    8200      0   bugacct *
F--- 13:41:53    692    2032     0   28528      38      0  raceacct *
---- 13:42:10    771    3014     0   28528     170      0  raceacct
F--- 13:42:19    694    2027     0   28528       8      0  raceacct *
F--- 13:42:26    625    1832     0   28528      40      0  raceacct *
---- 13:45:40    692    2722     0   28528     171      0  raceacct

'P' means this process used root privilege operations.
'F' means this process didn't execve() after fork().

=> bugacct used root privilege operation, but pacct facility droped it.
=> In raceacct, some threads exit on same time. pacct facility often drops
    a part of utime, stime, minflt and majflt.
=> When group leader thread didn't die last in raceacct, incorrent flag 'F'
    is set.


[in patched 2.6.17-kg cases]

# touch acct.log
# accton acct.log
# time -p ./bugacct
real 10.07
user 5.97
sys 0.09
# time -p ./raceacct 4
real 7.11
user 27.76
sys 0.00
# time -p ./raceacct 4
real 6.93
user 27.18
sys 0.00
# time -p ./raceacct 4
real 7.11
user 27.76
sys 0.00
# time -p ./raceacct 4
real 7.12
user 27.77
sys 0.00
# time -p ./raceacct 4
real 6.92
user 27.17
sys 0.00

-- accounting results --------
FLAG    BTIME  ETIME  UTIME  STIME     MEM  MINFLT MAJFLT      COMM
-P-- 13:24:01      0      0      0    3072     111      0    accton
-P-- 13:24:05   1007    597      8  143232    8360      0   bugacct
---- 13:24:35    711   2776      0   28528     171      0  raceacct
---- 13:24:44    693   2718      0   28528     172      0  raceacct
---- 13:24:51    711   2776      0   28528     172      0  raceacct
---- 13:25:05    712   2777      0   28528     174      0  raceacct
---- 13:25:14    692   2717      0   28528     171      0  raceacct

I hope your any comments. Thanks,

KaiGai Kohei wrote:
 >>> Hi, I noticed three problems in pacct facility.
 >>>
 >>> 1. Pacct facility has a possibility to write incorrect ac_flag
 >>>    in multi-threading cases.
 >>> 2. There is a possibility to be waken up OOM Killer from
 >>>    pacct facility. It will cause system stall.
 >>> 3. If several threads are killed at same time, There is
 >>>    a possibility not to pick up a part of those accountings.
 >>>
 >>> The attached patch will resolve those matters.
 >>> Any comments please. Thanks,
 >>
 >> Thanks, but you have three quite distinct bugs here, and three quite
 >> distinct descriptions and, I think, three quite distinct fixes.
 >>
 >> Would it be possible for you to prepare three patches?
 >
 > It may be possible. Please wait for a while to separate it into
 > three-part and to confirm its correct behavior.
 >
 > Thanks,
-- 
Open Source Software Promotion Center, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

  reply	other threads:[~2006-06-22  3:08 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-06-20  6:24 Add pacct_struct to fix some pacct bugs KaiGai Kohei
2006-06-20  6:42 ` Andrew Morton
2006-06-20  7:27   ` KaiGai Kohei
2006-06-22  3:07     ` KaiGai Kohei [this message]
2006-06-22  3:08       ` KaiGai Kohei
2006-06-22  3:09       ` KaiGai Kohei
2006-06-22  3:09       ` KaiGai Kohei
2006-06-22  3:35       ` Andrew Morton
2006-06-22  4:05         ` KaiGai Kohei
2006-06-22  4:14         ` Randy.Dunlap

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=449A0967.2020701@ak.jp.nec.com \
    --to=kaigai@ak.jp.nec.com \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.