All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Shailabh Nagar <nagar1234@in.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Venkatesh Pallipadi <venki@google.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Suresh Siddha <suresh.b.siddha@intel.com>,
	John stultz <johnstul@us.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@elte.hu>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH 09/10] taskstats: Fix exit CPU time accounting
Date: Thu, 14 Oct 2010 15:47:16 +0200	[thread overview]
Message-ID: <20101014134716.GA5187@redhat.com> (raw)
In-Reply-To: <1286889057.1932.77.camel@holzheu-laptop>

Michael, sorry for delay...

On 10/12, Michael Holzheu wrote:
>
> On Mon, 2010-10-11 at 14:37 +0200, Oleg Nesterov wrote:
> >
> > > > Also, the logic behind ->exit_accounting_done looks wrong (and unneeded)
> > > > but I am not sure...
> > >
> > > I think the logic is correct,
> >
> > OK, I misread the patch as if we always account the exited task in
> > parent's cdata_acct,
> >
> > 	+       struct cdata cdata_wait; /* parents have done sys_wait() */
> > 	+       struct cdata cdata_acct; /* complete cumulative data from acct tree */
> >
> > while in fact the "complete" data is cdata_wait + cdata_acct.
>
> No. The complete data is in cdata_acct. It contains both, the task times
> where sys_wait() has been done and the task times, where the tasks have
> reaped themselves.

Hmm. This means my first understanding was correct. But now I am
confused again, see below.

> > Hmm. Let's return to your example above,
> >
> > 	> Snapshot 1: P1 -> P2 -> P3
> > 	> Snapshot 2: P1
> > 	> ...
> > 	> P1 got all the CPU time of P2 and P3
> >
> > Suppose that P2 dies before P3. Then P3 dies, /sbin/init does wait and
> > accounts this task. This means it is not accounted in P1->signal->cdata_acct,
> > no?
>
> No. __account_to_parent() with wait=1 is called when init waits for P3.
> Then both sets are updated cdata_acct and cdata_wait:
>
> +static void __account_to_parent(struct task_struct *p, int wait)
> +{
> +       if (wait)
> +               __account_ctime(p, &p->real_parent->signal->cdata_wait,
> +                               &p->signal->cdata_wait);
> +       __account_ctime(p, &p->acct_parent->signal->cdata_acct,
> +                       &p->signal->cdata_acct);
> +       p->exit_accounting_done = 1;
>
> If a tasks reaps itself, only cdata_acct is updated.

Yes. But __account_to_parent() always sets p->exit_accounting_done = 1.
And __exit_signal() calls __account_to_parent() only if it is not set.

This means that we update either cdata_wait (if the child was reaped
by parent) or cdata_acct (the process auto-reaps itself).

That is why I thought that ->exit_accounting_done should die, and
__exit_signal() should always call __account_to_parent() to update
cdata_acct.

Or I missed something? Confused ;)

 > > Sorry for my ignorance. Probably I have not understood what happens, if
> > > a thread group leader dies. My assumption was that then the whole thread
> > > group dies.
> >
> > No. A thread group dies when the last thread dies. If a leader exits
> > it becomes a zombie until all other sub-threads exit.
>
> That brought me to another question: Does this mean that the thread
> group leader never changes and is always alive (at least as zombie) as
> long as the thread group lives?

Yes. Except de_thread() can change the leader. The new leader is the
thread which calls exec.

> > > Also I assumed that a parent can only be a thread group
> > > leader.
> >
> > No. If a thread T does fork(), the child's ->real_parent is T, not
> > T->group_leader. If T exits, we do not reparent its children to init
> > (unless it is the last thread, of course), we pick another live
> > thread in this thread group for reparenting.
>
> Ok, I hope that I understand now. So either we could set the acct_parent
> to the thread group leader in fork(), or we use the new parent in the
> thread group if there are live threads left, when a thread exits.
>
> Something like the following:
>
>  static void forget_original_parent(struct task_struct *father)
>  {
> +       struct pid_namespace *pid_ns = task_active_pid_ns(father);
>         struct task_struct *p, *n, *reaper;
>         LIST_HEAD(dead_children);
>
>         exit_ptrace(father);
>
>         reaper = find_new_reaper(father);
>
> +       list_for_each_entry_safe(p, n, &father->children_acct, sibling_acct) {
> +               struct task_struct *t = p;
> +               do {
> +                       if (pid_ns->child_reaper == reaper)
> +                               t->acct_parent = t->acct_parent->acct_parent;
> +                       else
> +                               t->acct_parent = reaper;
> +               } while_each_thread(p, t);
> +               list_move_tail(&p->sibling_acct,
> +                              &p->acct_parent->children_acct);
> +       }
> +

I think you can simplify this, but  I am not sure right now.

First of all, ->acct_parent should be moved from task_struct to
signal_struct. No need to initialize t->acct_parent unless t is
the group leader (this means we can avoid do/while_each_thread
loop during re-parenting, but de_thread needs another trivial
change).


No need to change forget_original_parent() at all, instead we
can the single line

	p->signal->acct_parent = father->signal->acct_parent;

to reparent_leader(), after the "if (same_thread_group())" check.

What do you think?

Oleg.

  reply	other threads:[~2010-10-14 13:47 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-23 13:48 [RFC][PATCH 00/10] taskstats: Enhancements for precise accounting Michael Holzheu
2010-09-23 14:00 ` [RFC][PATCH 01/10] taskstats: Use real microsecond granularity for CPU times Michael Holzheu
2010-10-07  5:08   ` Balbir Singh
2010-10-08 15:08     ` Michael Holzheu
2010-10-08 16:39       ` Balbir Singh
2010-09-23 14:01 ` [RFC][PATCH 02/10] taskstats: Separate taskstats commands Michael Holzheu
2010-09-27  9:32   ` Balbir Singh
2010-10-11  7:40   ` Balbir Singh
2010-09-23 14:01 ` [RFC][PATCH 03/10] taskstats: Split fill_pid function Michael Holzheu
2010-09-23 17:33   ` Oleg Nesterov
2010-09-27  9:33   ` Balbir Singh
2010-10-11  8:31   ` Balbir Singh
2010-09-23 14:01 ` [RFC][PATCH 04/10] taskstats: Add new taskstats command TASKSTATS_CMD_ATTR_PIDS Michael Holzheu
2010-09-23 14:01 ` [RFC][PATCH 05/10] taskstats: Add "/proc/taskstats" Michael Holzheu
2010-09-23 14:01 ` [RFC][PATCH 06/10] taskstats: Add thread group ID to taskstats structure Michael Holzheu
2010-09-23 14:01 ` [RFC][PATCH 07/10] taskstats: Add per task steal time accounting Michael Holzheu
2010-09-23 14:02 ` [RFC][PATCH 08/10] taskstats: Add cumulative CPU time (user, system and steal) Michael Holzheu
2010-09-23 14:02 ` [RFC][PATCH 09/10] taskstats: Fix exit CPU time accounting Michael Holzheu
2010-09-23 17:10   ` Oleg Nesterov
2010-09-24 12:18     ` Michael Holzheu
2010-09-26 18:11       ` Oleg Nesterov
2010-09-27 13:23         ` Michael Holzheu
2010-09-27 13:42         ` Martin Schwidefsky
2010-09-27 16:51           ` Oleg Nesterov
2010-09-28  7:09             ` Martin Schwidefsky
2010-09-29 19:19             ` Roland McGrath
2010-09-30 13:47               ` Michael Holzheu
2010-10-05  8:57                 ` Roland McGrath
2010-10-06  9:29                   ` Michael Holzheu
2010-10-06 15:26                     ` Oleg Nesterov
2010-10-07 15:06                       ` Michael Holzheu
2010-10-11 12:37                         ` Oleg Nesterov
2010-10-12 13:10                           ` Michael Holzheu
2010-10-14 13:47                             ` Oleg Nesterov [this message]
2010-10-15 14:34                               ` Michael Holzheu
2010-10-19 14:17                                 ` Oleg Nesterov
2010-10-22 16:53                                   ` Michael Holzheu
2010-09-28  8:36           ` Balbir Singh
2010-09-28  9:08             ` Martin Schwidefsky
2010-09-28  9:23               ` Balbir Singh
2010-09-28 10:36                 ` Martin Schwidefsky
2010-09-28 10:39                   ` Balbir Singh
2010-09-28  8:21   ` Balbir Singh
2010-09-28 16:50     ` Michael Holzheu
2010-09-23 14:04 ` [RFC][PATCH 10/10] taststats: User space with ptop tool Michael Holzheu
2010-09-23 20:11 ` [RFC][PATCH 00/10] taskstats: Enhancements for precise accounting Andrew Morton
2010-09-23 20:11   ` Andrew Morton
     [not found]   ` <20100923131136.356075f4.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2010-09-23 22:11     ` Matt Helsley
2010-09-24  9:10     ` Michael Holzheu
2010-09-23 22:11   ` Matt Helsley
     [not found]     ` <20100923221139.GI23839-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
2010-09-24 12:39       ` Michael Holzheu
2010-09-25 18:19       ` Serge E. Hallyn
2010-09-25 18:19         ` Serge E. Hallyn
2010-09-24 12:39     ` Michael Holzheu
2010-09-24  9:10   ` Michael Holzheu
2010-09-24 18:50     ` Andrew Morton
2010-09-24 18:50       ` Andrew Morton
2010-09-27  9:18       ` Michael Holzheu
2010-09-27 20:02         ` Andrew Morton
     [not found]           ` <20100927130256.5d9a3db8.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2010-09-28  8:17             ` Balbir Singh
2010-09-28  8:17           ` Balbir Singh
2010-09-27 20:02         ` Andrew Morton
     [not found]       ` <20100924115002.fcb4385a.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2010-09-27  9:18         ` Michael Holzheu
2010-09-27 10:49     ` Balbir Singh
2010-09-27 10:49     ` Balbir Singh
2010-09-24  9:16 ` Balbir Singh
2010-09-30  8:38 ` Andi Kleen
2010-09-30 13:56   ` Michael Holzheu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101014134716.GA5187@redhat.com \
    --to=oleg@redhat.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=holzheu@linux.vnet.ibm.com \
    --cc=johnstul@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nagar1234@in.ibm.com \
    --cc=roland@redhat.com \
    --cc=schwidefsky@de.ibm.com \
    --cc=suresh.b.siddha@intel.com \
    --cc=tglx@linutronix.de \
    --cc=venki@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.