public inbox for linux-kernel@vger.kernel.org
From: Paul Jackson <pj@sgi.com>
To: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: akpm@osdl.org, greg@kroah.com, linux-kernel@vger.kernel.org,
	elsa-devel@lists.sourceforge.net, gh@us.ibm.com,
	efocht@hpce.nec.com, jlan@engr.sgi.com
Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
Date: Mon, 21 Feb 2005 03:58:08 -0800
Message-ID: <20050221035808.48401cd2.pj@sgi.com>
In-Reply-To: <1108981982.8418.120.camel@frecb000711.frec.bull.fr>

Thank you for your quick answer.

Guillaume wrote:
>
> If a process belongs to several group of processes, an new integer in
> the task_struct is not enough, you need a list or something like this.
> If you're using a list you need to add function to manage this list in
> the kernel but we don't want to add this kind of management inside the
> kernel because with the fork connector we can keep it outside.

Ok - fork connector.  From your patch of a couple of days ago, for the
benefit of lurkers:
> 
>     It's a new patch that implements a fork connector in the
> kernel/fork.c:do_fork() routine. The connector sends information about
> parent PID and child PID over a netlink interface. It allows to several
> user space applications to be alerted when a fork occurs in the kernel.

Whoaa ... you're saying that because you might have several groups a
task could belong to at once, you'll use netlink to avoid managing lists
in the kernel.  Seems that you're spending thousands of instructions to
save dozens.  This is not a good trade-off.

I can imagine several much cheaper ways to handle this.

If the number of groups to which a task could belong has some small
finite upper limit, like at most 5 groups, you could have 5 integer id's
in the task struct instead of 1.  If the number of elements in a
particular group has a small upper bound, you could even replace the
ints with bit fields.
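
A minimal user-space sketch of the fixed-slot idea, assuming a limit of
5 groups and using 0 as an "unused slot" marker.  The struct and
function names here are hypothetical stand-ins, not existing kernel
code.  The point is that fork inherits the memberships with a plain
struct copy:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_JOB_GROUPS 5   /* assumed small upper limit on memberships */
#define NO_GROUP       0   /* assumed sentinel: slot is unused */

/* Hypothetical stand-in for the fields that would go in the task struct. */
struct task_groups {
	int ids[MAX_JOB_GROUPS];
};

/* Join a group.  Returns false if all slots are already taken. */
static bool task_join_group(struct task_groups *tg, int id)
{
	int free_slot = -1;

	assert(id != NO_GROUP);
	for (int i = 0; i < MAX_JOB_GROUPS; i++) {
		if (tg->ids[i] == id)
			return true;            /* already a member */
		if (tg->ids[i] == NO_GROUP && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return false;
	tg->ids[free_slot] = id;
	return true;
}

/* Membership test: a scan over at most MAX_JOB_GROUPS ints. */
static bool task_in_group(const struct task_groups *tg, int id)
{
	for (int i = 0; i < MAX_JOB_GROUPS; i++)
		if (tg->ids[i] == id)
			return true;
	return false;
}

/* At fork the child inherits by struct copy: no allocation, no list. */
static void task_groups_fork(struct task_groups *child,
			     const struct task_groups *parent)
{
	*child = *parent;
}
```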

Or you could enumerate the different combinations of groups to which a
task might belong, assign each such combination a unique integer, and
keep that integer in the task struct.  The enumeration could be done
dynamically, only counting the particular combinations of group
memberships that actually had use.  This has the disadvantage that a
particular combination, once enumerated, would have to stay around until
the next boot - a potential memory leak.  Probably not acceptable,
unless the cost of storing a no longer used combination is nearly zero.
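
A rough sketch of such dynamic enumeration, under the assumptions that
a combination is passed in as a sorted array of ids and that a small
static table is enough for illustration.  All names are hypothetical.
Note that nothing is ever removed from the table, which is exactly the
leak described above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_IDS    5    /* assumed max groups in one combination */
#define MAX_COMBOS 64   /* assumed table capacity for this sketch */

struct combo {
	int n;              /* number of ids in this combination */
	int ids[MAX_IDS];   /* kept sorted so equal sets compare equal */
};

static struct combo combo_table[MAX_COMBOS];
static int combo_count;

/*
 * Return the integer assigned to this combination, enumerating it on
 * first use.  Returns -1 if the input is too long or the table is full.
 * The caller is assumed to pass the ids already sorted.
 */
static int combo_intern(const int *sorted_ids, int n)
{
	if (n < 0 || n > MAX_IDS)
		return -1;
	for (int i = 0; i < combo_count; i++) {
		if (combo_table[i].n == n &&
		    memcmp(combo_table[i].ids, sorted_ids,
			   (size_t)n * sizeof(int)) == 0)
			return i;       /* seen before: reuse its integer */
	}
	if (combo_count == MAX_COMBOS)
		return -1;
	combo_table[combo_count].n = n;
	memcpy(combo_table[combo_count].ids, sorted_ids,
	       (size_t)n * sizeof(int));
	return combo_count++;           /* entry stays until reboot */
}
```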

Or you could have a little 'jobids' struct that held a list and a
reference counter, where the list held a particular combination of ids,
and the reference counter tracked how many tasks referenced that jobids
struct. Put a single pointer in the task struct to a jobids struct, and
increment and decrement the reference counter in the jobids struct on
fork and exit.  Free it if the count goes to zero on exit.  This solves
the memory leak of the previous approach, at some added cost to fork.  Since
we really do design these systems to stay up 'forever', this is perhaps
the winner.  Any time a particular task is added to, or removed from, a
group, if the ref count of its jobids struct is one, then modify the id
list attached to that jobids struct in place.  If the ref count is more
than one, copy the jobids struct and list to a new one, decrement the
count in the old one, and modify the new one in place.  Such list and
counter manipulations are the daily stuff of kernel code.  No need to
avoid such.
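
Roughly, and only as a user-space sketch: the names below are
hypothetical, the id list is a plain array rather than a kernel list,
and there is no locking or atomic counter, which real kernel code
would need.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One shared combination of ids plus a count of tasks pointing at it. */
struct jobids {
	int refcount;
	int n;
	int *ids;
};

/* Hypothetical stand-in for the single pointer added to the task struct. */
struct task {
	struct jobids *jobids;
};

static struct jobids *jobids_alloc(const int *ids, int n)
{
	struct jobids *j = malloc(sizeof(*j));

	if (!j)
		return NULL;
	j->ids = n ? malloc((size_t)n * sizeof(int)) : NULL;
	if (n && !j->ids) {
		free(j);
		return NULL;
	}
	if (n)
		memcpy(j->ids, ids, (size_t)n * sizeof(int));
	j->n = n;
	j->refcount = 1;
	return j;
}

/* fork: share the parent's struct, bump the count. */
static void task_fork(struct task *child, const struct task *parent)
{
	child->jobids = parent->jobids;
	child->jobids->refcount++;
}

/* exit: drop the reference, free on the last one. */
static void task_exit(struct task *t)
{
	if (--t->jobids->refcount == 0) {
		free(t->jobids->ids);
		free(t->jobids);
	}
	t->jobids = NULL;
}

/* Add an id: in place if unshared, else copy first.  0 on success. */
static int task_add_id(struct task *t, int id)
{
	struct jobids *j = t->jobids;
	int *grown;

	if (j->refcount > 1) {
		struct jobids *copy = jobids_alloc(j->ids, j->n);

		if (!copy)
			return -1;
		j->refcount--;          /* other tasks keep the old one */
		t->jobids = j = copy;
	}
	grown = realloc(j->ids, (size_t)(j->n + 1) * sizeof(int));
	if (!grown)
		return -1;
	grown[j->n++] = id;
	j->ids = grown;
	return 0;
}
```

The only work added to fork here is one pointer copy and one increment.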

Just because you have more than one id doesn't mean each task has to be
connected directly into its own custom list, and even if you needed
that, I don't see that it's a win to avoid such a list by using netlink.

It can be a worthwhile exercise to single step through each machine
instruction that you add to fork, in the forking task or any other task
that is sent data or a signal therefrom.  You really do want to keep the
number of added instructions (and number of additional cache lines and
memory pages accessed, especially written) to a minimum.  If the effort
of single stepping through such would require the patience of
Copernicus, then it's back to the drawing board for a more efficient
solution.

> I don't know if there is some work around 1) and 4). 

Well, you might have dodged the (1) bullet up until now by using netlink
and not extending the accounting record at exit.  Bullet (1) was
extending the accounting record past its fairly constrained size, if
that's still a problem; it's been years since I looked.  But if you
adapt one of the above suggestions, and don't send anything out of the
task context at fork, then you will have to deal with (1) in order to
include the list of job id's in the record written at exit.

If you want to collect any other data, bullet (3), you will also have
to solve bullet (1).

Item (4), collecting accounting data for long running tasks, is probably
less pressing.  Its solution will also likely require solving (1),
however.

Taking a quick look at init/Kconfig and include/linux/acct.h, it seems
we are using BSD_PROCESS_ACCT_V3 format, which is the latest 64 byte
format, allowing for larger uid/gid.
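
To make the size constraint concrete, here is a user-space mirror of
the v3 record layout as I recall it from include/linux/acct.h.  Treat
the field list as an assumption and check it against the header.  The
struct names, the fixed-width types and the 5-id extension are mine,
for illustration only:

```c
#include <assert.h>
#include <stdint.h>

#define ACCT_COMM_LEN 16     /* assumed command-name length */
typedef uint16_t comp16_t;   /* stand-in for the 16-bit compressed counters */

/* Assumed mirror of the v3 accounting record; not the kernel's definition. */
struct acct_v3_sketch {
	char     ac_flag;
	char     ac_version;
	uint16_t ac_tty;
	uint32_t ac_exitcode;
	uint32_t ac_uid;
	uint32_t ac_gid;
	uint32_t ac_pid;
	uint32_t ac_ppid;
	uint32_t ac_btime;
	uint32_t ac_etime;   /* 32-bit stand-in for the elapsed-time field */
	comp16_t ac_utime, ac_stime, ac_mem, ac_io;
	comp16_t ac_rw, ac_minflt, ac_majflt, ac_swaps;
	char     ac_comm[ACCT_COMM_LEN];
};

/* Every byte of the 64 is spoken for: there is no spare room. */
_Static_assert(sizeof(struct acct_v3_sketch) == 64,
	       "v3 record sketch should be exactly 64 bytes");

#define MAX_JOB_IDS 5    /* hypothetical number of job ids to record */

/* Appending even a short list of job ids means a new, larger format. */
struct acct_with_jobids {
	struct acct_v3_sketch base;
	uint32_t job_ids[MAX_JOB_IDS];
};
```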

With slight variations, this 64 byte format has lasted about 25 years.
It's time to replace it, especially if you have designs on collecting
any additional information, which you clearly do.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.650.933.1373, 1.925.600.0401
