[PATCH 2.6.11-rc3-mm2] connector: Add a fork connector


* [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
From: Guillaume Thouvenin @ 2005-02-17 14:55 UTC
  To: Andrew Morton
  Cc: Greg KH, lkml, Evgeniy Polyakov, elsa-devel, Gerrit Huizenga,
	Erich Focht

Hello,

    This is a new patch that implements a fork connector in the
kernel/fork.c:do_fork() routine. The connector sends the parent PID and
child PID over a netlink interface, which allows several user space
applications to be alerted when a fork occurs in the kernel.
The main drawback is that a message is sent even if nobody listens;
I don't know how to avoid that. I added an option (FORK_CONNECTOR) to
enable or disable the fork connector when compiling the kernel. For it
to work, the connector must be compiled as built-in
(CONFIG_CONNECTOR=y). It has been tested on a 2.6.11-rc3-mm2 kernel
with two user space applications connected.

    It is used by ELSA to manage groups of processes in user space. In
conjunction with per-process accounting information, such as BSD or
CSA accounting, ELSA provides per-group accounting of processes.


    All comments are welcome,

Guillaume

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
--- 

 drivers/connector/Kconfig |   10 ++++++++++
 include/linux/connector.h |    2 ++
 kernel/fork.c             |   41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+)

diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/drivers/connector/Kconfig linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig
--- linux-2.6.11-rc3-mm2/drivers/connector/Kconfig	2005-02-11 11:00:16.000000000 +0100
+++ linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig	2005-02-17 15:48:41.000000000 +0100
@@ -10,4 +10,14 @@ config CONNECTOR
 	  Connector support can also be built as a module.  If so, the module
 	  will be called cn.ko.
 
+config FORK_CONNECTOR
+	bool "Enable fork connector"
+	depends on CONNECTOR
+	---help---
+	  It adds a connector in kernel/fork.c:do_fork() function. When a fork
+	  occurs, netlink is used to transfer information about the parent and 
+	  its child. This information can be used by a user space application. 
+	  
+	  Note: it only works if connector is built in the kernel.
+	  
 endmenu
diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/include/linux/connector.h linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h
--- linux-2.6.11-rc3-mm2/include/linux/connector.h	2005-02-11 11:00:18.000000000 +0100
+++ linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h	2005-02-16 15:07:46.000000000 +0100
@@ -28,6 +28,8 @@
 #define CN_VAL_KOBJECT_UEVENT		0x0000
 #define CN_IDX_SUPERIO			0xaabb  /* SuperIO subsystem */
 #define CN_VAL_SUPERIO			0xccdd
+#define CN_IDX_FORK			0xfeed  /* fork events */
+#define CN_VAL_FORK			0xbeef
 
 
 #define CONNECTOR_MAX_MSG_SIZE 	1024
diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/kernel/fork.c linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c
--- linux-2.6.11-rc3-mm2/kernel/fork.c	2005-02-11 11:00:18.000000000 +0100
+++ linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c	2005-02-17 13:43:48.000000000 +0100
@@ -41,6 +41,7 @@
 #include <linux/profile.h>
 #include <linux/rmap.h>
 #include <linux/acct.h>
+#include <linux/connector.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -63,6 +64,44 @@ DEFINE_PER_CPU(unsigned long, process_co
 
 EXPORT_SYMBOL(tasklist_lock);
 
+#if defined(CONFIG_CONNECTOR) && defined(CONFIG_FORK_CONNECTOR)
+#define FORK_CN_INFO_SIZE	64 
+static inline void fork_connector(pid_t parent, pid_t child)
+{
+	struct cb_id fork_id = {CN_IDX_FORK, CN_VAL_FORK};
+	static __u32 seq; /* used to test if we lost message */
+	
+	if (cn_already_initialized) {
+		struct cn_msg *msg;
+		size_t size;
+
+		size = sizeof(*msg) + FORK_CN_INFO_SIZE;
+		msg = kmalloc(size, GFP_KERNEL);
+		if (msg) {
+			memset(msg, '\0', size);
+			memcpy(&msg->id, &fork_id, sizeof(msg->id));
+			msg->seq = seq++;
+			msg->ack = 0; /* not used */
+			/* 
+			 * size of data is the number of characters 
+			 * printed plus one for the trailing '\0'
+			 */
+			msg->len = snprintf(msg->data, FORK_CN_INFO_SIZE-1, 
+					    "%i %i", parent, child) + 1;
+
+			cn_netlink_send(msg, 1);
+
+			kfree(msg);
+		}
+	}
+}
+#else
+static inline void fork_connector(pid_t parent, pid_t child) 
+{
+	return; 
+}
+#endif
+
 int nr_processes(void)
 {
 	int cpu;
@@ -1238,6 +1277,8 @@ long do_fork(unsigned long clone_flags,
 			if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE))
 				ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);
 		}
+
+		fork_connector(current->pid, p->pid);
 	} else {
 		free_pidmap(pid);
 		pid = PTR_ERR(p);
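For listener authors, the payload that fork_connector() builds can be exercised in user space. This is a sketch under assumptions: the listener is assumed to have already received the cn_msg and extracted its data, len and seq fields (reading the netlink socket is not shown), and the helper names are hypothetical. It encodes the text the way the patch does, decodes it, and uses the seq counter to count lost messages. It passes the full buffer size to snprintf(), which already reserves room for the terminating '\0'; the patch's FORK_CN_INFO_SIZE-1 is harmless for two PIDs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Same payload buffer size as the patch. */
#define FORK_CN_INFO_SIZE 64

/* Sender side, mirroring fork_connector(): write "parent child" and
 * count the trailing '\0' in the length (the value put in msg->len).
 * Two PIDs need at most 23 characters, so nothing is truncated. */
static unsigned short fork_payload_encode(char *buf, pid_t parent, pid_t child)
{
	int n = snprintf(buf, FORK_CN_INFO_SIZE, "%i %i", (int)parent, (int)child);

	return (unsigned short)(n + 1);
}

/* Listener side (hypothetical helper): recover both PIDs from the data
 * and len fields of a received message. Returns 0 on success, -1 if the
 * payload is malformed. */
static int fork_payload_decode(const char *buf, unsigned short len,
			       pid_t *parent, pid_t *child)
{
	int p, c;

	if (len == 0 || len > FORK_CN_INFO_SIZE || buf[len - 1] != '\0')
		return -1;
	if (sscanf(buf, "%d %d", &p, &c) != 2)
		return -1;
	*parent = (pid_t)p;
	*child = (pid_t)c;
	return 0;
}

/* Messages lost between two consecutive seq values seen by a listener
 * (assumes cur != prev). Unsigned arithmetic makes the 32-bit
 * wrap-around of the kernel's counter come out right. */
static uint32_t fork_seq_lost(uint32_t prev, uint32_t cur)
{
	return cur - prev - 1;
}
```

Because the patch's counter is a plain static variable incremented without locking, concurrent forks on different CPUs could in principle produce duplicate or skipped values, so a gap is a hint rather than proof of loss.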




* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
From: Evgeniy Polyakov @ 2005-02-17 15:50 UTC
  To: Guillaume Thouvenin
  Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga,
	Erich Focht


On Thu, 2005-02-17 at 15:55 +0100, Guillaume Thouvenin wrote:
> Hello,

Hello, I have a small note about connector usage in your module
in particular and in others in general.

>     This is a new patch that implements a fork connector in the
> kernel/fork.c:do_fork() routine. The connector sends the parent PID and
> child PID over a netlink interface, which allows several user space
> applications to be alerted when a fork occurs in the kernel.
> The main drawback is that a message is sent even if nobody listens;
> I don't know how to avoid that. I added an option (FORK_CONNECTOR) to
> enable or disable the fork connector when compiling the kernel. For it
> to work, the connector must be compiled as built-in
> (CONFIG_CONNECTOR=y). It has been tested on a 2.6.11-rc3-mm2 kernel
> with two user space applications connected.
> 
>     It is used by ELSA to manage groups of processes in user space. In
> conjunction with per-process accounting information, such as BSD or
> CSA accounting, ELSA provides per-group accounting of processes.

I think people will complain here...
From one point of view it is a step towards chaotic microkernel-style
message flows; from the other, why is only fork monitored this way?
I still think that LSM with logging of all calls is the best way to
achieve this goal.

>     All comments are welcome,
> 
> Guillaume
> 
> Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
> --- 
> 
>  drivers/connector/Kconfig |   10 ++++++++++
>  include/linux/connector.h |    2 ++
>  kernel/fork.c             |   41 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 53 insertions(+)
> 
> diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/drivers/connector/Kconfig linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig
> --- linux-2.6.11-rc3-mm2/drivers/connector/Kconfig	2005-02-11 11:00:16.000000000 +0100
> +++ linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig	2005-02-17 15:48:41.000000000 +0100
> @@ -10,4 +10,14 @@ config CONNECTOR
>  	  Connector support can also be built as a module.  If so, the module
>  	  will be called cn.ko.
>  
> +config FORK_CONNECTOR
> +	bool "Enable fork connector"
> +	depends on CONNECTOR
> +	---help---
> +	  It adds a connector in kernel/fork.c:do_fork() function. When a fork
> +	  occurs, netlink is used to transfer information about the parent and 
> +	  its child. This information can be used by a user space application. 
> +	  
> +	  Note: it only works if connector is built in the kernel.
> +	  
>  endmenu
> diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/include/linux/connector.h linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h
> --- linux-2.6.11-rc3-mm2/include/linux/connector.h	2005-02-11 11:00:18.000000000 +0100
> +++ linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h	2005-02-16 15:07:46.000000000 +0100
> @@ -28,6 +28,8 @@
>  #define CN_VAL_KOBJECT_UEVENT		0x0000
>  #define CN_IDX_SUPERIO			0xaabb  /* SuperIO subsystem */
>  #define CN_VAL_SUPERIO			0xccdd
> +#define CN_IDX_FORK			0xfeed  /* fork events */
> +#define CN_VAL_FORK			0xbeef
>  
> 
>  #define CONNECTOR_MAX_MSG_SIZE 	1024
> diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/kernel/fork.c linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c
> --- linux-2.6.11-rc3-mm2/kernel/fork.c	2005-02-11 11:00:18.000000000 +0100
> +++ linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c	2005-02-17 13:43:48.000000000 +0100
> @@ -41,6 +41,7 @@
>  #include <linux/profile.h>
>  #include <linux/rmap.h>
>  #include <linux/acct.h>
> +#include <linux/connector.h>
>  
>  #include <asm/pgtable.h>
>  #include <asm/pgalloc.h>
> @@ -63,6 +64,44 @@ DEFINE_PER_CPU(unsigned long, process_co
>  
>  EXPORT_SYMBOL(tasklist_lock);
>  
> +#if defined(CONFIG_CONNECTOR) && defined(CONFIG_FORK_CONNECTOR)

I suspect CONFIG_FORK_CONNECTOR is enough.

> +#define FORK_CN_INFO_SIZE	64 
> +static inline void fork_connector(pid_t parent, pid_t child)
> +{
> +	struct cb_id fork_id = {CN_IDX_FORK, CN_VAL_FORK};
> +	static __u32 seq; /* used to test if we lost message */
> +	
> +	if (cn_already_initialized) {
> +		struct cn_msg *msg;
> +		size_t size;
> +
> +		size = sizeof(*msg) + FORK_CN_INFO_SIZE;
> +		msg = kmalloc(size, GFP_KERNEL);
> +		if (msg) {
> +			memset(msg, '\0', size);
> +			memcpy(&msg->id, &fork_id, sizeof(msg->id));
> +			msg->seq = seq++;
> +			msg->ack = 0; /* not used */
> +			/* 
> +			 * size of data is the number of characters 
> +			 * printed plus one for the trailing '\0'
> +			 */
> +			msg->len = snprintf(msg->data, FORK_CN_INFO_SIZE-1, 
> +					    "%i %i", parent, child) + 1;
> +
> +			cn_netlink_send(msg, 1);

"1" here means that this message will be delivered to any group
which has its first bit set (1, 3, and so on) in the given socket queue.
I suspect that is not what you want.
By design, connector users should send messages to the group they were
registered with (which is obtained from the idx field of struct cb_id);
in your case it is CN_IDX_FORK, and connector userspace consumers should
bind to the same group (idx value).
It is of course not a requirement, but a fair path (hmm, I can add
stricter checks into connector).
By setting 0 as the second parameter of cn_netlink_send() you will
force the connector core to select the proper group for you.
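The delivery rule described in this reply can be modelled in a few lines of user space C. This is a hypothetical model, not kernel code: nl_delivered() and cn_effective_group() are made-up names, and the bitmask test reflects the netlink group semantics of this kernel generation (a socket bound with a group mask receives a broadcast whose mask shares at least one bit), not a copy of the netlink or connector source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a socket bound with the group mask 'bound'
 * receives a broadcast addressed to the mask 'dst' whenever the two
 * masks share at least one bit. */
static int nl_delivered(uint32_t bound, uint32_t dst)
{
	return (bound & dst) != 0;
}

/* Hypothetical model of the behaviour described above: passing 0 as
 * the group to cn_netlink_send() makes the core use the idx the sender
 * was registered with; any other value is used as given. */
static uint32_t cn_effective_group(uint32_t groups, uint32_t idx)
{
	return groups ? groups : idx;
}
```

With groups = 1, a consumer bound to mask 3 (or 5, 7, ...) also receives fork events, which is the surprise the reply points out; with groups = 0 the sender's own idx is used, so a consumer only has to bind to the same idx value.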

> +			kfree(msg);
> +		}
> +	}
> +}
> +#else
> +static inline void fork_connector(pid_t parent, pid_t child) 
> +{
> +	return; 
> +}
> +#endif
> +
>  int nr_processes(void)
>  {
>  	int cpu;
> @@ -1238,6 +1277,8 @@ long do_fork(unsigned long clone_flags,
>  			if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE))
>  				ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);
>  		}
> +
> +		fork_connector(current->pid, p->pid);
>  	} else {
>  		free_pidmap(pid);
>  		pid = PTR_ERR(p);
> 
-- 
        Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski



* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
From: Guillaume Thouvenin @ 2005-02-21  7:07 UTC
  To: Evgeniy Polyakov
  Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga,
	Erich Focht

On Thu, 2005-02-17 at 18:50 +0300, Evgeniy Polyakov wrote:
> On Thu, 2005-02-17 at 15:55 +0100, Guillaume Thouvenin wrote:
> >     This is a new patch that implements a fork connector in the
> > kernel/fork.c:do_fork() routine. The connector sends the parent PID and
> > child PID over a netlink interface, which allows several user space
> > applications to be alerted when a fork occurs in the kernel.
> > The main drawback is that a message is sent even if nobody listens;
> > I don't know how to avoid that. I added an option (FORK_CONNECTOR) to
> > enable or disable the fork connector when compiling the kernel. For it
> > to work, the connector must be compiled as built-in
> > (CONFIG_CONNECTOR=y). It has been tested on a 2.6.11-rc3-mm2 kernel
> > with two user space applications connected.
> > 
> >     It is used by ELSA to manage groups of processes in user space. In
> > conjunction with per-process accounting information, such as BSD or
> > CSA accounting, ELSA provides per-group accounting of processes.
> 
> I think people will complain here...
> ... [cut here] ...
> I still think that LSM with logging of all calls is the best way to
> achieve this goal.

I agree with you. My first implementation used LSM, but Chris Wright
(I think it was him) noticed that it's not the right framework (and
that seems true). So I looked for another solution. I thought about
kobject but it was too "big", and finally Greg KH mentioned connectors.
They are small and efficient.
 
> from the other, why is only fork monitored this way?

The problem is the following: I have a user space daemon that manages
groups of processes. The main idea is that if a parent belongs to a
group, then its child belongs to the same group. To achieve this I need
to know when a fork occurs and which processes are involved. I don't
see how to do this without a hook in the do_fork() routine... Any ideas
are welcome.
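A minimal sketch of the user space side of the rule "if a parent belongs to a group, its child belongs to the same group", assuming the daemon already receives (parent, child) pairs from the fork connector. The fixed table indexed by PID, the 32-group bitmask and the names on_fork(), add_to_group() and in_group() are all hypothetical, chosen for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical table size: the default pid_max (32768). */
#define MAX_PID 32768

/* groups[pid] has bit g set when process 'pid' belongs to group g
 * (so this sketch supports at most 32 groups). */
static uint32_t groups[MAX_PID];

static int pid_ok(pid_t pid) { return pid >= 0 && pid < MAX_PID; }

/* Put a process (e.g. a shell) into group g, with g < 32. */
static void add_to_group(pid_t pid, unsigned int g)
{
	if (pid_ok(pid) && g < 32)
		groups[pid] |= UINT32_C(1) << g;
}

static int in_group(pid_t pid, unsigned int g)
{
	return pid_ok(pid) && g < 32 && ((groups[pid] >> g) & 1u);
}

/* Called for every (parent, child) pair read from the fork connector:
 * the child inherits exactly the parent's groups. Overwriting, rather
 * than OR-ing, also clears whatever a previous owner of a recycled PID
 * left behind. */
static void on_fork(pid_t parent, pid_t child)
{
	if (pid_ok(parent) && pid_ok(child))
		groups[child] = groups[parent];
}
```

Since the connector as posted sends no exit event, the fork of a recycled PID is the only moment the daemon can reset that PID's entry, which is why on_fork() overwrites instead of merging.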

Thank you, Evgeniy, for all your comments about the code; they help,
and I will modify the patch.

Regards,
Guillaume



* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
From: Evgeniy Polyakov @ 2005-02-21  8:41 UTC
  To: Guillaume Thouvenin
  Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga,
	Erich Focht


On Mon, 2005-02-21 at 08:07 +0100, Guillaume Thouvenin wrote:
> On Thu, 2005-02-17 at 18:50 +0300, Evgeniy Polyakov wrote:
> > On Thu, 2005-02-17 at 15:55 +0100, Guillaume Thouvenin wrote:
> > >     This is a new patch that implements a fork connector in the
> > > kernel/fork.c:do_fork() routine. The connector sends the parent PID and
> > > child PID over a netlink interface, which allows several user space
> > > applications to be alerted when a fork occurs in the kernel.
> > > The main drawback is that a message is sent even if nobody listens;
> > > I don't know how to avoid that. I added an option (FORK_CONNECTOR) to
> > > enable or disable the fork connector when compiling the kernel. For it
> > > to work, the connector must be compiled as built-in
> > > (CONFIG_CONNECTOR=y). It has been tested on a 2.6.11-rc3-mm2 kernel
> > > with two user space applications connected.
> > > 
> > >     It is used by ELSA to manage groups of processes in user space. In
> > > conjunction with per-process accounting information, such as BSD or
> > > CSA accounting, ELSA provides per-group accounting of processes.
> > 
> > I think people will complain here...
> > ... [cut here] ...
> > I still think that LSM with logging of all calls is the best way to
> > achieve this goal.
> 
> I agree with you. My first implementation used LSM, but Chris Wright
> (I think it was him) noticed that it's not the right framework (and
> that seems true). So I looked for another solution. I thought about
> kobject but it was too "big", and finally Greg KH mentioned connectors.
> They are small and efficient.

Your do_fork() change really looks like either an audit add-on (but it
is really not the case) or an LSM logging facility.
I think adding cn_netlink_send() to every function in security/dummy.c
and renaming it to security/cn_logger.c or something is not such a bad
idea...
Or even waiting in each function until userspace replies with the
decision whether to allow such a call.
Although that can create a lock (need to recheck the security hooks in
the send/recv paths).

> > from the other, why is only fork monitored this way?
> 
> The problem is the following: I have a user space daemon that manages
> groups of processes. The main idea is that if a parent belongs to a
> group, then its child belongs to the same group. To achieve this I need
> to know when a fork occurs and which processes are involved. I don't
> see how to do this without a hook in the do_fork() routine... Any ideas
> are welcome.

Now I begin to understand Chris Wright: LSM hooks are designed not for
monitoring, but only for the initialisation path, i.e. LSM will only
say whether something is allowed or not, not whether it was
performed.

So, for exactly your setup there is no other way than to patch
do_fork().

> Thank you, Evgeniy, for all your comments about the code; they help,
> and I will modify the patch.
> 
> Regards,
> Guillaume
-- 
        Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski



* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
From: Paul Jackson @ 2005-02-21  9:47 UTC
  To: Guillaume Thouvenin
  Cc: johnpol, akpm, greg, linux-kernel, elsa-devel, gh, efocht

Guillaume wrote:
> The problem is the following: I have a user space daemon that manages
> groups of processes. The main idea is that if a parent belongs to a
> group, then its child belongs to the same group. To achieve this I need
> to know when a fork occurs and which processes are involved. I don't
> see how to do this without a hook in the do_fork() routine...

How is what you need, for process grouping, any more complex than
another sort of {bank, job, aggregate, session, group, ...} integer id
field in the task struct, that is copied on fork, and can be queried and
manipulated from user space, in accordance with whatever rules you
implement?
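To make the cost argument concrete, here is a user space model of an inherited integer id. It assumes a toy struct task with a hypothetical jobid field; in the kernel, dup_task_struct() copies the parent's task_struct as a whole, so an extra int would be carried along by that existing copy without any new hook.

```c
#include <assert.h>

/* Toy stand-in for task_struct; 'jobid' is the hypothetical extra
 * integer id discussed above. */
struct task {
	int pid;
	int jobid;
	/* ... the rest of the per-task state ... */
};

/* Fork model: the child starts as a copy of the parent, so the job id
 * is inherited by the same structure copy, with no extra hook. */
static void model_fork(const struct task *parent, struct task *child,
		       int child_pid)
{
	*child = *parent;
	child->pid = child_pid;
}

/* User space would query or change the id through some interface;
 * here that is just a field update. */
static void set_jobid(struct task *t, int jobid)
{
	t->jobid = jobid;
}
```

Setting the id once on a shell and letting it propagate through two generations of forks leaves every descendant tagged, at the cost of copying four more bytes per fork.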

When I look at the elsacct_process_copy() routine, which is called from
fork, in your patch-2.6.8.1-elsa, I'm not sure what it does, but it sure
looks like it could cause scaling and performance problems.  Linux works
really hard to keep fork costs low.  Copying another integer field, as
part of the block copy of the task struct at fork, sure would be cheaper
than this.  Not only does this hook look too expensive, I don't even see
the need for any such explicit code hook in fork for accounting at all.

Does your user space daemon need to know about each task as it is
forked, in near real time? Is it trying to do something with this
accounting information while the tasks being accounted for are
necessarily still alive?  The classic accounting that I am familiar
with, from years ago, only did post-mortem analysis.  So long as enough
entrails were left around so that it could piece together the story, it
didn't require any immediate notice of anything.  You need to clear a
couple of accounting accumulators directly in the task struct at fork,
and write a record to a specified open file at exit.  That's about it.

The main problems I was aware of with that classic accounting (which
is probably what is now known as BSD accounting) are:
  1) The fixed length accounting record didn't allow for added or
     longer fields.  A little bit more flexible and extensible format
     is desired, but some effort should be made to keep the format
     still reasonably tight and compressed.  No full spec XML.
     The format should allow for some form of resynchronization after
     a chunk of data is lost.
  2) An additional bank/job/aggregate/session/group/... id seems desired.
     I have yet to understand why this need be anything fancier than
     another integer field in the task struct.
  3) Probably some more data items are worth collecting -- which could
     be placed in the outgoing compressed data stream, along with the
     existing records written on task exit.  Over time, appropriate
     hooks should be proposed to collect such data as seems needed.
  4) The current mechanism of collecting per-task data only on exit
     makes it difficult to account for long running jobs.  Perhaps we
     could use a leisurely background task that slowly scans the tasks
     looking for those that have been present since the last scan, and
     causes an intermediate accounting record to be written for them.
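The leisurely background scan in item 4 could look roughly like the pass below. Everything here is hypothetical: the struct acct_task fields, the time units, and the emit callback that stands in for writing an intermediate record. A real implementation would walk the task list in the kernel rather than an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task bookkeeping for the scanner (times in seconds). */
struct acct_task {
	int pid;
	unsigned long start;     /* when the task was created */
	unsigned long last_rec;  /* time of its last intermediate record, 0 = never */
};

/* One pass of a leisurely scanner: every task that already existed at
 * the previous scan gets an intermediate record. 'emit' stands in for
 * writing the record and may be NULL. Returns the number of records. */
static size_t scan_pass(struct acct_task *t, size_t n, unsigned long prev_scan,
			unsigned long now, void (*emit)(const struct acct_task *))
{
	size_t written = 0;

	for (size_t i = 0; i < n; i++) {
		if (t[i].start > prev_scan)
			continue;  /* new since the last scan: wait for the next pass */
		if (emit)
			emit(&t[i]);
		t[i].last_rec = now;
		written++;
	}
	return written;
}
```

Skipping tasks created after the previous scan keeps short-lived tasks out of the intermediate records; they are still covered by the normal record written at exit.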

What other essential deficiencies are there that you need to address?

I don't see any need for explicit hooks in fork to resolve the above
deficiencies.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.650.933.1373, 1.925.600.0401


* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
From: Guillaume Thouvenin @ 2005-02-21 10:33 UTC
  To: Paul Jackson
  Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga,
	Erich Focht, Jay Lan

On Mon, 2005-02-21 at 01:47 -0800, Paul Jackson wrote:
> Guillaume wrote:
> > The problem is the following: I have a user space daemon that manages
> > groups of processes. The main idea is that if a parent belongs to a
> > group, then its child belongs to the same group. To achieve this I need
> > to know when a fork occurs and which processes are involved. I don't
> > see how to do this without a hook in the do_fork() routine...
> 
> How is what you need, for process grouping, any more complex than
> another sort of {bank, job, aggregate, session, group, ...} integer id
> field in the task struct, that is copied on fork, and can be queried and
> manipulated from user space, in accordance with whatever rules you
> implement?

If a process belongs to several groups of processes, a new integer in
the task_struct is not enough; you need a list or something like it.
If you use a list, you need to add functions to manage this list in
the kernel, but we don't want to add this kind of management inside the
kernel, because with the fork connector we can keep it outside.

> When I look at the elsacct_process_copy() routine, which is called from
> fork, in your patch-2.6.8.1-elsa, I'm not sure what it does, but it sure
> looks like it could cause scaling and performance problems.  

This patch is an old one with many kernel modifications that impact
Linux performance. That's why we thought about another solution where
all management is done by a user space daemon. Currently we're using
the fork connector.

> Does your user space daemon need to know about each task as it is
> forked, in near real time? Is it trying to do something with this
> accounting information while the tasks being accounted for are
> necessarily still alive?  

I don't need real time. I just need to know which processes fork during
the accounting period. The user space daemon provided by ELSA just keeps
a trace of parents and their children during a given period (generally
the accounting period). The analysis will be done later by another
application (also provided by ELSA) using the trace of parents and
children plus the accounting trace.

> The main problems I was aware of with that classic accounting (which
> is probably what is now known as BSD accounting) are:
> ... [cut here] ...
>   2) An additional bank/job/aggregate/session/group/... id seems desired.
>      I have yet to understand why this need be anything fancier than
>      another integer field in the task struct.

We're trying to solve this one. I think I have answered the integer
field question. As for the need for per-group accounting, it can be
useful to do accounting on a specific "task". For example, if you want
accounting data for a compilation, you can add the corresponding shell
to a group of processes; the commands involved in the compilation, like
gcc, cc, as, collect2, ..., will then be added to the same group
automatically, and you will be able to get statistics about this
compilation.

>   3) Probably some more data items are worth collecting -- which could
>      be placed in the outgoing compressed data stream, along with the
>      existing records written on task exit.  Over time, appropriate
>      hooks should be proposed to collect such data as seems needed.

There is a discussion with Jay Lan about merging the CSA and BSD
accounting frameworks.

I don't know if there is any work on 1) and 4).

Regards,
Guillaume


* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
From: Paul Jackson @ 2005-02-21 11:58 UTC
  To: Guillaume Thouvenin
  Cc: akpm, greg, linux-kernel, elsa-devel, gh, efocht, jlan

Thank-you for your quick answer.

Guillaume wrote:
>
> If a process belongs to several groups of processes, a new integer in
> the task_struct is not enough; you need a list or something like it.
> If you use a list, you need to add functions to manage this list in
> the kernel, but we don't want to add this kind of management inside the
> kernel, because with the fork connector we can keep it outside.

Ok, the fork connector. From your patch of a couple of days ago, for the
benefit of lurkers:
> 
>     This is a new patch that implements a fork connector in the
> kernel/fork.c:do_fork() routine. The connector sends the parent PID and
> child PID over a netlink interface, which allows several user space
> applications to be alerted when a fork occurs in the kernel.

Whoaa ... you're saying that because you might have several groups a
task could belong to at once, you'll use netlink to avoid managing lists
in the kernel.  Seems that you're spending thousands of instructions to
save dozens.  This is not a good trade off.

I can imagine several much cheaper ways to handle this.

If the number of groups to which a task could belong has some small
finite upper limit, like at most 5 groups, you could have 5 integer id's
in the task struct instead of 1.  If the number of elements in a
particular group has a small upper bound, you could even replace the
ints with bit fields.

Or you could enumerate the different combinations of groups to which a
task might belong, assign each such combination a unique integer, and
keep that integer in the task struct.  The enumeration could be done
dynamically, only counting the particular combinations of group
memberships that actually had use.  This has the disadvantage that a
particular combination, once enumerated, would have to stay around until
the next boot - a potential memory leak.  Probably not acceptable,
unless the cost of storing a no longer used combination is nearly zero.
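The enumeration approach amounts to interning each combination of memberships and keeping only the resulting integer in the task struct. The sketch assumes memberships are represented as a 64-bit mask (so at most 64 groups) and uses a fixed-size table with a linear search; both are illustrative choices, and combo_id() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed table; a real one would grow or be hashed. */
#define MAX_COMBOS 256

/* combos[i] is the set of groups (a bitmask, so at most 64 groups)
 * that integer id i stands for. */
static uint64_t combos[MAX_COMBOS];
static int ncombos;

/* Return the integer id of 'set', enumerating it the first time it is
 * seen; -1 if the table is full. Entries are never freed, which is the
 * memory leak described above. */
static int combo_id(uint64_t set)
{
	for (int i = 0; i < ncombos; i++)
		if (combos[i] == set)
			return i;
	if (ncombos == MAX_COMBOS)
		return -1;
	combos[ncombos] = set;
	return ncombos++;
}
```

Fork stays a plain integer copy; the cost moves to the rarer moment when a task joins or leaves a group and a new combination has to be looked up.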

Or you could have a little 'jobids' struct that held a list and a
reference counter, where the list held a particular combination of ids,
and the reference counter tracked how many tasks referenced that jobids
struct. Put a single pointer in the task struct to a jobids struct, and
increment and decrement the reference counter in the jobids struct on
fork and exit.  Free it if the count goes to zero on exit.  This solves
the memory leak of the previous, with increased cost to the fork.  Since
we really do design these systems to stay up 'forever', this is perhaps
the winner.  Any time a particular task is added to, or removed from, a
group, if the ref count of its jobids struct is one, then modify the id
list attached to that jobids struct in place.  If the ref count is more
than one, copy the jobids struct and list to a new one, decrement the
count in the old one, and modify the new one in place.  Such list and
counter manipulations are the daily stuff of kernel code.  No need to
avoid such.
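The reference-counted 'jobids' scheme, with a copy made only when a shared set is modified, might be sketched in user space as follows. The fixed MAX_IDS cap, the toy struct task and the function names are assumptions for illustration; kernel code would use kmalloc(), atomic reference counts and proper locking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap that keeps the sketch to one allocation per set. */
#define MAX_IDS 8

struct jobids {
	int refcount;       /* how many tasks point at this set */
	int n;              /* ids in use */
	int ids[MAX_IDS];   /* one combination of job ids */
};

/* Toy task: a single pointer is all that is added per task. */
struct task {
	struct jobids *jobs;
};

/* fork: share the parent's set and take a reference. */
static void jobids_fork(const struct task *parent, struct task *child)
{
	child->jobs = parent->jobs;
	if (child->jobs)
		child->jobs->refcount++;
}

/* exit: drop the reference and free the set when it reaches zero. */
static void jobids_exit(struct task *t)
{
	if (t->jobs && --t->jobs->refcount == 0)
		free(t->jobs);
	t->jobs = NULL;
}

/* Add 'id' to t's set, modifying in place if t is the only user and
 * copying first if the set is shared. Returns 0, or -1 on failure. */
static int jobids_add(struct task *t, int id)
{
	struct jobids *j = t->jobs;

	if (j && j->n == MAX_IDS)
		return -1;
	if (!j || j->refcount > 1) {
		struct jobids *copy = malloc(sizeof(*copy));

		if (!copy)
			return -1;
		if (j) {
			memcpy(copy, j, sizeof(*copy));
			j->refcount--;    /* t no longer uses the shared set */
		} else {
			copy->n = 0;
		}
		copy->refcount = 1;
		t->jobs = j = copy;
	}
	j->ids[j->n++] = id;
	return 0;
}
```

Fork and exit each touch one counter; only a change of group membership on a shared set pays for an allocation and copy.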

Just because you have more than one id doesn't mean each task has to be
connected directly into its own custom list, and even if you needed
that, I don't see that it's a win to avoid such a list by using netlink.

It can be a worthwhile exercise to single step through each machine
instruction that you add to fork, in the forking task or any other task
that is sent data or a signal therefrom.  You really do want to keep the
number of added instructions (and number of additional cache lines and
memory pages accessed, especially written) to a minimum.  If the effort
of single stepping through such would require the patience of
Copernicus, then it's back to the drawing board for a more efficient
solution.

> I don't know if there is any work on 1) and 4).

Well, you might have dodged the (1) bullet up until now by using netlink
and not extending the accounting record at exit.  Bullet (1) was
extending the accounting record past its fairly constrained size, if
that's still a problem; it's been years since I looked.  But if you
adopt one of the above suggestions, and don't send anything out of the
task context at fork, then you will have to deal with (1) in order to
include the list of job id's in the record written at exit.

If you want to collect any other data, bullet (3), you will also need
to solve bullet (1).

Item (4), collecting accounting data for long running tasks, is probably
less pressing.  Its solution will also likely require solving (1),
however.

Taking a quick look at init/Kconfig and include/linux/acct.h, it seems
we are using BSD_PROCESS_ACCT_V3 format, which is the latest 64 byte
format, allowing for larger uid/gid.

With slight variations, this 64 byte format has lasted about 25 years.
It's time to replace it, especially if you have designs on collecting
any additional information, which you clearly do.
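For reference, the v3 record can be mirrored in user space to show that all 64 bytes are already spoken for, leaving no room for a job id or a list of ids. This mirror is written from memory of include/linux/acct.h, with fixed-width types standing in for the kernel's __u16/__u32; field names and order should be checked against the real header before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACCT_COMM 16
/* comp_t: 16-bit "compressed" value, 3-bit base-8 exponent + 13-bit fraction */
typedef uint16_t comp_t;

/* User space mirror of struct acct_v3 (from memory of include/linux/acct.h). */
struct acct_v3_mirror {
	char     ac_flag;             /* flags */
	char     ac_version;          /* record format version (3) */
	uint16_t ac_tty;              /* controlling terminal */
	uint32_t ac_exitcode;         /* exit code */
	uint32_t ac_uid;              /* 32-bit real user id */
	uint32_t ac_gid;              /* 32-bit real group id */
	uint32_t ac_pid;              /* process id */
	uint32_t ac_ppid;             /* parent process id */
	uint32_t ac_btime;            /* process creation time */
	float    ac_etime;            /* elapsed time */
	comp_t   ac_utime;            /* user time */
	comp_t   ac_stime;            /* system time */
	comp_t   ac_mem;              /* average memory usage */
	comp_t   ac_io;               /* chars transferred */
	comp_t   ac_rw;               /* blocks read or written */
	comp_t   ac_minflt;           /* minor page faults */
	comp_t   ac_majflt;           /* major page faults */
	comp_t   ac_swaps;            /* number of swaps */
	char     ac_comm[ACCT_COMM];  /* command name */
};

/* 8 bytes of flags/tty/exit code, 16 of ids, 8 of times,
 * 16 of compressed counters and 16 of command name. */
_Static_assert(sizeof(struct acct_v3_mirror) == 64, "v3 record is 64 bytes");
```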

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.650.933.1373, 1.925.600.0401


* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
From: Guillaume Thouvenin @ 2005-02-21 14:43 UTC
  To: Paul Jackson
  Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga,
	Erich Focht, Jay Lan

On Mon, 2005-02-21 at 03:58 -0800, Paul Jackson wrote: 
> >     This is a new patch that implements a fork connector in the
> > kernel/fork.c:do_fork() routine. The connector sends the parent PID and
> > child PID over a netlink interface, which allows several user space
> > applications to be alerted when a fork occurs in the kernel.
> 
> Whoaa ... you're saying that because you might have several groups a
> task could belong to at once, you'll use netlink to avoid managing lists
> in the kernel.  Seems that you're spending thousands of instructions to
> save dozens.  This is not a good trade off.

  I understand your point of view, but I'm using the netlink interface
because it's already in the kernel, so my choice is to use something
that is already there instead of adding dozens of new instructions, and
also to do things in user space. The fork connector is here to move the
management into user space. Otherwise, there is PAGG, which manages
groups of processes in the kernel. To test performance, I compiled a
kernel several times with and without the fork connector; here is the
resource usage measured with the following command:
        
time /bin/sh -c 'make O=/home/guill/build/k2610 oldconfig   \
                 && make O=/home/guill/build/k2610 bzImage  \
                 && make O=/home/guill/build/k2610 modules'

Between each test, the directory containing the object files was
removed and a 'sync' was done.

Results are:

  kernel without fork connector
    real : 8m17.042s 8m10.113s 8m08.597s 8m10.068s 8m08.930s
    user : 7m32.376s 7m35.985s 7m34.424s 7m34.221s 7m34.835s
    sys  : 0m50.730s 0m51.139s 0m51.159s 0m51.406s 0m51.020s    

  kernel with the fork connector
    real : 8m14.492s 8m08.656s 8m07.754s 8m08.002s 8m07.854s
    user : 7m31.664s 7m33.528s 7m33.625s 7m33.500s 7m33.822s
    sys  : 0m50.651s 0m51.222s 0m51.102s 0m51.367s 0m50.894s
   
  kernel with the fork connector + application listens
    real : 8m08.596s 8m08.950s 8m08.899s 8m08.678s 8m08.987s
    user : 7m33.312s 7m33.898s 7m34.004s 7m33.285s 7m33.628s
    sys  : 0m52.222s 0m52.013s 0m51.809s 0m52.361s 0m52.036s


  I also chose this implementation because Erich Focht wrote in the
email http://lkml.org/lkml/2004/12/17/99 that keeps the historic about
the creation of processes "sounds very useful for a lot of interesting
stuff". So I thought about something that can be used by other
applications, and with netlink the information is available to everyone.

> I can imagine several way cheaper ways to handle this.
> 
> If the number of groups to which a task could belong has some small
> finite upper limit, like at most 5 groups, you could have 5 integer id's
> in the task struct instead of 1.  If the number of elements in a
> particular group has a small upper bound, you could even replace the
> ints with bit fields.
> 
> Or you could enumerate the different combinations of groups to which a
> task might belong, assign each such combination a unique integer, and
> keep that integer in the task struct.  The enumeration could be done
> dynamically, only counting the particular combinations of group
> memberships that actually had use.  This has the disadvantage that a
> particular combination, once enumerated, would have to stay around until
> the next boot - a potential memory leak.  Probably not acceptable,
> unless the cost of storing a no longer used combination is nearly zero.

  The problem with those solutions is that they assume a process can
belong to only a finite number of groups. I suppose that can be true in
practice.

> Or you could have a little 'jobids' struct that held a list and a
> reference counter, where the list held a particular combination of ids,
> and the reference counter tracked how many tasks referenced that jobids
> struct. Put a single pointer in the task struct to a jobids struct, and
> increment and decrement the reference counter in the jobids struct on
> fork and exit.  Free it if the count goes to zero on exit.  This solves
> the memory leak of the previous, with increased cost to the fork.  Since
> we really do design these systems to stay up 'forever', this is perhaps
> the winner.  Any time a particular task is added to, or removed from, a
> group, if the ref count of its jobids struct is one, then modify the id
> list attached to that jobids struct in place.  If the ref count is more
> than one, copy the jobids struct and list to a new one, decrement the
> count in the old one, and modify the new one in place.  Such list and
> counter manipulations are the daily stuff of kernel code.  No need to
> avoid such.

  This solution is interesting. The question is whether a fork connector
is useful for other projects. As I said, one of my goals was to provide
a way to alert user space applications when a fork occurs in the kernel,
because I think that other applications need this kind of information.
But if it is needed only by ELSA, you're right: a solution specific to
our problem is clearly more efficient.

> Just because you have more than one id doesn't mean each task has to be
> connected directly into its own custom list, and even if you needed
> that, I don't see that it's a win to avoid such a list by using netlink.

  The advantages of the fork connector are that it can be used by other
applications and that the modification to the current kernel is minimal.
The main drawback may be performance...

So does nobody else need such a hook? (Erich?)

Thanks, Paul, for your comments,
Guillaume



* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
  2005-02-21 14:43           ` Guillaume Thouvenin
@ 2005-02-21 16:55             ` Erich Focht
  2005-02-21 17:54             ` Paul Jackson
  1 sibling, 0 replies; 12+ messages in thread
From: Erich Focht @ 2005-02-21 16:55 UTC (permalink / raw)
  To: Guillaume Thouvenin
  Cc: Paul Jackson, Andrew Morton, Greg KH, lkml, elsa-devel,
	Gerrit Huizenga, Jay Lan

On Monday 21 February 2005 15:43, Guillaume Thouvenin wrote:
> 
>   I also choose this implementation because Erich Focht wrote in the
> email http://lkml.org/lkml/2004/12/17/99 that keeps the historic about
> the creation of processes "sounds very useful for a lot of interesting
> stuff". So I thought about something that can be used by other
> application and with netlink, information is available to everyone. 

Besides accounting, I had in mind something like cluster-wide pid
tracking in userspace with built-in relationship information. A bit of
single system image integration... As I don't have it yet, I'm not
(yet) a very strong requester for the service provided by your
module. But I still think it's useful, and I might later want a hook on
exit, too. (And yes, I can imagine other ways to get the data out of
the kernel effectively, too.)

> Results are:
> 
>   kernel without fork connector
>     real : 8m17.042s 8m10.113s 8m08.597s 8m10.068s 8m08.930s
>     user : 7m32.376s 7m35.985s 7m34.424s 7m34.221s 7m34.835s
>     sys  : 0m50.730s 0m51.139s 0m51.159s 0m51.406s 0m51.020s    
> 
>   kernel with the fork connector
>     real : 8m14.492s 8m08.656s 8m07.754s 8m08.002s 8m07.854s
>     user : 7m31.664s 7m33.528s 7m33.625s 7m33.500s 7m33.822s
>     sys  : 0m50.651s 0m51.222s 0m51.102s 0m51.367s 0m50.894s
>    
>   kernel with the fork connector + application listens
>     real : 8m08.596s 8m08.950s 8m08.899s 8m08.678s 8m08.987s
>     user : 7m33.312s 7m33.898s 7m34.004s 7m33.285s 7m33.628s
>     sys  : 0m52.222s 0m52.013s 0m51.809s 0m52.361s 0m52.036s

I liked the previous lean implementation more, but the performance
of this one doesn't look nearly as scary as I thought.

Best regards,
Erich



* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
  2005-02-21 14:43           ` Guillaume Thouvenin
  2005-02-21 16:55             ` Erich Focht
@ 2005-02-21 17:54             ` Paul Jackson
  1 sibling, 0 replies; 12+ messages in thread
From: Paul Jackson @ 2005-02-21 17:54 UTC (permalink / raw)
  To: Guillaume Thouvenin
  Cc: akpm, greg, linux-kernel, elsa-devel, gh, efocht, jlan

Guillaume wrote:
> 
> I understand your point of view but I'm using netlink interface
> because it's already in the kernel so my choice is to use something that
> is already in the kernel instead of adding dozens of new instructions
> and also to do things in user space.

All else equal, yes it is good to use what facilities are
already available, and it is good to do things in user space.

If one adds many cpu cycles and quite a few pages of memory
to read and touch to every fork, then all else is not equal.

> To test performances, I tried to
> compile a kernel several times with and without the fork connector

I agree that a kernel compile does not measure fork costs well.

There is a good benchmark of fork, exit and other facilities such
as socket, bind and mmap, at:

  http://bulk.fefe.de/scalability/

(the trailing '/' is needed).

You might try downloading them (see the cvs instructions on this page)
and seeing how your changes impact fork and exit.  I had to fiddle with
the rl.rlim counts in forkbench.c to get my copy to run without exceeding
rlimits on fork.

Notice also the rather dramatic improvements from Linux 2.4 to 2.6 in
some of these benchmarks, described elsewhere on the above page.  The
Linux developers take this stuff seriously, and have provided some
seriously good performance under load.

I'd be quite interested to see how your changes affect this benchmark.
Or perhaps you can find some other good measure of fork and exit costs,
the two areas that accounting is at immediate risk of impacting.

>   I also choose this implementation because Erich Focht wrote in the
> email http://lkml.org/lkml/2004/12/17/99 that keeps the historic about
> the creation of processes "sounds very useful for a lot of interesting
> stuff". 

(Some of us who only speak English would find 'history' more idiomatic
here than 'historic' ...)

Adding framework on the basis of such potential future useful value
is a hard sell in Linux land.  It is better to wait until each need is
immediately clear, and it is essential to keep the kernel infrastructure
as lightweight as we can, or it will overwhelm the mental capacity of
most of us working on it, including myself for sure ;).

> Thank Paul for your comments,

You're welcome.  Thanks for tackling this task.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.650.933.1373, 1.925.600.0401


* Re: [Elsa-devel] Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
  2005-02-17 15:50 ` Evgeniy Polyakov
  2005-02-21  7:07   ` Guillaume Thouvenin
@ 2005-02-21  8:05   ` Guillaume Thouvenin
  2005-02-21  8:48     ` Evgeniy Polyakov
  1 sibling, 1 reply; 12+ messages in thread
From: Guillaume Thouvenin @ 2005-02-21  8:05 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga,
	Erich Focht

On Thu, 2005-02-17 at 18:50 +0300, Evgeniy Polyakov wrote:
> >  
> > +#if defined(CONFIG_CONNECTOR) && defined(CONFIG_FORK_CONNECTOR)
> 
> I suspect CONFIG_FORK_CONNECTOR is enough.

The problem here is that if the connector is compiled as a module and
the fork connector as built-in, there will be undefined references to
symbols like 'cn_already_initialized' or 'cn_netlink_send'. That's why
fork_connector() must be enabled only if CONFIG_CONNECTOR and
CONFIG_FORK_CONNECTOR are selected as built-in and not as a module. I
agree that it's not very elegant...

> > +			cn_netlink_send(msg, 1);
> 
> "1" here means that this message will be delivered to any group
> which has it's first bit set(1, 3, and so on) in given socket queue.
> I suspect it is not what you want.
> By design connector's users should send messages to the group it was
> registered with
> (which is obtained from idx field of the struct cb_id), in your case it
> is CN_IDX_FORK,
> and connector userspace consumers should bind to the same group (idx
> value).
> It is of course not requirement, but a fair path(hmm, I can add more
> strict checks into connector).
> By setting 0 as second parameter for cn_netlink_send() you will force
> connector's core
> to select proper group for you.

Yes, but cn_netlink_send() looks for a callback with the same idx.
As I don't use any callback, found == 0 and I get the error "Failed to
find multicast netlink group for callback[0xfeed.0xbeef]". So the correct
call is: cn_netlink_send(msg, CN_IDX_FORK);

Thanks for your help,
Guillaume


* Re: [Elsa-devel] Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
  2005-02-21  8:05   ` [Elsa-devel] " Guillaume Thouvenin
@ 2005-02-21  8:48     ` Evgeniy Polyakov
  0 siblings, 0 replies; 12+ messages in thread
From: Evgeniy Polyakov @ 2005-02-21  8:48 UTC (permalink / raw)
  To: Guillaume Thouvenin
  Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga,
	Erich Focht


On Mon, 2005-02-21 at 09:05 +0100, Guillaume Thouvenin wrote:
> On Thu, 2005-02-17 at 18:50 +0300, Evgeniy Polyakov wrote:
> > >  
> > > +#if defined(CONFIG_CONNECTOR) && defined(CONFIG_FORK_CONNECTOR)
> > 
> > I suspect CONFIG_FORK_CONNECTOR is enough.
> 
> The problem here is that if connector is compiled as a module and
> fork_connector as builtin there will be undefined reference to symbol
> like 'cn_already_initialized' or 'cn_netlink_send'. That's why the
> fork_connector() must be enable if CONFIG_CONNECTOR and
> CONFIG_FORK_CONNECTOR are selected as builtin and not as a module. I
> agree that it's not very elegant...

Maybe "depends on CONNECTOR=y" ?

> > > +			cn_netlink_send(msg, 1);
> > 
> > "1" here means that this message will be delivered to any group
> > which has it's first bit set(1, 3, and so on) in given socket queue.
> > I suspect it is not what you want.
> > By design connector's users should send messages to the group it was
> > registered with
> > (which is obtained from idx field of the struct cb_id), in your case it
> > is CN_IDX_FORK,
> > and connector userspace consumers should bind to the same group (idx
> > value).
> > It is of course not requirement, but a fair path(hmm, I can add more
> > strict checks into connector).
> > By setting 0 as second parameter for cn_netlink_send() you will force
> > connector's core
> > to select proper group for you.
> 
> Yes but cn_netlink_send() is looking for a callback with the same idx.
> As I don't use any callback, found == 0 and I have an error "Failed to
> find multicast netlink group for callback[0xfeed.0xbeef]". So the good
> call is: cn_netlink_send(msg, CN_IDX_FORK);

Uh-oh, I see...
I recall your previous patch with the fork_callback()...

Nevertheless, "1" is a bad idea; CN_IDX_FORK is what is expected.

> Thanks for your help,
> Guillaume
-- 
        Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski




Thread overview: 12+ messages
2005-02-17 14:55 [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector Guillaume Thouvenin
2005-02-17 15:50 ` Evgeniy Polyakov
2005-02-21  7:07   ` Guillaume Thouvenin
2005-02-21  8:41     ` Evgeniy Polyakov
2005-02-21  9:47     ` Paul Jackson
2005-02-21 10:33       ` Guillaume Thouvenin
2005-02-21 11:58         ` Paul Jackson
2005-02-21 14:43           ` Guillaume Thouvenin
2005-02-21 16:55             ` Erich Focht
2005-02-21 17:54             ` Paul Jackson
2005-02-21  8:05   ` [Elsa-devel] " Guillaume Thouvenin
2005-02-21  8:48     ` Evgeniy Polyakov
