* [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
@ 2005-02-17 14:55 Guillaume Thouvenin
2005-02-17 15:50 ` Evgeniy Polyakov
0 siblings, 1 reply; 12+ messages in thread
From: Guillaume Thouvenin @ 2005-02-17 14:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Greg KH, lkml, Evgeniy Polyakov, elsa-devel, Gerrit Huizenga,
Erich Focht
Hello,

It's a new patch that implements a fork connector in the
kernel/fork.c:do_fork() routine. The connector sends information about
parent PID and child PID over a netlink interface. It allows several
user space applications to be alerted when a fork occurs in the kernel.
The main drawback is that even if nobody listens, a message is sent. I
don't know how to avoid that. I added an option (FORK_CONNECTOR) to
enable (or disable) the fork connector when compiling the kernel. To
work, the connector must be compiled as built-in (CONFIG_CONNECTOR=y).
It has been tested on a 2.6.11-rc3-mm2 kernel with two user space
applications connected.

It is used by ELSA to manage groups of processes in user space. In
conjunction with per-process accounting information, like BSD or CSA,
ELSA provides per-group accounting of processes.

All comments are welcome,
Guillaume
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
---
drivers/connector/Kconfig | 10 ++++++++++
include/linux/connector.h | 2 ++
kernel/fork.c | 41 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 53 insertions(+)
diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/drivers/connector/Kconfig linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig
--- linux-2.6.11-rc3-mm2/drivers/connector/Kconfig 2005-02-11 11:00:16.000000000 +0100
+++ linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig 2005-02-17 15:48:41.000000000 +0100
@@ -10,4 +10,14 @@ config CONNECTOR
Connector support can also be built as a module. If so, the module
will be called cn.ko.
+config FORK_CONNECTOR
+ bool "Enable fork connector"
+ depends on CONNECTOR
+ ---help---
+ It adds a connector in kernel/fork.c:do_fork() function. When a fork
+ occurs, netlink is used to transfer information about the parent and
+ its child. This information can be used by a user space application.
+
+ Note: it only works if connector is built in the kernel.
+
endmenu
diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/include/linux/connector.h linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h
--- linux-2.6.11-rc3-mm2/include/linux/connector.h 2005-02-11 11:00:18.000000000 +0100
+++ linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h 2005-02-16 15:07:46.000000000 +0100
@@ -28,6 +28,8 @@
#define CN_VAL_KOBJECT_UEVENT 0x0000
#define CN_IDX_SUPERIO 0xaabb /* SuperIO subsystem */
#define CN_VAL_SUPERIO 0xccdd
+#define CN_IDX_FORK 0xfeed /* fork events */
+#define CN_VAL_FORK 0xbeef
#define CONNECTOR_MAX_MSG_SIZE 1024
diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/kernel/fork.c linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c
--- linux-2.6.11-rc3-mm2/kernel/fork.c 2005-02-11 11:00:18.000000000 +0100
+++ linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c 2005-02-17 13:43:48.000000000 +0100
@@ -41,6 +41,7 @@
#include <linux/profile.h>
#include <linux/rmap.h>
#include <linux/acct.h>
+#include <linux/connector.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -63,6 +64,44 @@ DEFINE_PER_CPU(unsigned long, process_co
EXPORT_SYMBOL(tasklist_lock);
+#if defined(CONFIG_CONNECTOR) && defined(CONFIG_FORK_CONNECTOR)
+#define FORK_CN_INFO_SIZE 64
+static inline void fork_connector(pid_t parent, pid_t child)
+{
+ struct cb_id fork_id = {CN_IDX_FORK, CN_VAL_FORK};
+ static __u32 seq; /* used to test if we lost message */
+
+ if (cn_already_initialized) {
+ struct cn_msg *msg;
+ size_t size;
+
+ size = sizeof(*msg) + FORK_CN_INFO_SIZE;
+ msg = kmalloc(size, GFP_KERNEL);
+ if (msg) {
+ memset(msg, '\0', size);
+ memcpy(&msg->id, &fork_id, sizeof(msg->id));
+ msg->seq = seq++;
+ msg->ack = 0; /* not used */
+ /*
+ * size of data is the number of characters
+ * printed plus one for the trailing '\0'
+ */
+ msg->len = snprintf(msg->data, FORK_CN_INFO_SIZE-1,
+ "%i %i", parent, child) + 1;
+
+ cn_netlink_send(msg, 1);
+
+ kfree(msg);
+ }
+ }
+}
+#else
+static inline void fork_connector(pid_t parent, pid_t child)
+{
+ return;
+}
+#endif
+
int nr_processes(void)
{
int cpu;
@@ -1238,6 +1277,8 @@ long do_fork(unsigned long clone_flags,
if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE))
ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);
}
+
+ fork_connector(current->pid, p->pid);
} else {
free_pidmap(pid);
pid = PTR_ERR(p);
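
For readers who want to experiment with the message format outside the kernel, here is a minimal user space sketch. It assumes only what the patch shows: the payload is the text "<parent> <child>", its reported length includes the trailing '\0', and `seq` increases by one per message so that a listener can detect losses. The function names (`fork_payload_format`, `fork_payload_parse`, `fork_seq_lost`) are hypothetical and are not part of the patch or of the connector API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define FORK_CN_INFO_SIZE 64	/* same payload bound as in the patch */

/*
 * Format the payload the way the patch does: "<parent> <child>".
 * Returns the length including the trailing '\0', or 0 if the text
 * would not fit in 'cap' bytes.
 */
static size_t fork_payload_format(char *buf, size_t cap, int parent, int child)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, cap, "%i %i", parent, child);
	if (n < 0 || (size_t)n >= cap)
		return 0;
	return (size_t)n + 1;
}

/* Parse a payload back into two PIDs. Returns 0 on success, -1 otherwise. */
static int fork_payload_parse(const char *data, int *parent, int *child)
{
	return sscanf(data, "%d %d", parent, child) == 2 ? 0 : -1;
}

/*
 * Number of messages lost between two consecutively received sequence
 * numbers. Unsigned arithmetic keeps this correct across wrap-around.
 */
static uint32_t fork_seq_lost(uint32_t prev, uint32_t cur)
{
	return (uint32_t)(cur - prev - 1);
}
```

A listener would call `fork_payload_parse()` on each received payload and compare each `seq` with the previous one; a non-zero result from `fork_seq_lost()` means some fork events were dropped.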
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
2005-02-17 14:55 [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector Guillaume Thouvenin
@ 2005-02-17 15:50 ` Evgeniy Polyakov
2005-02-21 7:07 ` Guillaume Thouvenin
2005-02-21 8:05 ` [Elsa-devel] " Guillaume Thouvenin
0 siblings, 2 replies; 12+ messages in thread
From: Evgeniy Polyakov @ 2005-02-17 15:50 UTC (permalink / raw)
To: Guillaume Thouvenin
Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga,
Erich Focht
[-- Attachment #1: Type: text/plain, Size: 5955 bytes --]
On Thu, 2005-02-17 at 15:55 +0100, Guillaume Thouvenin wrote:
> Hello,

Hello, I have a small note about the connector's usage in your module in
particular and in others in general.

> It's a new patch that implements a fork connector in the
> kernel/fork.c:do_fork() routine. The connector sends information about
> parent PID and child PID over a netlink interface. It allows several
> user space applications to be alerted when a fork occurs in the kernel.
> The main drawback is that even if nobody listens, a message is sent. I
> don't know how to avoid that. I added an option (FORK_CONNECTOR) to
> enable (or disable) the fork connector when compiling the kernel. To
> work, the connector must be compiled as built-in (CONFIG_CONNECTOR=y).
> It has been tested on a 2.6.11-rc3-mm2 kernel with two user space
> applications connected.
>
> It is used by ELSA to manage groups of processes in user space. In
> conjunction with per-process accounting information, like BSD or CSA,
> ELSA provides per-group accounting of processes.

I think people will complain here...
From one point of view it is a step toward chaotic microkernel message
flows; from the other side, why is only fork monitored in this way?

I still think that LSM with logging of all calls is the best way to
achieve this goal.

> All comments are welcome,
>
> Guillaume
>
> Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
> ---
>
> drivers/connector/Kconfig | 10 ++++++++++
> include/linux/connector.h | 2 ++
> kernel/fork.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 53 insertions(+)
>
> diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/drivers/connector/Kconfig linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig
> --- linux-2.6.11-rc3-mm2/drivers/connector/Kconfig 2005-02-11 11:00:16.000000000 +0100
> +++ linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig 2005-02-17 15:48:41.000000000 +0100
> @@ -10,4 +10,14 @@ config CONNECTOR
> Connector support can also be built as a module. If so, the module
> will be called cn.ko.
>
> +config FORK_CONNECTOR
> + bool "Enable fork connector"
> + depends on CONNECTOR
> + ---help---
> + It adds a connector in kernel/fork.c:do_fork() function. When a fork
> + occurs, netlink is used to transfer information about the parent and
> + its child. This information can be used by a user space application.
> +
> + Note: it only works if connector is built in the kernel.
> +
> endmenu
> diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/include/linux/connector.h linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h
> --- linux-2.6.11-rc3-mm2/include/linux/connector.h 2005-02-11 11:00:18.000000000 +0100
> +++ linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h 2005-02-16 15:07:46.000000000 +0100
> @@ -28,6 +28,8 @@
> #define CN_VAL_KOBJECT_UEVENT 0x0000
> #define CN_IDX_SUPERIO 0xaabb /* SuperIO subsystem */
> #define CN_VAL_SUPERIO 0xccdd
> +#define CN_IDX_FORK 0xfeed /* fork events */
> +#define CN_VAL_FORK 0xbeef
>
>
> #define CONNECTOR_MAX_MSG_SIZE 1024
> diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/kernel/fork.c linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c
> --- linux-2.6.11-rc3-mm2/kernel/fork.c 2005-02-11 11:00:18.000000000 +0100
> +++ linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c 2005-02-17 13:43:48.000000000 +0100
> @@ -41,6 +41,7 @@
> #include <linux/profile.h>
> #include <linux/rmap.h>
> #include <linux/acct.h>
> +#include <linux/connector.h>
>
> #include <asm/pgtable.h>
> #include <asm/pgalloc.h>
> @@ -63,6 +64,44 @@ DEFINE_PER_CPU(unsigned long, process_co
>
> EXPORT_SYMBOL(tasklist_lock);
>
> +#if defined(CONFIG_CONNECTOR) && defined(CONFIG_FORK_CONNECTOR)

I suspect CONFIG_FORK_CONNECTOR is enough.

> +#define FORK_CN_INFO_SIZE 64
> +static inline void fork_connector(pid_t parent, pid_t child)
> +{
> + struct cb_id fork_id = {CN_IDX_FORK, CN_VAL_FORK};
> + static __u32 seq; /* used to test if we lost message */
> +
> + if (cn_already_initialized) {
> + struct cn_msg *msg;
> + size_t size;
> +
> + size = sizeof(*msg) + FORK_CN_INFO_SIZE;
> + msg = kmalloc(size, GFP_KERNEL);
> + if (msg) {
> + memset(msg, '\0', size);
> + memcpy(&msg->id, &fork_id, sizeof(msg->id));
> + msg->seq = seq++;
> + msg->ack = 0; /* not used */
> + /*
> + * size of data is the number of characters
> + * printed plus one for the trailing '\0'
> + */
> + msg->len = snprintf(msg->data, FORK_CN_INFO_SIZE-1,
> + "%i %i", parent, child) + 1;
> +
> + cn_netlink_send(msg, 1);

"1" here means that this message will be delivered to any group which
has its first bit set (1, 3, and so on) in the given socket queue.
I suspect it is not what you want.

By design, connector users should send messages to the group they were
registered with (which is obtained from the idx field of struct cb_id),
in your case CN_IDX_FORK, and connector userspace consumers should bind
to the same group (idx value).
It is of course not a requirement, but a fair path (hmm, I can add more
strict checks into the connector).

By setting 0 as the second parameter of cn_netlink_send() you will force
the connector's core to select the proper group for you.

> + kfree(msg);
> + }
> + }
> +}
> +#else
> +static inline void fork_connector(pid_t parent, pid_t child)
> +{
> + return;
> +}
> +#endif
> +
> int nr_processes(void)
> {
> int cpu;
> @@ -1238,6 +1277,8 @@ long do_fork(unsigned long clone_flags,
> if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE))
> ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);
> }
> +
> + fork_connector(current->pid, p->pid);
> } else {
> free_pidmap(pid);
> pid = PTR_ERR(p);
>
-- 
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
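
The group remark in the review above can be illustrated with a small user space model. This is only a sketch of the behaviour as Evgeniy describes it (delivery to every listener whose group value has a bit in common with the mask passed by the sender, and selection of the registered idx when 0 is passed); it is not the connector or netlink implementation. The helper names `mask_delivers` and `effective_group` are hypothetical; CN_IDX_FORK is the value from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define CN_IDX_FORK 0xfeed	/* idx value introduced by the patch */

/*
 * Model of the delivery rule described in the review: a listener bound
 * to 'listener_groups' sees a message sent with 'send_mask' when the
 * two values share at least one set bit.
 */
static int mask_delivers(uint32_t send_mask, uint32_t listener_groups)
{
	return (send_mask & listener_groups) != 0;
}

/*
 * Model of "pass 0 and let the core choose": a zero request falls back
 * to the idx the callback was registered with.
 */
static uint32_t effective_group(uint32_t requested, uint32_t idx)
{
	assert(idx != 0);
	return requested ? requested : idx;
}
```

With a mask of 1, listeners bound to groups 1 and 3 both match while a listener bound to group 2 does not, which is the over-delivery the review warns about. Under this model, passing 0 resolves to CN_IDX_FORK instead.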
* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
2005-02-17 15:50 ` Evgeniy Polyakov
@ 2005-02-21 7:07 ` Guillaume Thouvenin
2005-02-21 8:41 ` Evgeniy Polyakov
2005-02-21 9:47 ` Paul Jackson
2005-02-21 8:05 ` [Elsa-devel] " Guillaume Thouvenin
1 sibling, 2 replies; 12+ messages in thread
From: Guillaume Thouvenin @ 2005-02-21 7:07 UTC (permalink / raw)
To: Evgeniy Polyakov
Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga,
Erich Focht
On Thu, 2005-02-17 at 18:50 +0300, Evgeniy Polyakov wrote:
> On Thu, 2005-02-17 at 15:55 +0100, Guillaume Thouvenin wrote:
> > It's a new patch that implements a fork connector in the
> > kernel/fork.c:do_fork() routine. The connector sends information about
> > parent PID and child PID over a netlink interface. It allows several
> > user space applications to be alerted when a fork occurs in the kernel.
> > The main drawback is that even if nobody listens, a message is sent. I
> > don't know how to avoid that. I added an option (FORK_CONNECTOR) to
> > enable (or disable) the fork connector when compiling the kernel. To
> > work, the connector must be compiled as built-in (CONFIG_CONNECTOR=y).
> > It has been tested on a 2.6.11-rc3-mm2 kernel with two user space
> > applications connected.
> >
> > It is used by ELSA to manage groups of processes in user space. In
> > conjunction with per-process accounting information, like BSD or CSA,
> > ELSA provides per-group accounting of processes.
>
> I think people will complain here...
> ... [cut here] ...
> I still think that LSM with logging of all calls is the best way to
> achieve this goal.

I agree with you. My first implementation was with LSM, but Chris Wright
(I think it was him) noticed that it's not the right framework (and it
seems true). So I looked for another solution. I thought about kobject
but it was too "big" and finally, Greg KH spoke about connectors. It's
small and efficient.

> from the other side, why is only fork monitored in this way?

The problem is the following: I have a user space daemon that manages
groups of processes. The main idea is, if a parent belongs to a group
then its child belongs to the same group. To achieve this I need to know
when a fork occurs and which processes are involved. I don't see how to
do this without a hook in the do_fork() routine... Any ideas are
welcome.

Thank you Evgeniy for all your comments about the code, it helps and I
will modify the patch.

Regards,
Guillaume
^ permalink raw reply [flat|nested] 12+ messages in thread
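
The inheritance rule stated above (a child belongs to the same groups as its parent) might be kept in user space roughly as follows. This is a sketch under stated assumptions, not ELSA's code: group membership is modelled as a 32-bit mask, the table is a fixed-size array with linear search, and all names (`group_table`, `gt_set`, `gt_groups`, `gt_on_fork`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 1024	/* hypothetical table size */

struct tracked {
	int pid;
	uint32_t groups;	/* one bit per group (assumption) */
};

struct group_table {
	struct tracked e[MAX_TRACKED];
	int n;
};

static void gt_init(struct group_table *t)
{
	assert(t != NULL);
	t->n = 0;
}

static struct tracked *gt_find(struct group_table *t, int pid)
{
	int i;

	for (i = 0; i < t->n; i++)
		if (t->e[i].pid == pid)
			return &t->e[i];
	return NULL;
}

/* Set the group mask of a PID; returns 0, or -1 when the table is full. */
static int gt_set(struct group_table *t, int pid, uint32_t groups)
{
	struct tracked *e = gt_find(t, pid);

	if (!e) {
		if (t->n == MAX_TRACKED)
			return -1;
		e = &t->e[t->n++];
		e->pid = pid;
	}
	e->groups = groups;
	return 0;
}

/* Group mask of a PID, or 0 when the PID is not tracked. */
static uint32_t gt_groups(struct group_table *t, int pid)
{
	struct tracked *e = gt_find(t, pid);

	return e ? e->groups : 0;
}

/* Called for every fork event: the child inherits the parent's groups. */
static int gt_on_fork(struct group_table *t, int parent, int child)
{
	uint32_t g = gt_groups(t, parent);

	if (g == 0)
		return 0;	/* parent is in no group: nothing to record */
	return gt_set(t, child, g);
}
```

The daemon would call `gt_on_fork()` with the two PIDs carried by each fork message. Forks of untracked parents cost one lookup and leave the table unchanged.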
* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
2005-02-21 7:07 ` Guillaume Thouvenin
@ 2005-02-21 8:41 ` Evgeniy Polyakov
2005-02-21 9:47 ` Paul Jackson
1 sibling, 0 replies; 12+ messages in thread
From: Evgeniy Polyakov @ 2005-02-21 8:41 UTC (permalink / raw)
To: Guillaume Thouvenin
Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga,
Erich Focht
[-- Attachment #1: Type: text/plain, Size: 3032 bytes --]
On Mon, 2005-02-21 at 08:07 +0100, Guillaume Thouvenin wrote:
> On Thu, 2005-02-17 at 18:50 +0300, Evgeniy Polyakov wrote:
> > On Thu, 2005-02-17 at 15:55 +0100, Guillaume Thouvenin wrote:
> > > It's a new patch that implements a fork connector in the
> > > kernel/fork.c:do_fork() routine. The connector sends information about
> > > parent PID and child PID over a netlink interface. It allows several
> > > user space applications to be alerted when a fork occurs in the kernel.
> > > The main drawback is that even if nobody listens, a message is sent. I
> > > don't know how to avoid that. I added an option (FORK_CONNECTOR) to
> > > enable (or disable) the fork connector when compiling the kernel. To
> > > work, the connector must be compiled as built-in (CONFIG_CONNECTOR=y).
> > > It has been tested on a 2.6.11-rc3-mm2 kernel with two user space
> > > applications connected.
> > >
> > > It is used by ELSA to manage groups of processes in user space. In
> > > conjunction with per-process accounting information, like BSD or CSA,
> > > ELSA provides per-group accounting of processes.
> >
> > I think people will complain here...
> > ... [cut here] ...
> > I still think that LSM with logging of all calls is the best way to
> > achieve this goal.
>
> I agree with you. My first implementation was with LSM, but Chris Wright
> (I think it was him) noticed that it's not the right framework (and it
> seems true). So I looked for another solution. I thought about kobject
> but it was too "big" and finally, Greg KH spoke about connectors. It's
> small and efficient.

Your do_fork() change really looks like either an audit add-on (but that
is really not the case) or an LSM logging facility.
I think adding cn_netlink_send() to every function in security/dummy.c
and renaming it to security/cn_logger.c or something is not such a bad
idea...
Or even waiting in each function until userspace replies with the
decision to allow such a call or not. Although it can create a lock
(need to recheck security hooks in the send/recv paths).

> > from the other side, why is only fork monitored in this way?
>
> The problem is the following: I have a user space daemon that manages
> groups of processes. The main idea is, if a parent belongs to a group
> then its child belongs to the same group. To achieve this I need to know
> when a fork occurs and which processes are involved. I don't see how to
> do this without a hook in the do_fork() routine... Any ideas are
> welcome.

Now I begin to understand Chris Wright: LSM is designed not for
monitoring, but only for the initialisation path, i.e. LSM will say only
whether something is allowed or not, but not whether it was performed.

So, for exactly your setup there is no other way than to patch
do_fork().

> Thank you Evgeniy for all your comments about the code, it helps and I
> will modify the patch.
>
> Regards,
> Guillaume
-- 
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
2005-02-21 7:07 ` Guillaume Thouvenin
2005-02-21 8:41 ` Evgeniy Polyakov
@ 2005-02-21 9:47 ` Paul Jackson
2005-02-21 10:33 ` Guillaume Thouvenin
1 sibling, 1 reply; 12+ messages in thread
From: Paul Jackson @ 2005-02-21 9:47 UTC (permalink / raw)
To: Guillaume Thouvenin
Cc: johnpol, akpm, greg, linux-kernel, elsa-devel, gh, efocht
Guillaume wrote:
> The problem is the following: I have a user space daemon that manages
> groups of processes. The main idea is, if a parent belongs to a group
> then its child belongs to the same group. To achieve this I need to know
> when a fork occurs and which processes are involved. I don't see how to
> do this without a hook in the do_fork() routine...

How is what you need, for process grouping, any more complex than
another sort of {bank, job, aggregate, session, group, ...} integer id
field in the task struct, that is copied on fork, and can be queried and
manipulated from user space, in accordance with whatever rules you
implement?

When I look at the elsacct_process_copy() routine, which is called from
fork, in your patch-2.6.8.1-elsa, I'm not sure what it does, but it sure
looks like it could cause scaling and performance problems. Linux works
really hard to keep fork costs low. Copying another integer field, as
part of the block copy of the task struct at fork, sure would be cheaper
than this.

Not only does this hook look too expensive, I don't even see the need
for any such explicit code hook in fork for accounting at all.

Does your user space daemon require to know about each task as it is
forked, in near real time? Is it trying to do something with this
accounting information while the tasks being accounted for are
necessarily still alive?

The classic accounting that I am familiar with, from years ago, only did
post-mortem analysis. So long as enough entrails were left around so
that it could piece together the story, it didn't require any immediate
notice of anything. You need to clear a couple of accounting
accumulators directly in the task struct at fork, and write a record to
a specified open file at exit. That's about it.

The main problems I was aware of with that classic accounting (which
is probably what is now known as BSD accounting) are:

 1) The fixed length accounting record didn't allow for added or longer
    fields. A little bit more flexible and extensible format is
    desired, but some effort should be made to keep the format still
    reasonably tight and compressed. No full spec XML. The format
    should allow for some form of resynchronization after a chunk of
    data is lost.

 2) An additional bank/job/aggregate/session/group/... id seems desired.
    I have yet to understand why this need be anything fancier than
    another integer field in the task struct.

 3) Probably some more data items are worth collecting -- which could
    be placed in the outgoing compressed data stream, along with the
    existing records written on task exit. Over time, appropriate
    hooks should be proposed to collect such data as seems needed.

 4) The current mechanism of collecting per-task data only on exit
    makes it difficult to account for long running jobs. Perhaps
    we could use a leisurely background task that slowly scans the
    tasks looking for those that have been present since the last
    scan, and causes an intermediate accounting record to be written
    for them.

What other essential deficiencies are there that you need to address?

I don't see any need for explicit hooks in fork to resolve the above
deficiencies.
-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
2005-02-21 9:47 ` Paul Jackson
@ 2005-02-21 10:33 ` Guillaume Thouvenin
2005-02-21 11:58 ` Paul Jackson
0 siblings, 1 reply; 12+ messages in thread
From: Guillaume Thouvenin @ 2005-02-21 10:33 UTC (permalink / raw)
To: Paul Jackson
Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga,
Erich Focht, Jay Lan
On Mon, 2005-02-21 at 01:47 -0800, Paul Jackson wrote:
> Guillaume wrote:
> > The problem is the following: I have a user space daemon that manages
> > groups of processes. The main idea is, if a parent belongs to a group
> > then its child belongs to the same group. To achieve this I need to know
> > when a fork occurs and which processes are involved. I don't see how to
> > do this without a hook in the do_fork() routine...
>
> How is what you need, for process grouping, any more complex than
> another sort of {bank, job, aggregate, session, group, ...} integer id
> field in the task struct, that is copied on fork, and can be queried and
> manipulated from user space, in accordance with whatever rules you
> implement?

If a process belongs to several groups of processes, a new integer in
the task_struct is not enough; you need a list or something like this.
If you're using a list you need to add functions to manage this list in
the kernel, but we don't want to add this kind of management inside the
kernel because with the fork connector we can keep it outside.

> When I look at the elsacct_process_copy() routine, which is called from
> fork, in your patch-2.6.8.1-elsa, I'm not sure what it does, but it sure
> looks like it could cause scaling and performance problems.

This patch is an old one with many kernel modifications that impact
Linux performance. That's why we thought about another solution where
all management is done by a user space daemon. Currently we're using the
fork connector.

> Does your user space daemon require to know about each task as it is
> forked, in near real time? Is it trying to do something with this
> accounting information while the tasks being accounted for are
> necessarily still alive?

I don't need real time. I just need to know which processes fork during
the accounting period. The user space daemon provided by ELSA just keeps
a trace of parents and their children during a given period (generally
it's the accounting period). The analysis will be done later by another
application (also provided by ELSA) by using the trace of parents and
children plus the accounting trace.

> The main problems I was aware of with that classic accounting (which
> is probably what is now known as BSD accounting) are:
> ... [cut here] ...
> 2) An additional bank/job/aggregate/session/group/... id seems desired.
> I have yet to understand why this need be anything fancier than
> another integer field in the task struct.

We're trying to solve this one. I think I answered the integer field
question above. As for the need for per-group accounting, it can be
interesting to do accounting on a specific "task". For example, if you
want accounting data for a compilation, you can add the corresponding
shell to a group of processes; the commands involved in the compilation,
like gcc, cc, as, collect2, ..., will automatically be added to the same
group and you will be able to get statistics about this compilation.

> 3) Probably some more data items are worth collecting -- which could
> be placed in the outgoing compressed data stream, along with the
> existing records written on task exit. Over time, appropriate
> hooks should be proposed to collect such data as seems needed.

There is a discussion about this with Jay Lan to merge the CSA and BSD
accounting frameworks.

I don't know if there is some work around 1) and 4).

Regards,
Guillaume
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
2005-02-21 10:33 ` Guillaume Thouvenin
@ 2005-02-21 11:58 ` Paul Jackson
2005-02-21 14:43 ` Guillaume Thouvenin
0 siblings, 1 reply; 12+ messages in thread
From: Paul Jackson @ 2005-02-21 11:58 UTC (permalink / raw)
To: Guillaume Thouvenin
Cc: akpm, greg, linux-kernel, elsa-devel, gh, efocht, jlan
Thank-you for your quick answer.

Guillaume wrote:
>
> If a process belongs to several groups of processes, a new integer in
> the task_struct is not enough; you need a list or something like this.
> If you're using a list you need to add functions to manage this list in
> the kernel, but we don't want to add this kind of management inside the
> kernel because with the fork connector we can keep it outside.

Ok - fork connect. From your patch of a couple days ago, for the benefit
of lurkers:
>
> It's a new patch that implements a fork connector in the
> kernel/fork.c:do_fork() routine. The connector sends information about
> parent PID and child PID over a netlink interface. It allows several
> user space applications to be alerted when a fork occurs in the kernel.

Whoaa ... you're saying that because you might have several groups a
task could belong to at once, you'll use netlink to avoid managing lists
in the kernel. Seems that you're spending thousands of instructions to
save dozens. This is not a good trade off.

I can imagine several way cheaper ways to handle this.

If the number of groups to which a task could belong has some small
finite upper limit, like at most 5 groups, you could have 5 integer id's
in the task struct instead of 1. If the number of elements in a
particular group has a small upper bound, you could even replace the
ints with bit fields.

Or you could enumerate the different combinations of groups to which a
task might belong, assign each such combination a unique integer, and
keep that integer in the task struct. The enumeration could be done
dynamically, only counting the particular combinations of group
memberships that actually had use. This has the disadvantage that a
particular combination, once enumerated, would have to stay around until
the next boot - a potential memory leak. Probably not acceptable,
unless the cost of storing a no longer used combination is nearly zero.

Or you could have a little 'jobids' struct that held a list and a
reference counter, where the list held a particular combination of ids,
and the reference counter tracked how many tasks referenced that jobids
struct. Put a single pointer in the task struct to a jobids struct, and
increment and decrement the reference counter in the jobids struct on
fork and exit. Free it if the count goes to zero on exit. This solves
the memory leak of the previous, with increased cost to the fork. Since
we really do design these systems to stay up 'forever', this is perhaps
the winner. Any time a particular task is added to, or removed from, a
group, if the ref count of its jobids struct is one, then modify the id
list attached to that jobids struct in place. If the ref count is more
than one, copy the jobids struct and list to a new one, decrement the
count in the old one, and modify the new one in place. Such list and
counter manipulations are the daily stuff of kernel code. No need to
avoid such.

Just because you have more than one id doesn't mean each task has to be
connected directly into its own custom list, and even if you needed
that, I don't see that it's a win to avoid such a list by using netlink.

It can be a worthwhile exercise to single step through each machine
instruction that you add to fork, in the forking task or any other task
that is sent data or a signal therefrom. You really do want to keep the
number of added instructions (and number of additional cache lines and
memory pages accessed, especially written) to a minimum. If the effort
of single stepping through such would require the patience of
Copernicus, then it's back to the drawing board for a more efficient
solution.

> I don't know if there is some work around 1) and 4).

Well, you might have dodged the (1) bullet up until now by using netlink
and not extending the accounting record at exit. Bullet (1) was
extending the accounting record past its fairly constrained size, if
that's still a problem; it's been years since I looked.

But if you adapt one of the above suggestions, and don't send anything
out of the task context at fork, then you will have to deal with (1) in
order to include the list of job id's in the record written at exit.

If you want to collect any other data, bullet (3), you will also have to
solve bullet (1).

Item (4), collecting accounting data for long running tasks, is probably
less pressing. Its solution will also likely require solving (1),
however.

Taking a quick look at init/Kconfig and include/linux/acct.h, it seems
we are using BSD_PROCESS_ACCT_V3 format, which is the latest 64 byte
format, allowing for larger uid/gid. With slight variations, this 64
byte format has lasted about 25 years. It's time to replace it,
especially if you have designs on collecting any additional information,
which you clearly do.
-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 12+ messages in thread
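
The 'jobids' proposal above (a shared, reference-counted id list that is copied only when a task with a shared list changes its membership) can be sketched in user space C as follows. This is an illustration of the idea as described in the message, not kernel code: the names (`jobids_new`, `jobids_get`, `jobids_put`, `jobids_add`) are hypothetical, there is no locking, and a kernel version would need atomic reference counts and kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct jobids {
	int refcount;	/* number of tasks pointing at this struct */
	size_t nr;	/* number of ids in the list */
	int *ids;	/* the particular combination of group ids */
};

static struct jobids *jobids_new(void)
{
	struct jobids *j = calloc(1, sizeof(*j));

	if (j)
		j->refcount = 1;
	return j;
}

/* fork: the child shares the parent's struct. */
static struct jobids *jobids_get(struct jobids *j)
{
	assert(j != NULL);
	j->refcount++;
	return j;
}

/* exit: drop one reference, free on the last one. */
static void jobids_put(struct jobids *j)
{
	if (--j->refcount == 0) {
		free(j->ids);
		free(j);
	}
}

/*
 * Add 'id' to the set used by one task. A sole owner is modified in
 * place; a shared struct is copied first and the caller's reference
 * moves to the copy. Returns the struct the task must now point at,
 * or NULL on allocation failure (the original is left untouched).
 */
static struct jobids *jobids_add(struct jobids *j, int id)
{
	struct jobids *dst = j;
	int *ids;

	if (j->refcount > 1) {
		dst = jobids_new();
		if (!dst)
			return NULL;
		if (j->nr) {
			dst->ids = malloc(j->nr * sizeof(*dst->ids));
			if (!dst->ids) {
				free(dst);
				return NULL;
			}
			memcpy(dst->ids, j->ids, j->nr * sizeof(*dst->ids));
			dst->nr = j->nr;
		}
	}
	ids = realloc(dst->ids, (dst->nr + 1) * sizeof(*ids));
	if (!ids) {
		if (dst != j)
			jobids_put(dst);
		return NULL;
	}
	ids[dst->nr++] = id;
	dst->ids = ids;
	if (dst != j)
		j->refcount--;	/* still referenced by the other tasks */
	return dst;
}
```

In this sketch the fork path costs a single counter increment, and the copy is paid only when a task that shares its list joins another group.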
* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector 2005-02-21 11:58 ` Paul Jackson @ 2005-02-21 14:43 ` Guillaume Thouvenin 2005-02-21 16:55 ` Erich Focht 2005-02-21 17:54 ` Paul Jackson 0 siblings, 2 replies; 12+ messages in thread From: Guillaume Thouvenin @ 2005-02-21 14:43 UTC (permalink / raw) To: Paul Jackson Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga, Erich Focht, Jay Lan On Mon, 2005-02-21 at 03:58 -0800, Paul Jackson wrote: > > It's a new patch that implements a fork connector in the > > kernel/fork.c:do_fork() routine. The connector sends information about > > parent PID and child PID over a netlink interface. It allows to several > > user space applications to be alerted when a fork occurs in the kernel. > > Whoaa ... you're saying that because you might have several groups a > task could belong to at once, you'll use netlink to avoid managing lists > in the kernel. Seems that you're spending thousands of instructions to > save dozens. This is not a good trade off. I understand your point of view but I'm using netlink interface because it's already in the kernel so my choice is to use something that is already in the kernel instead of adding dozens of new instructions and also to do things in user space. The fork connector is here to move the management in the user space. Otherwise there is PAGG that manages group of processes in the kernel. To test performances, I tried to compile a kernel several times with and without the fork connector and here are the resource usage computed with the following command: time /bin/sh -c 'make O=/home/guill/build/k2610 oldconfig \ && make O=/home/guill/build/k2610 bzImage \ && make O=/home/guill/build/k2610 modules' between each test, the directory that contains object files was destroyed and a 'sync' was done. 
Results are: kernel without fork connector real : 8m17.042s 8m10.113s 8m08.597s 8m10.068s 8m08.930s user : 7m32.376s 7m35.985s 7m34.424s 7m34.221s 7m34.835s sys : 0m50.730s 0m51.139s 0m51.159s 0m51.406s 0m51.020s kernel with the fork connector real : 8m14.492s 8m08.656s 8m07.754s 8m08.002s 8m07.854s user : 7m31.664s 7m33.528s 7m33.625s 7m33.500s 7m33.822s sys : 0m50.651s 0m51.222s 0m51.102s 0m51.367s 0m50.894s kernel with the fork connector + application listens real : 8m08.596s 8m08.950s 8m08.899s 8m08.678s 8m08.987s user : 7m33.312s 7m33.898s 7m34.004s 7m33.285s 7m33.628s sys : 0m52.222s 0m52.013s 0m51.809s 0m52.361s 0m52.036s I also choose this implementation because Erich Focht wrote in the email http://lkml.org/lkml/2004/12/17/99 that keeps the historic about the creation of processes "sounds very useful for a lot of interesting stuff". So I thought about something that can be used by other application and with netlink, information is available to everyone. > I can imagine several way cheaper ways to handle this. > > If the number of groups to which a task could belong has some small > finite upper limit, like at most 5 groups, you could have 5 integer id's > in the task struct instead of 1. If the number of elements in a > particular group has a small upper bound, you could even replace the > ints with bit fields. > > Or you could enumerate the different combinations of groups to which a > task might belong, assign each such combination a unique integer, and > keep that integer in the task struct. The enumeration could be done > dynamically, only counting the particular combinations of group > memberships that actually had use. This has the disadvantage that a > particular combination, once enumerated, would have to stay around until > the next boot - a potential memory leak. Probably not acceptable, > unless the cost of storing a no longer used combination is nearly zero. 
The problem with those solutions is that we suppose that a process can
belong to a finite number of task. I suppose that it can be true in
practice.

> Or you could have a little 'jobids' struct that held a list and a
> reference counter, where the list held a particular combination of ids,
> and the reference counter tracked how many tasks referenced that jobids
> struct. Put a single pointer in the task struct to a jobids struct, and
> increment and decrement the reference counter in the jobids struct on
> fork and exit. Free it if the count goes to zero on exit. This solves
> the memory leak of the previous, with increased cost to the fork. Since
> we really do design these systems to stay up 'forever', this is perhaps
> the winner. Any time a particular task is added to, or removed from, a
> group, if the ref count of its jobids struct is one, then modify the id
> list attached to that jobids struct in place. If the ref count is more
> than one, copy the jobids struct and list to a new one, decrement the
> count in the old one, and modify the new one in place. Such list and
> counter manipulations are the daily stuff of kernel code. No need to
> avoid such.

This solution is interesting. The problem is to know if a fork connector
is useful for some other projects. As I said, one of my goal was to
provide a way to alert user space application when a fork occurs in the
kernel because I think that other applications need this kind of
information. But if it is needed only by ELSA, you're right, a solution
specific to our problem is clearly more efficient.

> Just because you have more than one id doesn't mean each task has to be
> connected directly into its own custom list, and even if you needed
> that, I don't see that it's a win to avoid such a list by using netlink.

The advantages with the fork connector is that it can be used by other
application and modification in the current kernel is minimal. The main
drawbacks is maybe the performance...
So nobody needs such hook (Erich?)

Thank Paul for your comments,
Guillaume
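Editor's sketch: the refcounted, copy-on-write 'jobids' scheme that Paul describes in the message above can be modelled in user space roughly as follows. This is only an illustration under stated assumptions: `struct jobids`, `struct task`, `task_fork()`, `task_exit()` and `task_join_group()` are hypothetical names, not kernel symbols, and a real kernel version would also need locking or atomic reference counts.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct jobids {
    int refcount;   /* number of tasks sharing this id set */
    size_t nr_ids;  /* number of valid entries in ids[] */
    int *ids;       /* group ids the sharing tasks belong to */
};

struct task {
    struct jobids *jobids;
};

/* Allocate a new id set holding a copy of ids[0..nr_ids), refcount 1. */
static struct jobids *jobids_alloc(const int *ids, size_t nr_ids)
{
    struct jobids *j = malloc(sizeof(*j));

    if (!j)
        return NULL;
    j->ids = malloc((nr_ids ? nr_ids : 1) * sizeof(int));
    if (!j->ids) {
        free(j);
        return NULL;
    }
    if (nr_ids)
        memcpy(j->ids, ids, nr_ids * sizeof(int));
    j->nr_ids = nr_ids;
    j->refcount = 1;
    return j;
}

/* fork: the child shares the parent's id set, one increment only. */
static void task_fork(const struct task *parent, struct task *child)
{
    child->jobids = parent->jobids;
    child->jobids->refcount++;
}

/* exit: drop the reference, free the set when the last user is gone. */
static void task_exit(struct task *t)
{
    struct jobids *j = t->jobids;

    if (--j->refcount == 0) {
        free(j->ids);
        free(j);
    }
    t->jobids = NULL;
}

/* Join a group: modify in place if sole owner, otherwise copy first. */
static int task_join_group(struct task *t, int id)
{
    struct jobids *j = t->jobids;
    int *grown;

    if (j->refcount > 1) {
        struct jobids *copy = jobids_alloc(j->ids, j->nr_ids);

        if (!copy)
            return -1;
        j->refcount--;      /* cannot reach zero: it was > 1 */
        t->jobids = copy;
        j = copy;
    }
    grown = realloc(j->ids, (j->nr_ids + 1) * sizeof(int));
    if (!grown)
        return -1;
    grown[j->nr_ids] = id;
    j->ids = grown;
    j->nr_ids++;
    return 0;
}
```

In this model the cost added to fork is a pointer copy and an increment; the copy is paid only when a task that shares its set changes group membership.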
* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
2005-02-21 14:43 ` Guillaume Thouvenin
@ 2005-02-21 16:55 ` Erich Focht
2005-02-21 17:54 ` Paul Jackson
1 sibling, 0 replies; 12+ messages in thread
From: Erich Focht @ 2005-02-21 16:55 UTC (permalink / raw)
To: Guillaume Thouvenin
Cc: Paul Jackson, Andrew Morton, Greg KH, lkml, elsa-devel,
Gerrit Huizenga, Jay Lan

On Monday 21 February 2005 15:43, Guillaume Thouvenin wrote:
>
> I also choose this implementation because Erich Focht wrote in the
> email http://lkml.org/lkml/2004/12/17/99 that keeps the historic about
> the creation of processes "sounds very useful for a lot of interesting
> stuff". So I thought about something that can be used by other
> application and with netlink, information is available to everyone.

Besides accounting I had in mind something like cluster-wide pid
tracking in userspace with builtin relationship information. A bit of
single system image integration... As I don't have it, yet, I'm not
(yet) a very strong requester for the service provided by your
module. But I still think it's usefull and might want later a hook on
exit, too. (And yes, I can imagine of other ways to get the data
effectively out of the kernel, too).

> Results are:
>
> kernel without fork connector
> real : 8m17.042s 8m10.113s 8m08.597s 8m10.068s 8m08.930s
> user : 7m32.376s 7m35.985s 7m34.424s 7m34.221s 7m34.835s
> sys  : 0m50.730s 0m51.139s 0m51.159s 0m51.406s 0m51.020s
>
> kernel with the fork connector
> real : 8m14.492s 8m08.656s 8m07.754s 8m08.002s 8m07.854s
> user : 7m31.664s 7m33.528s 7m33.625s 7m33.500s 7m33.822s
> sys  : 0m50.651s 0m51.222s 0m51.102s 0m51.367s 0m50.894s
>
> kernel with the fork connector + application listens
> real : 8m08.596s 8m08.950s 8m08.899s 8m08.678s 8m08.987s
> user : 7m33.312s 7m33.898s 7m34.004s 7m33.285s 7m33.628s
> sys  : 0m52.222s 0m52.013s 0m51.809s 0m52.361s 0m52.036s

I liked the previous lean implementation more, but the performance of
this one doesn't look at all as scary as I thought.

Best regards,
Erich
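Editor's sketch: to put a number on the results quoted above, the five `sys` samples of each configuration can be averaged. The figures below are copied from Guillaume's message; the array and function names are made up for this illustration.

```c
#include <stddef.h>

/* 'sys' times in seconds, copied from the results quoted above. */
static const double sys_without[]  = {50.730, 51.139, 51.159, 51.406, 51.020};
static const double sys_with[]     = {50.651, 51.222, 51.102, 51.367, 50.894};
static const double sys_listener[] = {52.222, 52.013, 51.809, 52.361, 52.036};

/* Arithmetic mean of n samples (n must be non-zero). */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}
```

Calling `mean()` on the three arrays gives roughly 51.09 s, 51.05 s and 52.09 s. In these runs the connector alone is within the run-to-run spread, while a listening application adds about one second of system time, close to 2%, over a whole kernel build.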
* Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
2005-02-21 14:43 ` Guillaume Thouvenin
2005-02-21 16:55 ` Erich Focht
@ 2005-02-21 17:54 ` Paul Jackson
1 sibling, 0 replies; 12+ messages in thread
From: Paul Jackson @ 2005-02-21 17:54 UTC (permalink / raw)
To: Guillaume Thouvenin
Cc: akpm, greg, linux-kernel, elsa-devel, gh, efocht, jlan

Guillaume wrote:
>
> I understand your point of view but I'm using netlink interface
> because it's already in the kernel so my choice is to use something that
> is already in the kernel instead of adding dozens of new instructions
> and also to do things in user space.

All else equal, yes it is good to use what facilities are already
available, and it is good to do things in user space.

If one adds many cpu cycles and quite a few pages of memory to read and
touch to every fork, then all else is not equal.

> To test performances, I tried to
> compile a kernel several times with and without the fork connector

I agree that a kernel compile does not measure well fork costs.

There is a good benchmark of fork, exit and other facilities such as
socket, bind and mmap, at:

http://bulk.fefe.de/scalability/

(the trailing '/' is needed). You might try downloading them (see the
cvs instructions on this page) and seeing how your changes impact fork
and exit. I had to fiddle with the rl.rlim counts in forkbench.c to get
my copy to run without exceeding rlimits on fork.

Notice also the rather dramatic improvements from Linux 2.4 to 2.6 in
some of these benchmarks, described elsewhere on the above page. The
Linux developers take this stuff seriously, and have provided some
seriously good performance under load.

I'd be quite interested to see how your changes affect this benchmark.
Or perhaps you can find some other good measure of fork and exit costs,
the two areas that accounting is at immediate risk of impacting.
> I also choose this implementation because Erich Focht wrote in the
> email http://lkml.org/lkml/2004/12/17/99 that keeps the historic about
> the creation of processes "sounds very useful for a lot of interesting
> stuff".

(Some of us who only speak English would find 'history' more idiomatic
here than 'historic' ...)

Adding framework on the basis of such potential future useful value is a
hard sell in Linux land. It is better to wait until each need is
immediately clear, and it is essential to keep the kernel infrastructure
is light weight as we can, or it will overwhelm the mental capacity of
most of us working on it, including myself for sure ;).

> Thank Paul for your comments,

You're welcome. Thanks for tackling this task.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373, 1.925.600.0401
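Editor's sketch: Paul asks for a measurement that isolates fork and exit costs instead of a full kernel build. The function below is not the benchmark he links to; it is a minimal stand-in, assuming a POSIX system, that times a loop of fork(), an immediate _exit() in the child, and waitpid() in the parent. The name `time_fork_exit()` is made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Average wall-clock seconds per fork() + _exit() + waitpid() cycle,
 * or -1.0 on error. Run it on kernels built with and without the
 * patch, with and without a listener, and compare the averages.
 */
static double time_fork_exit(int iterations)
{
    struct timespec start, end;
    double elapsed;
    int i;

    if (iterations <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    for (i = 0; i < iterations; i++) {
        pid_t pid = fork();

        if (pid < 0)
            return -1.0;
        if (pid == 0)
            _exit(0);           /* child: exit at once, no atexit work */
        if (waitpid(pid, NULL, 0) < 0)
            return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    elapsed = (double)(end.tv_sec - start.tv_sec)
            + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return elapsed / iterations;
}
```

Because each child is reaped before the next fork, the loop never holds more than one child and stays well inside per-user process limits. It does not reproduce the load patterns of a real fork benchmark, so treat its output as a rough indication only.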
* Re: [Elsa-devel] Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
2005-02-17 15:50 ` Evgeniy Polyakov
2005-02-21 7:07 ` Guillaume Thouvenin
@ 2005-02-21 8:05 ` Guillaume Thouvenin
2005-02-21 8:48 ` Evgeniy Polyakov
1 sibling, 1 reply; 12+ messages in thread
From: Guillaume Thouvenin @ 2005-02-21 8:05 UTC (permalink / raw)
To: Evgeniy Polyakov
Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga,
Erich Focht

On Thu, 2005-02-17 at 18:50 +0300, Evgeniy Polyakov wrote:
> >
> > +#if defined(CONFIG_CONNECTOR) && defined(CONFIG_FORK_CONNECTOR)
>
> I suspect CONFIG_FORK_CONNECTOR is enough.

The problem here is that if connector is compiled as a module and
fork_connector as builtin there will be undefined reference to symbol
like 'cn_already_initialized' or 'cn_netlink_send'. That's why the
fork_connector() must be enable if CONFIG_CONNECTOR and
CONFIG_FORK_CONNECTOR are selected as builtin and not as a module. I
agree that it's not very elegant...

> > + cn_netlink_send(msg, 1);
>
> "1" here means that this message will be delivered to any group
> which has it's first bit set(1, 3, and so on) in given socket queue.
> I suspect it is not what you want.
> By design connector's users should send messages to the group it was
> registered with
> (which is obtained from idx field of the struct cb_id), in your case it
> is CN_IDX_FORK,
> and connector userspace consumers should bind to the same group (idx
> value).
> It is of course not requirement, but a fair path(hmm, I can add more
> strict checks into connector).
> By setting 0 as second parameter for cn_netlink_send() you will force
> connector's core
> to select proper group for you.

Yes but cn_netlink_send() is looking for a callback with the same idx.
As I don't use any callback, found == 0 and I have an error "Failed to
find multicast netlink group for callback[0xfeed.0xbeef]".
So the good call is: cn_netlink_send(msg, CN_IDX_FORK);

Thanks for your help,
Guillaume
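Editor's sketch: Evgeniy's remark that "1" reaches every group with its first bit set (1, 3, and so on) follows from treating the group argument as a bitmask. The helper below is only a model of that description, not the connector or netlink code; `is_delivered()` is a hypothetical name.

```c
/*
 * Model of bitmask group matching as described in the quoted text:
 * a listener receives the message when the groups it is bound to
 * share at least one bit with the groups the message was sent to.
 */
static int is_delivered(unsigned int sent_groups, unsigned int bound_groups)
{
    return (sent_groups & bound_groups) != 0;
}
```

Under this model `is_delivered(1, 1)` and `is_delivered(1, 3)` are both true while `is_delivered(1, 2)` is false, which is why sending to the connector's own idx (CN_IDX_FORK here) is the more predictable choice.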
* Re: [Elsa-devel] Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
2005-02-21 8:05 ` [Elsa-devel] " Guillaume Thouvenin
@ 2005-02-21 8:48 ` Evgeniy Polyakov
0 siblings, 0 replies; 12+ messages in thread
From: Evgeniy Polyakov @ 2005-02-21 8:48 UTC (permalink / raw)
To: Guillaume Thouvenin
Cc: Andrew Morton, Greg KH, lkml, elsa-devel, Gerrit Huizenga,
Erich Focht

[-- Attachment #1: Type: text/plain, Size: 1982 bytes --]

On Mon, 2005-02-21 at 09:05 +0100, Guillaume Thouvenin wrote:
> On Thu, 2005-02-17 at 18:50 +0300, Evgeniy Polyakov wrote:
> > >
> > > +#if defined(CONFIG_CONNECTOR) && defined(CONFIG_FORK_CONNECTOR)
> >
> > I suspect CONFIG_FORK_CONNECTOR is enough.
>
> The problem here is that if connector is compiled as a module and
> fork_connector as builtin there will be undefined reference to symbol
> like 'cn_already_initialized' or 'cn_netlink_send'. That's why the
> fork_connector() must be enable if CONFIG_CONNECTOR and
> CONFIG_FORK_CONNECTOR are selected as builtin and not as a module. I
> agree that it's not very elegant...

Maybe "depends on CONNECTOR=y" ?

> > > + cn_netlink_send(msg, 1);
> >
> > "1" here means that this message will be delivered to any group
> > which has it's first bit set(1, 3, and so on) in given socket queue.
> > I suspect it is not what you want.
> > By design connector's users should send messages to the group it was
> > registered with
> > (which is obtained from idx field of the struct cb_id), in your case it
> > is CN_IDX_FORK,
> > and connector userspace consumers should bind to the same group (idx
> > value).
> > It is of course not requirement, but a fair path(hmm, I can add more
> > strict checks into connector).
> > By setting 0 as second parameter for cn_netlink_send() you will force
> > connector's core
> > to select proper group for you.
>
> Yes but cn_netlink_send() is looking for a callback with the same idx.
> As I don't use any callback, found == 0 and I have an error "Failed to
> find multicast netlink group for callback[0xfeed.0xbeef]". So the good
> call is: cn_netlink_send(msg, CN_IDX_FORK);

Uh-oh, I see...
I recall your previous patch with the fork_callback()...

Nevertheless "1" is a bad idea, CN_IDX_FORK is what is expected.

> Thanks for your help,
> Guillaume

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
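Editor's sketch: the `#if` guard discussed above works because the kernel build defines `CONFIG_FOO` only when a tristate option is set to `y`, and defines `CONFIG_FOO_MODULE` instead when it is set to `m`. The fragment below simulates the `CONNECTOR=m`, `FORK_CONNECTOR=y` case with hand-written defines; in a real build these macros come from the generated configuration header, and `fork_connector_active()` is a made-up helper.

```c
/* Simulated configuration: connector built as a module (=m). */
#define CONFIG_CONNECTOR_MODULE 1
#define CONFIG_FORK_CONNECTOR 1

/* Same condition as the guard in the patch. */
#if defined(CONFIG_CONNECTOR) && defined(CONFIG_FORK_CONNECTOR)
#define FORK_CONNECTOR_ACTIVE 1
#else
#define FORK_CONNECTOR_ACTIVE 0
#endif

/* Returns 1 when the fork hook would be compiled in, 0 otherwise. */
static int fork_connector_active(void)
{
    return FORK_CONNECTOR_ACTIVE;
}
```

With the connector as a module the guard evaluates to 0, so the hook is left out and no undefined reference to cn_netlink_send() arises. With Evgeniy's suggested `depends on CONNECTOR=y`, FORK_CONNECTOR could only be enabled when the connector is built in, which is why testing CONFIG_FORK_CONNECTOR alone would then be enough.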
end of thread, other threads:[~2005-02-21 17:56 UTC | newest]

Thread overview: 12+ messages
2005-02-17 14:55 [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector Guillaume Thouvenin
2005-02-17 15:50 ` Evgeniy Polyakov
2005-02-21 7:07 ` Guillaume Thouvenin
2005-02-21 8:41 ` Evgeniy Polyakov
2005-02-21 9:47 ` Paul Jackson
2005-02-21 10:33 ` Guillaume Thouvenin
2005-02-21 11:58 ` Paul Jackson
2005-02-21 14:43 ` Guillaume Thouvenin
2005-02-21 16:55 ` Erich Focht
2005-02-21 17:54 ` Paul Jackson
2005-02-21 8:05 ` [Elsa-devel] " Guillaume Thouvenin
2005-02-21 8:48 ` Evgeniy Polyakov