* [PATCH 1/9] Task Watchers v2: Task watchers v2
2006-11-03 4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
@ 2006-11-03 4:22 ` Matt Helsley
2006-11-03 13:22 ` Daniel Walker
2006-11-03 4:22 ` [PATCH 2/9] Task Watchers v2: Register audit task watcher Matt Helsley
` (8 subsequent siblings)
9 siblings, 1 reply; 16+ messages in thread
From: Matt Helsley @ 2006-11-03 4:22 UTC (permalink / raw)
To: Linux-Kernel
Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton
[-- Attachment #1: task-watchers-v2 --]
[-- Type: text/plain, Size: 15855 bytes --]
Associate function calls with significant events in a task's lifetime much like
we handle kernel and module init/exit functions. This creates a table for each
of the following events in the task_watchers_table ELF section:
WATCH_TASK_INIT   at the beginning of a fork/clone system call when the
                  new task struct first becomes available.
WATCH_TASK_CLONE  just before returning successfully from a fork/clone.
WATCH_TASK_EXEC   just before successfully returning from the exec
                  system call.
WATCH_TASK_UID    every time a task's real or effective user id changes.
WATCH_TASK_GID    every time a task's real or effective group id changes.
WATCH_TASK_EXIT   at the beginning of do_exit when a task is exiting
                  for any reason.
WATCH_TASK_FREE   is called before critical task structures like
                  the mm_struct become inaccessible and the task is
                  subsequently freed.
The next patch will add a debugfs interface for measuring fork and exit rates
which can be used to calculate the overhead of the task watcher infrastructure.
Subsequent patches will make use of task watchers to simplify fork, exit,
and many of the system calls that set [er][ug]ids.
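The dispatch idea can be sketched in portable userspace C. This is only an illustration, not the kernel code: struct task, the reduced event set, the watcher functions and the hand-written first[] table are all assumptions made for the example. In the patch the linker lays out the table and the start symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct task { int pid; };

typedef int (*task_watcher_fn)(unsigned long val, struct task *tsk);

/* Reduced event set for illustration; the patch defines seven events. */
enum { EV_INIT, EV_CLONE, EV_EXIT, NUM_EVENTS };

int calls;	/* number of watcher invocations, for demonstration only */

static int w_ok(unsigned long val, struct task *tsk)
{
	(void)val; (void)tsk;
	calls++;
	return 0;
}

static int w_nomem(unsigned long val, struct task *tsk)
{
	(void)val; (void)tsk;
	calls++;
	return -12;
}

static int w_perm(unsigned long val, struct task *tsk)
{
	(void)val; (void)tsk;
	calls++;
	return -1;
}

/* One flat table; watchers for the same event sit next to each other. */
static const task_watcher_fn watchers[] = {
	w_ok, w_nomem, w_perm,	/* EV_INIT */
	w_ok,			/* EV_CLONE */
				/* EV_EXIT: no watchers */
};

/* first[ev] indexes the first watcher of ev; first[ev + 1] ends its run. */
static const size_t first[NUM_EVENTS + 1] = { 0, 3, 4, 4 };

/* Call every watcher registered for ev and report the first error seen. */
int notify_watchers(unsigned int ev, unsigned long val, struct task *tsk)
{
	int ret_err = 0;
	size_t i;

	assert(ev < NUM_EVENTS);
	for (i = first[ev]; i < first[ev + 1]; i++) {
		int err = watchers[i](val, tsk);

		if (err < 0 && ret_err == 0)
			ret_err = err;
	}
	return ret_err;
}
```

With this layout an event without watchers costs one comparison, and a failing watcher does not stop the later ones from running.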
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chandra S. Seetharaman <sekharan@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steve Grubb <sgrubb@redhat.com>
Cc: linux-audit@redhat.com
Cc: Paul Jackson <pj@sgi.com>
---
fs/exec.c | 3 +++
include/asm-generic/vmlinux.lds.h | 19 +++++++++++++++++++
include/linux/task_watchers.h | 31 +++++++++++++++++++++++++++++++
kernel/Makefile | 2 +-
kernel/exit.c | 3 +++
kernel/fork.c | 15 +++++++++++----
kernel/sys.c | 9 +++++++++
kernel/task_watchers.c | 37 +++++++++++++++++++++++++++++++++++++
8 files changed, 114 insertions(+), 5 deletions(-)
Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel
Clone   Number of Children Cloned
             5000     7500    10000    12500    15000    17500
--------------------------------------------------------------
Mean      18058.4  18323.3  18465.9  18439.5  18574.5  18566.3
Dev       325.705  306.322  316.464  291.979  287.531  281.275
Err (%)   1.80362  1.67176  1.71378  1.58345  1.54799  1.51498

Fork    Number of Children Forked
             5000     7500    10000    12500    15000    17500
--------------------------------------------------------------
Mean        18074  18199.8  18399.7  18482.5  18504.6  18565.5
Dev       331.876  315.515  302.402  309.314  300.937  309.168
Err (%)   1.83621  1.73361  1.64351  1.67356  1.62628  1.66528
Kernbench:
Elapsed: 124.353s User: 439.935s System: 46.334s CPU: 390.4%
440.61user 46.24system 2:04.35elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
440.27user 46.21system 2:04.81elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
440.78user 46.70system 2:04.39elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.91user 46.35system 2:04.31elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.80user 46.28system 2:04.39elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.67user 46.27system 2:04.13elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.63user 46.29system 2:04.01elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.49user 46.48system 2:04.67elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.63user 46.25system 2:04.34elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.56user 46.27system 2:04.13elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
Index: linux-2.6.19-rc2-mm2/kernel/sys.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/sys.c
+++ linux-2.6.19-rc2-mm2/kernel/sys.c
@@ -28,10 +28,11 @@
#include <linux/tty.h>
#include <linux/signal.h>
#include <linux/cn_proc.h>
#include <linux/getcpu.h>
#include <linux/seccomp.h>
+#include <linux/task_watchers.h>
#include <linux/compat.h>
#include <linux/syscalls.h>
#include <linux/kprobes.h>
@@ -958,10 +959,11 @@ asmlinkage long sys_setregid(gid_t rgid,
current->fsgid = new_egid;
current->egid = new_egid;
current->gid = new_rgid;
key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
+ notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
/*
* setgid() is implemented like SysV w/ SAVED_IDS
@@ -993,10 +995,11 @@ asmlinkage long sys_setgid(gid_t gid)
else
return -EPERM;
key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
+ notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
static int set_user(uid_t new_ruid, int dumpclear)
{
@@ -1081,10 +1084,11 @@ asmlinkage long sys_setreuid(uid_t ruid,
current->suid = current->euid;
current->fsuid = current->euid;
key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
+ notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE);
}
@@ -1128,10 +1132,11 @@ asmlinkage long sys_setuid(uid_t uid)
current->fsuid = current->euid = uid;
current->suid = new_suid;
key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
+ notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID);
}
@@ -1176,10 +1181,11 @@ asmlinkage long sys_setresuid(uid_t ruid
if (suid != (uid_t) -1)
current->suid = suid;
key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
+ notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES);
}
asmlinkage long sys_getresuid(uid_t __user *ruid, uid_t __user *euid, uid_t __user *suid)
@@ -1228,10 +1234,11 @@ asmlinkage long sys_setresgid(gid_t rgid
if (sgid != (gid_t) -1)
current->sgid = sgid;
key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
+ notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
asmlinkage long sys_getresgid(gid_t __user *rgid, gid_t __user *egid, gid_t __user *sgid)
{
@@ -1269,10 +1276,11 @@ asmlinkage long sys_setfsuid(uid_t uid)
current->fsuid = uid;
}
key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
+ notify_task_watchers(WATCH_TASK_UID, 0, current);
security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS);
return old_fsuid;
}
@@ -1296,10 +1304,11 @@ asmlinkage long sys_setfsgid(gid_t gid)
smp_wmb();
}
current->fsgid = gid;
key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
+ notify_task_watchers(WATCH_TASK_GID, 0, current);
}
return old_fsgid;
}
asmlinkage long sys_times(struct tms __user * tbuf)
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -40,10 +40,11 @@
#include <linux/compat.h>
#include <linux/pipe_fs_i.h>
#include <linux/audit.h> /* for audit_free() */
#include <linux/resource.h>
#include <linux/blkdev.h>
+#include <linux/task_watchers.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>
#include <asm/pgtable.h>
#include <asm/mmu_context.h>
@@ -885,10 +886,11 @@ fastcall NORET_TYPE void do_exit(long co
set_current_state(TASK_UNINTERRUPTIBLE);
schedule();
}
tsk->flags |= PF_EXITING;
+ notify_task_watchers(WATCH_TASK_EXIT, code, tsk);
if (unlikely(in_atomic()))
printk(KERN_INFO "note: %s[%d] exited with preempt_count %d\n",
current->comm, current->pid,
preempt_count());
@@ -916,10 +918,11 @@ fastcall NORET_TYPE void do_exit(long co
audit_free(tsk);
taskstats_exit_send(tsk, tidstats, group_dead, mycpu);
taskstats_exit_free(tidstats);
exit_mm(tsk);
+ notify_task_watchers(WATCH_TASK_FREE, code, tsk);
if (group_dead)
acct_process();
exit_sem(tsk);
__exit_files(tsk);
Index: linux-2.6.19-rc2-mm2/fs/exec.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/fs/exec.c
+++ linux-2.6.19-rc2-mm2/fs/exec.c
@@ -48,10 +48,11 @@
#include <linux/syscalls.h>
#include <linux/rmap.h>
#include <linux/tsacct_kern.h>
#include <linux/cn_proc.h>
#include <linux/audit.h>
+#include <linux/task_watchers.h>
#include <asm/uaccess.h>
#include <asm/mmu_context.h>
#ifdef CONFIG_KMOD
@@ -1083,10 +1084,12 @@ int search_binary_handler(struct linux_b
allow_write_access(bprm->file);
if (bprm->file)
fput(bprm->file);
bprm->file = NULL;
current->did_exec = 1;
+ notify_task_watchers(WATCH_TASK_EXEC, 0,
+ current);
proc_exec_connector(current);
return retval;
}
read_lock(&binfmt_lock);
put_binfmt(fmt);
Index: linux-2.6.19-rc2-mm2/include/linux/task_watchers.h
===================================================================
--- /dev/null
+++ linux-2.6.19-rc2-mm2/include/linux/task_watchers.h
@@ -0,0 +1,31 @@
+#ifndef _TASK_WATCHERS_H
+#define _TASK_WATCHERS_H
+#include <linux/sched.h>
+
+#define WATCH_TASK_INIT 0
+#define WATCH_TASK_CLONE 1
+#define WATCH_TASK_EXEC 2
+#define WATCH_TASK_UID 3
+#define WATCH_TASK_GID 4
+#define WATCH_TASK_EXIT 5
+#define WATCH_TASK_FREE 6
+#define NUM_WATCH_TASK_EVENTS 7
+
+#ifndef MODULE
+typedef int (*task_watcher_fn)(unsigned long, struct task_struct*);
+
+/*
+ * Watch for events occurring within a task and call the supplied function
+ * when (and only when) the given event happens.
+ * Only non-modular kernel code may register functions as task_watchers.
+ */
+#define task_watcher_func(ev, fn) \
+static task_watcher_fn __task_watcher_##ev##_##fn __attribute_used__ \
+ __attribute__ ((__section__ (".task_watchers." #ev))) = fn
+#else
+#error "task_watcher_func() macro may not be used in modules."
+#endif
+
+extern int notify_task_watchers(unsigned int ev_idx, unsigned long val,
+ struct task_struct *tsk);
+#endif /* _TASK_WATCHERS_H */
Index: linux-2.6.19-rc2-mm2/kernel/task_watchers.c
===================================================================
--- /dev/null
+++ linux-2.6.19-rc2-mm2/kernel/task_watchers.c
@@ -0,0 +1,37 @@
+#include <linux/task_watchers.h>
+
+/* Defined in include/asm-generic/vmlinux.lds.h */
+extern const task_watcher_fn __start_task_watchers_init[],
+ __start_task_watchers_clone[], __start_task_watchers_exec[],
+ __start_task_watchers_uid[], __start_task_watchers_gid[],
+ __start_task_watchers_exit[], __start_task_watchers_free[],
+ __stop_task_watchers_free[];
+
+/*
+ * Tables of ptrs to the first watcher func for WATCH_TASK_*
+ */
+static const task_watcher_fn *twtable[] = {
+ __start_task_watchers_init,
+ __start_task_watchers_clone,
+ __start_task_watchers_exec,
+ __start_task_watchers_uid,
+ __start_task_watchers_gid,
+ __start_task_watchers_exit,
+ __start_task_watchers_free,
+ __stop_task_watchers_free,
+};
+
+int notify_task_watchers(unsigned int ev, unsigned long val,
+ struct task_struct *tsk)
+{
+ const task_watcher_fn *tw_call;
+ int ret_err = 0, err;
+
+ /* Call all of the watchers, report the first error */
+ for (tw_call = twtable[ev]; tw_call < twtable[ev + 1]; tw_call++) {
+ err = (*tw_call)(val, tsk);
+ if (unlikely((err < 0) && (ret_err == 0)))
+ ret_err = err;
+ }
+ return ret_err;
+}
Index: linux-2.6.19-rc2-mm2/kernel/Makefile
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/Makefile
+++ linux-2.6.19-rc2-mm2/kernel/Makefile
@@ -6,11 +6,11 @@ obj-y = sched.o fork.o exec_domain.o
exit.o itimer.o time.o softirq.o resource.o \
sysctl.o capability.o ptrace.o timer.o user.o \
signal.o sys.o kmod.o workqueue.o pid.o \
rcupdate.o extable.o params.o posix-timers.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
- hrtimer.o rwsem.o latency.o nsproxy.o srcu.o
+ hrtimer.o rwsem.o latency.o nsproxy.o srcu.o task_watchers.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-y += time/
obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
obj-$(CONFIG_LOCKDEP) += lockdep.o
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -46,10 +46,11 @@
#include <linux/tsacct_kern.h>
#include <linux/cn_proc.h>
#include <linux/delayacct.h>
#include <linux/taskstats_kern.h>
#include <linux/random.h>
+#include <linux/task_watchers.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -1045,10 +1046,18 @@ static struct task_struct *copy_process(
do_posix_clock_monotonic_gettime(&p->start_time);
p->security = NULL;
p->io_context = NULL;
p->io_wait = NULL;
p->audit_context = NULL;
+
+ p->tgid = p->pid;
+ if (clone_flags & CLONE_THREAD)
+ p->tgid = current->tgid;
+
+ retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
+ if (retval < 0)
+ goto bad_fork_cleanup_delays_binfmt;
cpuset_fork(p);
#ifdef CONFIG_NUMA
p->mempolicy = mpol_copy(p->mempolicy);
if (IS_ERR(p->mempolicy)) {
retval = PTR_ERR(p->mempolicy);
@@ -1084,14 +1093,10 @@ static struct task_struct *copy_process(
#ifdef CONFIG_DEBUG_MUTEXES
p->blocked_on = NULL; /* not blocked yet */
#endif
- p->tgid = p->pid;
- if (clone_flags & CLONE_THREAD)
- p->tgid = current->tgid;
-
if ((retval = security_task_alloc(p)))
goto bad_fork_cleanup_policy;
if ((retval = audit_alloc(p)))
goto bad_fork_cleanup_security;
/* copy all the process information */
@@ -1248,10 +1253,11 @@ static struct task_struct *copy_process(
}
total_forks++;
spin_unlock(&current->sighand->siglock);
write_unlock_irq(&tasklist_lock);
+ notify_task_watchers(WATCH_TASK_CLONE, clone_flags, p);
proc_fork_connector(p);
return p;
bad_fork_cleanup_namespaces:
exit_task_namespaces(p);
@@ -1280,10 +1286,11 @@ bad_fork_cleanup_policy:
bad_fork_cleanup_cpuset:
#endif
cpuset_exit(p);
bad_fork_cleanup_delays_binfmt:
delayacct_tsk_free(p);
+ notify_task_watchers(WATCH_TASK_FREE, 0, p);
if (p->binfmt)
module_put(p->binfmt->module);
bad_fork_cleanup_put_domain:
module_put(task_thread_info(p)->exec_domain->module);
bad_fork_cleanup_count:
Index: linux-2.6.19-rc2-mm2/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/asm-generic/vmlinux.lds.h
+++ linux-2.6.19-rc2-mm2/include/asm-generic/vmlinux.lds.h
@@ -42,10 +42,29 @@
VMLINUX_SYMBOL(__start_rio_route_ops) = .; \
*(.rio_route_ops) \
VMLINUX_SYMBOL(__end_rio_route_ops) = .; \
} \
\
+ .task_watchers_table : AT(ADDR(.task_watchers_table) - LOAD_OFFSET) { \
+ *(.task_watchers_table) \
+ VMLINUX_SYMBOL(__start_task_watchers_init) = .; \
+ *(.task_watchers.init) \
+ VMLINUX_SYMBOL(__start_task_watchers_clone) = .; \
+ *(.task_watchers.clone) \
+ VMLINUX_SYMBOL(__start_task_watchers_exec) = .; \
+ *(.task_watchers.exec) \
+ VMLINUX_SYMBOL(__start_task_watchers_uid) = .; \
+ *(.task_watchers.uid) \
+ VMLINUX_SYMBOL(__start_task_watchers_gid) = .; \
+ *(.task_watchers.gid) \
+ VMLINUX_SYMBOL(__start_task_watchers_exit) = .; \
+ *(.task_watchers.exit) \
+ VMLINUX_SYMBOL(__start_task_watchers_free) = .; \
+ *(.task_watchers.free) \
+ VMLINUX_SYMBOL(__stop_task_watchers_free) = .; \
+ } \
+ \
/* Kernel symbol table: Normal symbols */ \
__ksymtab : AT(ADDR(__ksymtab) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab) = .; \
*(__ksymtab) \
VMLINUX_SYMBOL(__stop___ksymtab) = .; \
--
* Re: [PATCH 1/9] Task Watchers v2: Task watchers v2
2006-11-03 4:22 ` [PATCH 1/9] Task Watchers v2: Task watchers v2 Matt Helsley
@ 2006-11-03 13:22 ` Daniel Walker
2006-11-04 0:43 ` Matt Helsley
0 siblings, 1 reply; 16+ messages in thread
From: Daniel Walker @ 2006-11-03 13:22 UTC (permalink / raw)
To: Matt Helsley
Cc: Linux-Kernel, Jes Sorensen, LSE-Tech, Chandra S Seetharaman,
Christoph Hellwig, Al Viro, Steve Grubb, linux-audit,
Paul Jackson, Andrew Morton
On Thu, 2006-11-02 at 20:22 -0800, Matt Helsley wrote:
> +/*
> + * Watch for events occurring within a task and call the supplied function
> + * when (and only when) the given event happens.
> + * Only non-modular kernel code may register functions as task_watchers.
> + */
> +#define task_watcher_func(ev, fn) \
> +static task_watcher_fn __task_watcher_##ev##_##fn __attribute_used__ \
> + __attribute__ ((__section__ (".task_watchers." #ev))) = fn
> +#else
> +#error "task_watcher_func() macro may not be used in modules."
> +#endif
You should make this TASK_WATCHER_FUNC() or even just TASK_WATCHER(). It
looks a little goofy in the code that uses it.
Looking at it now could you do something like,
static int __task_watcher_init
audit_alloc(unsigned long val, struct task_struct *tsk)
Instead of a macro? Might be a little less invasive.
Daniel
* Re: [PATCH 1/9] Task Watchers v2: Task watchers v2
2006-11-03 13:22 ` Daniel Walker
@ 2006-11-04 0:43 ` Matt Helsley
2006-11-04 1:13 ` Daniel Walker
0 siblings, 1 reply; 16+ messages in thread
From: Matt Helsley @ 2006-11-04 0:43 UTC (permalink / raw)
To: Daniel Walker
Cc: Linux-Kernel, Jes Sorensen, LSE-Tech, Chandra S Seetharaman,
Christoph Hellwig, Al Viro, Steve Grubb, linux-audit,
Paul Jackson, Andrew Morton
On Fri, 2006-11-03 at 08:22 -0500, Daniel Walker wrote:
> On Thu, 2006-11-02 at 20:22 -0800, Matt Helsley wrote:
> > +/*
> > + * Watch for events occurring within a task and call the supplied function
> > + * when (and only when) the given event happens.
> > + * Only non-modular kernel code may register functions as task_watchers.
> > + */
> > +#define task_watcher_func(ev, fn) \
> > +static task_watcher_fn __task_watcher_##ev##_##fn __attribute_used__ \
> > + __attribute__ ((__section__ (".task_watchers." #ev))) = fn
> > +#else
> > +#error "task_watcher_func() macro may not be used in modules."
> > +#endif
>
> You should make this TASK_WATCHER_FUNC() or even just TASK_WATCHER(). It
> looks a little goofy in the code that uses it.
I can certainly change this. In my defense I didn't capitalize it
because very similar macros in init.h were not capitalized. For example:
#define core_initcall(fn) __define_initcall("1",fn)
#define postcore_initcall(fn) __define_initcall("2",fn)
#define arch_initcall(fn) __define_initcall("3",fn)
#define subsys_initcall(fn) __define_initcall("4",fn)
#define fs_initcall(fn) __define_initcall("5",fn)
#define device_initcall(fn) __define_initcall("6",fn)
#define late_initcall(fn) __define_initcall("7",fn)
setup_param, early_param, module_init, etc. do not use all-caps. And I'm
sure that's not all.
All of these declare variables and assign them attributes and values.
> Looking at it now could you do something like,
>
> static int __task_watcher_init
> audit_alloc(unsigned long val, struct task_struct *tsk)
>
> Instead of a macro? Might be a little less invasive.
I like your suggestion. However, I don't see how such a macro could be
made to replace the current macro.
I need to be able to call every init function during task
initialization. The current macro creates and initializes a function
pointer in an array in the special ELF section. This allows the
notify_task_watchers function to traverse the array and make calls to
the init functions.
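To make that concrete, here is a userspace sketch of the same mechanism, not the kernel code. It assumes GCC or Clang with GNU ld (or lld) on an ELF target, where the linker provides __start_<section> and __stop_<section> symbols for sections whose names are valid C identifiers. struct task, the section name task_watchers_init and the two demo watchers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct task { int pid; };

typedef int (*task_watcher_fn)(unsigned long val, struct task *tsk);

/*
 * Userspace analogue of task_watcher_func(): emit a function pointer
 * variable into a per-event section. "used" keeps the otherwise
 * unreferenced variable from being discarded by the compiler.
 */
#define task_watcher_func(ev, fn) \
	static const task_watcher_fn task_watcher_##ev##_##fn \
	__attribute__((used, section("task_watchers_" #ev))) = fn

/* Section bounds supplied by the linker (assumption: GNU ld/lld, ELF). */
extern const task_watcher_fn __start_task_watchers_init[];
extern const task_watcher_fn __stop_task_watchers_init[];

int seen;	/* records which demo watchers ran */

static int audit_alloc_demo(unsigned long val, struct task *tsk)
{
	(void)val; (void)tsk;
	seen += 1;
	return 0;
}
task_watcher_func(init, audit_alloc_demo);

static int semundo_demo(unsigned long val, struct task *tsk)
{
	(void)val; (void)tsk;
	seen += 10;
	return 0;
}
task_watcher_func(init, semundo_demo);

/* Walk the pointer array the linker assembled; report the first error. */
int notify_init_watchers(unsigned long val, struct task *tsk)
{
	const task_watcher_fn *tw;
	int ret_err = 0;

	assert(tsk != NULL);
	for (tw = __start_task_watchers_init;
	     tw < __stop_task_watchers_init; tw++) {
		int err = (*tw)(val, tsk);

		if (err < 0 && ret_err == 0)
			ret_err = err;
	}
	return ret_err;
}
```

The caller never names the watchers: each one reaches the loop only because its pointer variable landed in the section.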
I use the name of the function and event to name and initialize the
function pointer. I don't see any way to get the name of the function
without taking a parameter. This also means it would have to be
initialized after the function was declared or defined.
I considered placing the function code in the ELF section. However I
don't know of any gcc or linker functions that would allow me to iterate
over all of the functions in an ELF section and call them from fork,
exec, exit, etc. I've even looked through the docs and googled.
I considered doing symbol lookups. Part of the problem is knowing the
names I need to look up. Furthermore, I think doing symbol lookups for
each call would be a lot slower. I could create a dynamically-allocated
array and put the lookup results there. However that's more code and
more memory...
However, your suggestion could put all of the functions near each
other. That locality could improve performance. So I'll try adding
__task_watcher_<event> macros but I can't see a way to make them work as
you suggested.
Cheers,
-Matt Helsley
* Re: [PATCH 1/9] Task Watchers v2: Task watchers v2
2006-11-04 0:43 ` Matt Helsley
@ 2006-11-04 1:13 ` Daniel Walker
2006-11-05 0:12 ` Matt Helsley
0 siblings, 1 reply; 16+ messages in thread
From: Daniel Walker @ 2006-11-04 1:13 UTC (permalink / raw)
To: Matt Helsley
Cc: Linux-Kernel, Jes Sorensen, LSE-Tech, Chandra S Seetharaman,
Christoph Hellwig, Al Viro, Steve Grubb, linux-audit,
Paul Jackson, Andrew Morton
On Fri, 2006-11-03 at 16:43 -0800, Matt Helsley wrote:
> I can certainly change this. In my defense I didn't capitalize it
> because very similar macros in init.h were not capitalized. For example:
>
> #define core_initcall(fn) __define_initcall("1",fn)
> #define postcore_initcall(fn) __define_initcall("2",fn)
> #define arch_initcall(fn) __define_initcall("3",fn)
> #define subsys_initcall(fn) __define_initcall("4",fn)
> #define fs_initcall(fn) __define_initcall("5",fn)
> #define device_initcall(fn) __define_initcall("6",fn)
> #define late_initcall(fn) __define_initcall("7",fn)
>
> setup_param, early_param, module_init, etc. do not use all-caps. And I'm
> sure that's not all.
True .. It's not mandatory. The reason that I mentioned it is because it
looked like a function was being called outside a function block, which
looks odd to me. I think I overlook the initcall functions because I see
them so often I know what they are.
> All of these declare variables and assign them attributes and values.
>
> > Looking at it now could you do something like,
> >
> > static int __task_watcher_init
> > audit_alloc(unsigned long val, struct task_struct *tsk)
> >
> > Instead of a macro? Might be a little less invasive.
>
> I like your suggestion. However, I don't see how such a macro could be
> made to replace the current macro.
>
> I need to be able to call every init function during task
> initialization. The current macro creates and initializes a function
> pointer in an array in the special ELF section. This allows the
> notify_task_watchers function to traverse the array and make calls to
> the init functions.
You get an "A" for research. I didn't notice you actually declare a
variable inside the macro. I thought it was only setting a section
attribute. You're right, I don't see how you could call the functions in
the section without the variable declared. ( besides that's exactly how
the initcalls work. )
Daniel
* Re: [PATCH 1/9] Task Watchers v2: Task watchers v2
2006-11-04 1:13 ` Daniel Walker
@ 2006-11-05 0:12 ` Matt Helsley
0 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-05 0:12 UTC (permalink / raw)
To: dwalker
Cc: Linux-Kernel, Jes Sorensen, LSE-Tech, Chandra S Seetharaman,
Christoph Hellwig, Al Viro, Steve Grubb, linux-audit,
Paul Jackson, Andrew Morton
On Fri, 2006-11-03 at 17:13 -0800, Daniel Walker wrote:
> On Fri, 2006-11-03 at 16:43 -0800, Matt Helsley wrote:
>
> > I can certainly change this. In my defense I didn't capitalize it
> > because very similar macros in init.h were not capitalized. For example:
> >
> > #define core_initcall(fn) __define_initcall("1",fn)
> > #define postcore_initcall(fn) __define_initcall("2",fn)
> > #define arch_initcall(fn) __define_initcall("3",fn)
> > #define subsys_initcall(fn) __define_initcall("4",fn)
> > #define fs_initcall(fn) __define_initcall("5",fn)
> > #define device_initcall(fn) __define_initcall("6",fn)
> > #define late_initcall(fn) __define_initcall("7",fn)
> >
> > setup_param, early_param, module_init, etc. do not use all-caps. And I'm
> > sure that's not all.
>
> True .. It's not mandatory. The reason that I mentioned it is because it
> looked like a function was being called outside a function block, which
> looks odd to me. I think I overlook the initcall functions because I see
> them so often I know what they are.
This is a good point -- it does look odd. I'm considering:
DEFINE_TASK_INITCALL(audit_alloc);
With others like:
DEFINE_TASK_EXITCALL()
DEFINE_TASK_CLONECALL()
etc.
That resembles other macros which create variables. Though I'm not sure
this pattern is appropriate because these variables should not be used by
name.
Seems that no matter what something about it is going to be unusual. :)
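A rough userspace sketch of how such per-event wrappers might layer on one generic macro. Nothing here is from the patch: DEFINE_TASK_WATCHER, the tw_init and tw_exit section names, struct task and the demo watchers are hypothetical, and the linker-provided __start_/__stop_ symbols assume GCC or Clang with GNU ld (or lld) on ELF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct task { int pid; };

typedef int (*task_watcher_fn)(unsigned long val, struct task *tsk);

/* Generic form: one pointer variable per (event, function) pair. */
#define DEFINE_TASK_WATCHER(ev, fn) \
	static const task_watcher_fn task_watcher_##ev##_##fn \
	__attribute__((used, section("tw_" #ev))) = fn

/* Per-event wrappers in the DEFINE_* style. */
#define DEFINE_TASK_INITCALL(fn) DEFINE_TASK_WATCHER(init, fn)
#define DEFINE_TASK_EXITCALL(fn) DEFINE_TASK_WATCHER(exit, fn)

/* Section bounds supplied by the linker (assumption: GNU ld/lld, ELF). */
extern const task_watcher_fn __start_tw_init[], __stop_tw_init[];
extern const task_watcher_fn __start_tw_exit[], __stop_tw_exit[];

int order;	/* digits record the order in which watchers ran */

static int demo_init_watcher(unsigned long val, struct task *tsk)
{
	(void)val; (void)tsk;
	order = order * 10 + 1;
	return 0;
}
DEFINE_TASK_INITCALL(demo_init_watcher);

static int demo_exit_watcher(unsigned long val, struct task *tsk)
{
	(void)val; (void)tsk;
	order = order * 10 + 2;
	return 0;
}
DEFINE_TASK_EXITCALL(demo_exit_watcher);

/* Run one section's watchers and report the first error. */
static int run_watchers(const task_watcher_fn *tw, const task_watcher_fn *end,
			unsigned long val, struct task *tsk)
{
	int ret_err = 0;

	assert(tsk != NULL);
	for (; tw < end; tw++) {
		int err = (*tw)(val, tsk);

		if (err < 0 && ret_err == 0)
			ret_err = err;
	}
	return ret_err;
}

int task_initcalls(unsigned long val, struct task *tsk)
{
	return run_watchers(__start_tw_init, __stop_tw_init, val, tsk);
}

int task_exitcalls(unsigned long val, struct task *tsk)
{
	return run_watchers(__start_tw_exit, __stop_tw_exit, val, tsk);
}
```

The generated variables are only ever reached through the section bounds, which is the mismatch with other DEFINE_* macros noted above.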
> > All of these declare variables and assign them attributes and values.
> >
> > > Looking at it now could you do something like,
> > >
> > > static int __task_watcher_init
> > > audit_alloc(unsigned long val, struct task_struct *tsk)
> > >
> > > Instead of a macro? Might be a little less invasive.
> >
> > I like your suggestion. However, I don't see how such a macro could be
> > made to replace the current macro.
> >
> > I need to be able to call every init function during task
> > initialization. The current macro creates and initializes a function
> > pointer in an array in the special ELF section. This allows the
> > notify_task_watchers function to traverse the array and make calls to
> > the init functions.
>
>
> You get an "A" for research. I didn't notice you actually declare a
Thanks!
<snip>
Cheers,
-Matt Helsley
* [PATCH 2/9] Task Watchers v2: Register audit task watcher
2006-11-03 4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
2006-11-03 4:22 ` [PATCH 1/9] Task Watchers v2: Task watchers v2 Matt Helsley
@ 2006-11-03 4:22 ` Matt Helsley
2006-11-03 4:23 ` [PATCH 3/9] Task Watchers v2: Register semundo " Matt Helsley
` (7 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03 4:22 UTC (permalink / raw)
To: Linux-Kernel
Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton
[-- Attachment #1: task-watchers-register-audit --]
[-- Type: text/plain, Size: 8662 bytes --]
Change audit to register a task watcher function rather than modify
the copy_process() and do_exit() paths directly.
Removes an unlikely() hint from kernel/exit.c:
if (unlikely(tsk->audit_context))
audit_free(tsk);
This use of unlikely() is an artifact of audit_free()'s former invocation from
__put_task_struct() (commit: fa84cb935d4ec601528f5e2f0d5d31e7876a5044).
Clearly in the __put_task_struct() path it would be called much more frequently
than do_exit() and hence the use of unlikely() there was justified. However, in
the new location the hint most likely offers no measurable performance impact.
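For reference, a small userspace sketch of what the hint does. The likely()/unlikely() definitions follow the usual __builtin_expect form for GCC and Clang; struct task and audit_free_demo() are hypothetical stand-ins, not the audit code.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hints in the usual __builtin_expect form (GCC/Clang builtin). */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct task { void *audit_context; };

int frees;	/* counts audit_free_demo() calls, for demonstration only */

static void audit_free_demo(struct task *tsk)
{
	tsk->audit_context = NULL;
	frees++;
}

/*
 * The hint only tells the compiler which branch to lay out as the
 * fall-through path; the result of the test is unchanged.
 */
void exit_path_demo(struct task *tsk)
{
	assert(tsk != NULL);
	if (unlikely(tsk->audit_context))
		audit_free_demo(tsk);
}
```

Dropping the hint changes at most the code layout around the branch, never which branch is taken.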
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steve Grubb <sgrubb@redhat.com>
Cc: linux-audit@redhat.com
---
include/linux/audit.h | 4 ----
kernel/auditsc.c | 10 +++++++---
kernel/exit.c | 3 ---
kernel/fork.c | 7 +------
4 files changed, 8 insertions(+), 16 deletions(-)
Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel
Clone   Number of Children Cloned
             5000     7500    10000    12500    15000    17500
--------------------------------------------------------------
Mean      18053.2  18361.2  18474.4    18462  18594.7  18557.4
Dev       315.856  316.881  318.787  312.425  304.193  291.819
Err (%)   1.74958  1.72582  1.72557  1.69226  1.63592  1.57252

Fork    Number of Children Forked
             5000     7500    10000    12500    15000    17500
--------------------------------------------------------------
Mean        18008    18186  18400.6  18433.1  18481.1  18502.8
Dev       305.299   309.41  315.108  298.683  310.504  338.734
Err (%)   1.69536  1.70136  1.71248  1.62036  1.68011  1.83071
Kernbench:
Elapsed: 124.234s User: 439.7s System: 46.503s CPU: 390.8%
439.67user 46.48system 2:04.11elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.46system 2:03.71elapsed 393%CPU (0avgtext+0avgdata 0maxresident)k
439.62user 46.47system 2:04.54elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.68user 46.64system 2:04.13elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.62user 46.46system 2:04.13elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.50system 2:04.35elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.49system 2:04.39elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.66user 46.61system 2:04.17elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.46system 2:04.57elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.46system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
Index: linux-2.6.19-rc2-mm2/kernel/auditsc.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/auditsc.c
+++ linux-2.6.19-rc2-mm2/kernel/auditsc.c
@@ -63,10 +63,11 @@
#include <linux/list.h>
#include <linux/tty.h>
#include <linux/selinux.h>
#include <linux/binfmts.h>
#include <linux/syscalls.h>
+#include <linux/task_watchers.h>
#include "audit.h"
extern struct list_head audit_filter_list[];
@@ -677,11 +678,11 @@ static inline struct audit_context *audi
* Filter on the task information and allocate a per-task audit context
* if necessary. Doing so turns on system call auditing for the
* specified task. This is called from copy_process, so no lock is
* needed.
*/
-int audit_alloc(struct task_struct *tsk)
+static int audit_alloc(unsigned long val, struct task_struct *tsk)
{
struct audit_context *context;
enum audit_state state;
if (likely(!audit_enabled))
@@ -703,10 +704,11 @@ int audit_alloc(struct task_struct *tsk)
tsk->audit_context = context;
set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
return 0;
}
+task_watcher_func(init, audit_alloc);
static inline void audit_free_context(struct audit_context *context)
{
struct audit_context *previous;
int count = 0;
@@ -1035,28 +1037,30 @@ static void audit_log_exit(struct audit_
* audit_free - free a per-task audit context
* @tsk: task whose audit context block to free
*
* Called from copy_process and do_exit
*/
-void audit_free(struct task_struct *tsk)
+static int audit_free(unsigned long val, struct task_struct *tsk)
{
struct audit_context *context;
context = audit_get_context(tsk, 0, 0);
if (likely(!context))
- return;
+ return 0;
/* Check for system calls that do not go through the exit
* function (e.g., exit_group), then free context block.
* We use GFP_ATOMIC here because we might be doing this
* in the context of the idle thread */
/* that can happen only if we are called from do_exit() */
if (context->in_syscall && context->auditable)
audit_log_exit(context, tsk);
audit_free_context(context);
+ return 0;
}
+task_watcher_func(free, audit_free);
/**
* audit_syscall_entry - fill in an audit record at syscall entry
* @tsk: task being audited
* @arch: architecture type
Index: linux-2.6.19-rc2-mm2/include/linux/audit.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/audit.h
+++ linux-2.6.19-rc2-mm2/include/linux/audit.h
@@ -332,12 +332,10 @@ struct mqstat;
extern int __init audit_register_class(int class, unsigned *list);
extern int audit_classify_syscall(int abi, unsigned syscall);
#ifdef CONFIG_AUDITSYSCALL
/* These are defined in auditsc.c */
/* Public API */
-extern int audit_alloc(struct task_struct *task);
-extern void audit_free(struct task_struct *task);
extern void audit_syscall_entry(int arch,
int major, unsigned long a0, unsigned long a1,
unsigned long a2, unsigned long a3);
extern void audit_syscall_exit(int failed, long return_code);
extern void __audit_getname(const char *name);
@@ -432,12 +430,10 @@ static inline int audit_mq_getsetattr(mq
return __audit_mq_getsetattr(mqdes, mqstat);
return 0;
}
extern int audit_n_rules;
#else
-#define audit_alloc(t) ({ 0; })
-#define audit_free(t) do { ; } while (0)
#define audit_syscall_entry(ta,a,b,c,d,e) do { ; } while (0)
#define audit_syscall_exit(f,r) do { ; } while (0)
#define audit_dummy_context() 1
#define audit_getname(n) do { ; } while (0)
#define audit_putname(n) do { ; } while (0)
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -37,11 +37,10 @@
#include <linux/jiffies.h>
#include <linux/futex.h>
#include <linux/rcupdate.h>
#include <linux/ptrace.h>
#include <linux/mount.h>
-#include <linux/audit.h>
#include <linux/profile.h>
#include <linux/rmap.h>
#include <linux/acct.h>
#include <linux/tsacct_kern.h>
#include <linux/cn_proc.h>
@@ -1095,15 +1094,13 @@ static struct task_struct *copy_process(
p->blocked_on = NULL; /* not blocked yet */
#endif
if ((retval = security_task_alloc(p)))
goto bad_fork_cleanup_policy;
- if ((retval = audit_alloc(p)))
- goto bad_fork_cleanup_security;
/* copy all the process information */
if ((retval = copy_semundo(clone_flags, p)))
- goto bad_fork_cleanup_audit;
+ goto bad_fork_cleanup_security;
if ((retval = copy_files(clone_flags, p)))
goto bad_fork_cleanup_semundo;
if ((retval = copy_fs(clone_flags, p)))
goto bad_fork_cleanup_files;
if ((retval = copy_sighand(clone_flags, p)))
@@ -1274,12 +1271,10 @@ bad_fork_cleanup_fs:
exit_fs(p); /* blocking */
bad_fork_cleanup_files:
exit_files(p); /* blocking */
bad_fork_cleanup_semundo:
exit_sem(p);
-bad_fork_cleanup_audit:
- audit_free(p);
bad_fork_cleanup_security:
security_task_free(p);
bad_fork_cleanup_policy:
#ifdef CONFIG_NUMA
mpol_free(p->mempolicy);
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -37,11 +37,10 @@
#include <linux/cn_proc.h>
#include <linux/mutex.h>
#include <linux/futex.h>
#include <linux/compat.h>
#include <linux/pipe_fs_i.h>
-#include <linux/audit.h> /* for audit_free() */
#include <linux/resource.h>
#include <linux/blkdev.h>
#include <linux/task_watchers.h>
#include <asm/uaccess.h>
@@ -912,12 +911,10 @@ fastcall NORET_TYPE void do_exit(long co
exit_robust_list(tsk);
#if defined(CONFIG_FUTEX) && defined(CONFIG_COMPAT)
if (unlikely(tsk->compat_robust_list))
compat_exit_robust_list(tsk);
#endif
- if (unlikely(tsk->audit_context))
- audit_free(tsk);
taskstats_exit_send(tsk, tidstats, group_dead, mycpu);
taskstats_exit_free(tidstats);
exit_mm(tsk);
notify_task_watchers(WATCH_TASK_FREE, code, tsk);
--
* [PATCH 3/9] Task Watchers v2: Register semundo task watcher
2006-11-03 4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
2006-11-03 4:22 ` [PATCH 1/9] Task Watchers v2: Task watchers v2 Matt Helsley
2006-11-03 4:22 ` [PATCH 2/9] Task Watchers v2: Register audit task watcher Matt Helsley
@ 2006-11-03 4:23 ` Matt Helsley
2006-11-03 4:23 ` [PATCH 4/9] Task Watchers v2: Register cpuset " Matt Helsley
` (6 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03 4:23 UTC (permalink / raw)
To: Linux-Kernel
Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton
[-- Attachment #1: task-watchers-register-semundo --]
[-- Type: text/plain, Size: 7115 bytes --]
Make the semaphore undo code use a task watcher instead of hooking into
copy_process() and do_exit() directly.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
include/linux/sem.h | 17 -----------------
ipc/sem.c | 12 ++++++++----
kernel/exit.c | 3 ---
kernel/fork.c | 6 +-----
4 files changed, 9 insertions(+), 29 deletions(-)
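The conversion described above can be pictured with a small user-space sketch: each event has a table of callbacks taking (val, task), and the notifier walks the table. Everything here is hypothetical illustration (struct task, register_watcher(), notify_watchers(), the demo callbacks); the series itself builds its tables in an ELF section via task_watcher_func() rather than registering at run time, and stopping at the first negative return is an assumption inferred from the "retval < 0" check in copy_process().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct task {
	int undo_list_refs;	/* pretend per-task state touched by watchers */
};

/* Event indices; the names mirror two of the WATCH_TASK_* events. */
enum { WATCH_TASK_INIT, WATCH_TASK_FREE, NUM_WATCH_EVENTS };

/* Same shape as the watcher functions in the patches: (val, task) -> int. */
typedef int (*task_watcher_fn)(unsigned long val, struct task *tsk);

#define MAX_WATCHERS 8

/* One callback table per event (run-time array instead of an ELF section). */
static task_watcher_fn watchers[NUM_WATCH_EVENTS][MAX_WATCHERS];
static size_t nr_watchers[NUM_WATCH_EVENTS];

/* Append fn to the table for event ev; returns -1 if the table is full. */
static int register_watcher(int ev, task_watcher_fn fn)
{
	if (nr_watchers[ev] == MAX_WATCHERS)
		return -1;
	watchers[ev][nr_watchers[ev]++] = fn;
	return 0;
}

/* Call each watcher in order; assumed to stop at the first negative return. */
static int notify_watchers(int ev, unsigned long val, struct task *tsk)
{
	size_t i;

	for (i = 0; i < nr_watchers[ev]; i++) {
		int ret = watchers[ev][i](val, tsk);

		if (ret < 0)
			return ret;
	}
	return 0;
}

/* Demo init watcher: takes a reference, loosely like copy_semundo(). */
static int demo_init(unsigned long clone_flags, struct task *tsk)
{
	(void)clone_flags;
	tsk->undo_list_refs++;
	return 0;
}

/* Demo free watcher: drops the reference, loosely like exit_sem(). */
static int demo_free(unsigned long ignored, struct task *tsk)
{
	(void)ignored;
	tsk->undo_list_refs--;
	return 0;
}

/* Demo watcher that always fails, to show error propagation. */
static int demo_fail(unsigned long clone_flags, struct task *tsk)
{
	(void)clone_flags;
	(void)tsk;
	return -1;
}
```

In this sketch the caller no longer names copy_semundo() or exit_sem() at all; it only raises the event, which is the decoupling the diffstat above reflects in kernel/fork.c and kernel/exit.c.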
Benchmark results:
System: 4 x 1.7GHz ppc64 (Power 4+) processors, 30968600kB RAM, 2.6.19-rc2-mm2 kernel
Clone       Number of Children Cloned
            5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean        17960.5   18169.3   18408.2   18479.9   18515.6   18465.4
Dev         305.381   314.209   292.395   284.992   299.331   295.311
Err (%)     1.70029   1.72934   1.5884    1.54217   1.61664   1.59927
Fork        Number of Children Forked
            5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean        18050.2   18141.4   18316.2   18386.2   18441.9   18476.2
Dev         295.68    312.922   296.962   298.81    300.985   294.046
Err (%)     1.63809   1.72491   1.62131   1.62519   1.63207   1.59149
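The Err (%) rows appear to be Dev expressed as a percentage of Mean; the first Clone column (305.381 / 17960.5) reproduces the listed 1.70029. A minimal sketch of that check, with err_percent() as a hypothetical helper name:

```c
#include <assert.h>

/* Relative deviation in percent: 100 * dev / mean.
 * Returns 0.0 for a zero mean to avoid dividing by zero. */
static double err_percent(double mean, double dev)
{
	if (mean == 0.0)
		return 0.0;
	return 100.0 * dev / mean;
}
```

Read this way, every column in these tables has a spread of roughly 1.5% to 2% of its mean, which is useful to keep in mind when comparing means between patches in the series.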
Kernbench:
Elapsed: 124.272s User: 439.643s System: 46.32s CPU: 390.5%
439.64user 46.25system 2:04.46elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.70user 46.27system 2:04.04elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.64user 46.31system 2:04.18elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.49user 46.27system 2:04.41elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.55user 46.47system 2:04.32elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.29system 2:04.63elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.61user 46.31system 2:04.09elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.68user 46.31system 2:04.02elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.49system 2:04.59elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.59user 46.23system 2:03.98elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
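The "Elapsed: 124.272s" summary matches the arithmetic mean of the ten elapsed times listed above. The sketch below recomputes it from the m:ss.ss strings; elapsed_seconds(), mean_elapsed(), and the patch3_elapsed array are hypothetical names introduced only for this check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Elapsed fields copied from the ten kernbench runs listed above. */
static const char *const patch3_elapsed[] = {
	"2:04.46", "2:04.04", "2:04.18", "2:04.41", "2:04.32",
	"2:04.63", "2:04.09", "2:04.02", "2:04.59", "2:03.98",
};

/* Convert "m:ss.ss" to seconds; returns -1.0 if the string does not parse. */
static double elapsed_seconds(const char *s)
{
	int min;
	double sec;

	if (sscanf(s, "%d:%lf", &min, &sec) != 2)
		return -1.0;
	return min * 60.0 + sec;
}

/* Arithmetic mean of n elapsed strings; returns -1.0 on empty or bad input. */
static double mean_elapsed(const char *const runs[], size_t n)
{
	double sum = 0.0;
	size_t i;

	if (n == 0)
		return -1.0;
	for (i = 0; i < n; i++) {
		double t = elapsed_seconds(runs[i]);

		if (t < 0.0)
			return -1.0;
		sum += t;
	}
	return sum / (double)n;
}
```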
Index: linux-2.6.19-rc2-mm2/ipc/sem.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/ipc/sem.c
+++ linux-2.6.19-rc2-mm2/ipc/sem.c
@@ -81,10 +81,11 @@
#include <linux/audit.h>
#include <linux/capability.h>
#include <linux/seq_file.h>
#include <linux/mutex.h>
#include <linux/nsproxy.h>
+#include <linux/task_watchers.h>
#include <asm/uaccess.h>
#include "util.h"
#define sem_ids(ns) (*((ns)->ids[IPC_SEM_IDS]))
@@ -1288,11 +1289,11 @@ asmlinkage long sys_semop (int semid, st
* See the notes above unlock_semundo() regarding the spin_lock_init()
* in this code. Initialize the undo_list->lock here instead of get_undo_list()
* because of the reasoning in the comment above unlock_semundo.
*/
-int copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
+static int copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
{
struct sem_undo_list *undo_list;
int error;
if (clone_flags & CLONE_SYSVSEM) {
@@ -1304,10 +1305,11 @@ int copy_semundo(unsigned long clone_fla
} else
tsk->sysvsem.undo_list = NULL;
return 0;
}
+task_watcher_func(init, copy_semundo);
/*
* add semadj values to semaphores, free undo structures.
* undo structures are not freed when semaphore arrays are destroyed
* so some of them may be out of date.
@@ -1317,22 +1319,22 @@ int copy_semundo(unsigned long clone_fla
* should we queue up and wait until we can do so legally?
* The original implementation attempted to do this (queue and wait).
* The current implementation does not do so. The POSIX standard
* and SVID should be consulted to determine what behavior is mandated.
*/
-void exit_sem(struct task_struct *tsk)
+static int exit_sem(unsigned long ignored, struct task_struct *tsk)
{
struct sem_undo_list *undo_list;
struct sem_undo *u, **up;
struct ipc_namespace *ns;
undo_list = tsk->sysvsem.undo_list;
if (!undo_list)
- return;
+ return 0;
if (!atomic_dec_and_test(&undo_list->refcnt))
- return;
+ return 0;
ns = tsk->nsproxy->ipc_ns;
/* There's no need to hold the semundo list lock, as current
* is the last task exiting for this undo list.
*/
@@ -1395,11 +1397,13 @@ found:
update_queue(sma);
next_entry:
sem_unlock(sma);
}
kfree(undo_list);
+ return 0;
}
+task_watcher_func(free, exit_sem);
#ifdef CONFIG_PROC_FS
static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
{
struct sem_array *sma = it;
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -46,12 +46,10 @@
#include <asm/uaccess.h>
#include <asm/unistd.h>
#include <asm/pgtable.h>
#include <asm/mmu_context.h>
-extern void sem_exit (void);
-
static void exit_mm(struct task_struct * tsk);
static void __unhash_process(struct task_struct *p)
{
nr_threads--;
@@ -919,11 +917,10 @@ fastcall NORET_TYPE void do_exit(long co
exit_mm(tsk);
notify_task_watchers(WATCH_TASK_FREE, code, tsk);
if (group_dead)
acct_process();
- exit_sem(tsk);
__exit_files(tsk);
__exit_fs(tsk);
exit_thread();
cpuset_exit(tsk);
exit_keys(tsk);
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1095,14 +1095,12 @@ static struct task_struct *copy_process(
#endif
if ((retval = security_task_alloc(p)))
goto bad_fork_cleanup_policy;
/* copy all the process information */
- if ((retval = copy_semundo(clone_flags, p)))
- goto bad_fork_cleanup_security;
if ((retval = copy_files(clone_flags, p)))
- goto bad_fork_cleanup_semundo;
+ goto bad_fork_cleanup_security;
if ((retval = copy_fs(clone_flags, p)))
goto bad_fork_cleanup_files;
if ((retval = copy_sighand(clone_flags, p)))
goto bad_fork_cleanup_fs;
if ((retval = copy_signal(clone_flags, p)))
@@ -1269,12 +1267,10 @@ bad_fork_cleanup_sighand:
__cleanup_sighand(p->sighand);
bad_fork_cleanup_fs:
exit_fs(p); /* blocking */
bad_fork_cleanup_files:
exit_files(p); /* blocking */
-bad_fork_cleanup_semundo:
- exit_sem(p);
bad_fork_cleanup_security:
security_task_free(p);
bad_fork_cleanup_policy:
#ifdef CONFIG_NUMA
mpol_free(p->mempolicy);
Index: linux-2.6.19-rc2-mm2/include/linux/sem.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/sem.h
+++ linux-2.6.19-rc2-mm2/include/linux/sem.h
@@ -136,25 +136,8 @@ struct sem_undo_list {
struct sysv_sem {
struct sem_undo_list *undo_list;
};
-#ifdef CONFIG_SYSVIPC
-
-extern int copy_semundo(unsigned long clone_flags, struct task_struct *tsk);
-extern void exit_sem(struct task_struct *tsk);
-
-#else
-static inline int copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
-{
- return 0;
-}
-
-static inline void exit_sem(struct task_struct *tsk)
-{
- return;
-}
-#endif
-
#endif /* __KERNEL__ */
#endif /* _LINUX_SEM_H */
--
* [PATCH 4/9] Task Watchers v2: Register cpuset task watcher
2006-11-03 4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
` (2 preceding siblings ...)
2006-11-03 4:23 ` [PATCH 3/9] Task Watchers v2: Register semundo " Matt Helsley
@ 2006-11-03 4:23 ` Matt Helsley
2006-11-03 4:23 ` [PATCH 5/9] Task Watchers v2: Register NUMA mempolicy " Matt Helsley
` (5 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03 4:23 UTC (permalink / raw)
To: Linux-Kernel
Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton
[-- Attachment #1: task-watchers-register-cpusets --]
[-- Type: text/plain, Size: 7461 bytes --]
Register a task watcher for cpusets instead of hooking into
copy_process() and do_exit() directly.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
---
include/linux/cpuset.h | 4 ----
kernel/cpuset.c | 7 +++++--
kernel/exit.c | 2 --
kernel/fork.c | 6 +-----
4 files changed, 6 insertions(+), 13 deletions(-)
Benchmark results:
System: 4 x 1.7GHz ppc64 (Power 4+) processors, 30968600kB RAM, 2.6.19-rc2-mm2 kernel
Clone       Number of Children Cloned
            5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean        18023.8   18243.8   18485.1   18422.9   18469.4   18505.1
Dev         317.163   297.266   298.965   288.518   294.607   290.491
Err (%)     1.75969   1.6294    1.61733   1.56608   1.59511   1.56979
Fork        Number of Children Forked
            5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean        17950.9   18149.7   18283     18409.3   18414.1   18450.3
Dev         310.206   300.925   297.458   290.673   298.75    301.009
Err (%)     1.72808   1.65802   1.62696   1.57895   1.6224    1.63146
Kernbench:
Elapsed: 124.248s User: 439.83s System: 46.258s CPU: 390.7%
439.80user 46.26system 2:04.53elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.20system 2:04.29elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.80user 46.42system 2:04.37elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.88user 46.16system 2:04.36elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.21system 2:03.72elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.93user 46.21system 2:03.90elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.88user 46.25system 2:04.67elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.38system 2:04.31elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.90user 46.25system 2:04.09elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.24system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -28,11 +28,10 @@
#include <linux/mman.h>
#include <linux/fs.h>
#include <linux/nsproxy.h>
#include <linux/capability.h>
#include <linux/cpu.h>
-#include <linux/cpuset.h>
#include <linux/security.h>
#include <linux/swap.h>
#include <linux/syscalls.h>
#include <linux/jiffies.h>
#include <linux/futex.h>
@@ -1053,17 +1052,16 @@ static struct task_struct *copy_process(
p->tgid = current->tgid;
retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
if (retval < 0)
goto bad_fork_cleanup_delays_binfmt;
- cpuset_fork(p);
#ifdef CONFIG_NUMA
p->mempolicy = mpol_copy(p->mempolicy);
if (IS_ERR(p->mempolicy)) {
retval = PTR_ERR(p->mempolicy);
p->mempolicy = NULL;
- goto bad_fork_cleanup_cpuset;
+ goto bad_fork_cleanup_delays_binfmt;
}
mpol_fix_fork_child_flag(p);
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
p->irq_events = 0;
@@ -1272,13 +1270,11 @@ bad_fork_cleanup_files:
bad_fork_cleanup_security:
security_task_free(p);
bad_fork_cleanup_policy:
#ifdef CONFIG_NUMA
mpol_free(p->mempolicy);
-bad_fork_cleanup_cpuset:
#endif
- cpuset_exit(p);
bad_fork_cleanup_delays_binfmt:
delayacct_tsk_free(p);
notify_task_watchers(WATCH_TASK_FREE, 0, p);
if (p->binfmt)
module_put(p->binfmt->module);
Index: linux-2.6.19-rc2-mm2/kernel/cpuset.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/cpuset.c
+++ linux-2.6.19-rc2-mm2/kernel/cpuset.c
@@ -47,10 +47,11 @@
#include <linux/stat.h>
#include <linux/string.h>
#include <linux/time.h>
#include <linux/backing-dev.h>
#include <linux/sort.h>
+#include <linux/task_watchers.h>
#include <asm/uaccess.h>
#include <asm/atomic.h>
#include <linux/mutex.h>
@@ -2172,17 +2173,18 @@ void __init cpuset_init_smp(void)
*
* At the point that cpuset_fork() is called, 'current' is the parent
* task, and the passed argument 'child' points to the child task.
**/
-void cpuset_fork(struct task_struct *child)
+static int cpuset_fork(unsigned long clone_flags, struct task_struct *child)
{
task_lock(current);
child->cpuset = current->cpuset;
atomic_inc(&child->cpuset->count);
task_unlock(current);
+ return 0;
}
+task_watcher_func(init, cpuset_fork);
/**
* cpuset_exit - detach cpuset from exiting task
* @tsk: pointer to task_struct of exiting process
*
@@ -2239,11 +2241,11 @@ void cpuset_fork(struct task_struct *chi
* to NULL here, and check in cpuset_update_task_memory_state()
* for a NULL pointer. This hack avoids that NULL check, for no
* cost (other than this way too long comment ;).
**/
-void cpuset_exit(struct task_struct *tsk)
+static int cpuset_exit(unsigned long exit_code, struct task_struct *tsk)
{
struct cpuset *cs;
cs = tsk->cpuset;
tsk->cpuset = &top_cpuset; /* the_top_cpuset_hack - see above */
@@ -2258,10 +2260,11 @@ void cpuset_exit(struct task_struct *tsk
cpuset_release_agent(pathbuf);
} else {
atomic_dec(&cs->count);
}
+ return 0;
}
+task_watcher_func(free, cpuset_exit);
/**
* cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
* @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
*
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -28,11 +28,10 @@
#include <linux/mount.h>
#include <linux/proc_fs.h>
#include <linux/mempolicy.h>
#include <linux/taskstats_kern.h>
#include <linux/delayacct.h>
-#include <linux/cpuset.h>
#include <linux/syscalls.h>
#include <linux/signal.h>
#include <linux/posix-timers.h>
#include <linux/cn_proc.h>
#include <linux/mutex.h>
@@ -920,11 +919,10 @@ fastcall NORET_TYPE void do_exit(long co
if (group_dead)
acct_process();
__exit_files(tsk);
__exit_fs(tsk);
exit_thread();
- cpuset_exit(tsk);
exit_keys(tsk);
if (group_dead && tsk->signal->leader)
disassociate_ctty(1);
Index: linux-2.6.19-rc2-mm2/include/linux/cpuset.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/cpuset.h
+++ linux-2.6.19-rc2-mm2/include/linux/cpuset.h
@@ -17,12 +17,10 @@
extern int number_of_cpusets; /* How many cpusets are defined in system? */
extern int cpuset_init_early(void);
extern int cpuset_init(void);
extern void cpuset_init_smp(void);
-extern void cpuset_fork(struct task_struct *p);
-extern void cpuset_exit(struct task_struct *p);
extern cpumask_t cpuset_cpus_allowed(struct task_struct *p);
extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
#define cpuset_current_mems_allowed (current->mems_allowed)
void cpuset_init_current_mems_allowed(void);
void cpuset_update_task_memory_state(void);
@@ -69,12 +67,10 @@ extern void cpuset_track_online_nodes(vo
#else /* !CONFIG_CPUSETS */
static inline int cpuset_init_early(void) { return 0; }
static inline int cpuset_init(void) { return 0; }
static inline void cpuset_init_smp(void) {}
-static inline void cpuset_fork(struct task_struct *p) {}
-static inline void cpuset_exit(struct task_struct *p) {}
static inline cpumask_t cpuset_cpus_allowed(struct task_struct *p)
{
return cpu_possible_map;
}
--
* [PATCH 5/9] Task Watchers v2: Register NUMA mempolicy task watcher
2006-11-03 4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
` (3 preceding siblings ...)
2006-11-03 4:23 ` [PATCH 4/9] Task Watchers v2: Register cpuset " Matt Helsley
@ 2006-11-03 4:23 ` Matt Helsley
2006-11-03 4:23 ` [PATCH 6/9] Task Watchers v2: Register IRQ flag tracing " Matt Helsley
` (4 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03 4:23 UTC (permalink / raw)
To: Linux-Kernel
Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton
[-- Attachment #1: task-watchers-register-numa-mempolicy --]
[-- Type: text/plain, Size: 5458 bytes --]
Register a NUMA mempolicy task watcher instead of hooking into
copy_process() and do_exit() directly.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
kernel/exit.c | 4 ----
kernel/fork.c | 15 +--------------
mm/mempolicy.c | 24 ++++++++++++++++++++++++
3 files changed, 25 insertions(+), 18 deletions(-)
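The init watcher added by this patch reports failure through the kernel's error-pointer idiom: mpol_copy() returns either a valid pointer or an errno encoded in the pointer value, and the watcher turns that into a negative return code. The following user-space sketch shows the idiom under stated assumptions: ERR_PTR(), PTR_ERR(), and IS_ERR() are simplified re-implementations of the kernel helpers, while struct policy, struct task, policy_copy(), and init_task_policy() are hypothetical stand-ins, with a "fail" flag added purely to force the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Simplified user-space versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-ins for struct mempolicy and struct task_struct. */
struct policy { int mode; };
struct task { struct policy *policy; };

/* Duplicate a policy; NULL stays NULL, failure is an encoded -ENOMEM. */
static struct policy *policy_copy(const struct policy *old, int fail)
{
	struct policy *new;

	if (!old)
		return NULL;
	if (fail)
		return ERR_PTR(-ENOMEM);
	new = malloc(sizeof(*new));
	if (!new)
		return ERR_PTR(-ENOMEM);
	*new = *old;
	return new;
}

/* Same control flow as init_task_mempolicy(): copy, decode error, clear. */
static int init_task_policy(struct task *tsk, int fail)
{
	tsk->policy = policy_copy(tsk->policy, fail);
	if (IS_ERR(tsk->policy)) {
		int retval = (int)PTR_ERR(tsk->policy);

		tsk->policy = NULL;	/* never leave an error pointer behind */
		return retval;
	}
	return 0;
}
```

Clearing the field before returning matters because the free path runs unconditionally on the error exit; in the sketch, as in the patch, a later free must see NULL rather than an encoded errno.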
Benchmark results:
System: 4 x 1.7GHz ppc64 (Power 4+) processors, 30968600kB RAM, 2.6.19-rc2-mm2 kernel
Clone       Number of Children Cloned
            5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean        17836.3   18085.2   18220.4   18225     18319     18339
Dev         302.801   314.617   303.079   293.46    287.267   294.819
Err (%)     1.69767   1.73963   1.6634    1.6102    1.56814   1.60761
Fork        Number of Children Forked
            5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean        17896.2   17990     18100.6   18242.3   18244     18346.9
Dev         301.64    285.698   295.646   304.361   299.472   287.153
Err (%)     1.6855    1.58809   1.63335   1.66844   1.64148   1.56513
Kernbench:
Elapsed: 124.532s User: 439.732s System: 46.497s CPU: 389.9%
439.71user 46.48system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.42system 2:05.10elapsed 388%CPU (0avgtext+0avgdata 0maxresident)k
439.74user 46.44system 2:04.60elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.75user 46.64system 2:04.74elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.61user 46.45system 2:05.36elapsed 387%CPU (0avgtext+0avgdata 0maxresident)k
439.60user 46.43system 2:04.33elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.47system 2:04.34elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.87user 46.45system 2:04.10elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.71system 2:04.58elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.48system 2:03.93elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
Index: linux-2.6.19-rc2-mm2/mm/mempolicy.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/mm/mempolicy.c
+++ linux-2.6.19-rc2-mm2/mm/mempolicy.c
@@ -87,10 +87,11 @@
#include <linux/seq_file.h>
#include <linux/proc_fs.h>
#include <linux/migrate.h>
#include <linux/rmap.h>
#include <linux/security.h>
+#include <linux/task_watchers.h>
#include <asm/tlbflush.h>
#include <asm/uaccess.h>
/* Internal flags */
@@ -1333,10 +1334,33 @@ struct mempolicy *__mpol_copy(struct mem
}
}
return new;
}
+static int init_task_mempolicy(unsigned long clone_flags,
+ struct task_struct *tsk)
+{
+ tsk->mempolicy = mpol_copy(tsk->mempolicy);
+ if (IS_ERR(tsk->mempolicy)) {
+ int retval;
+
+ retval = PTR_ERR(tsk->mempolicy);
+ tsk->mempolicy = NULL;
+ return retval;
+ }
+ mpol_fix_fork_child_flag(tsk);
+ return 0;
+}
+task_watcher_func(init, init_task_mempolicy);
+
+static int free_task_mempolicy(unsigned long ignored, struct task_struct *tsk)
+{
+ mpol_free(tsk->mempolicy);
+ tsk->mempolicy = NULL;
+ return 0;
+}
+task_watcher_func(free, free_task_mempolicy);
+
/* Slow path of a mempolicy comparison */
int __mpol_equal(struct mempolicy *a, struct mempolicy *b)
{
if (!a || !b)
return 0;
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1052,19 +1052,10 @@ static struct task_struct *copy_process(
p->tgid = current->tgid;
retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
if (retval < 0)
goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_NUMA
- p->mempolicy = mpol_copy(p->mempolicy);
- if (IS_ERR(p->mempolicy)) {
- retval = PTR_ERR(p->mempolicy);
- p->mempolicy = NULL;
- goto bad_fork_cleanup_delays_binfmt;
- }
- mpol_fix_fork_child_flag(p);
-#endif
#ifdef CONFIG_TRACE_IRQFLAGS
p->irq_events = 0;
#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
p->hardirqs_enabled = 1;
#else
@@ -1091,11 +1082,11 @@ static struct task_struct *copy_process(
#ifdef CONFIG_DEBUG_MUTEXES
p->blocked_on = NULL; /* not blocked yet */
#endif
if ((retval = security_task_alloc(p)))
- goto bad_fork_cleanup_policy;
+ goto bad_fork_cleanup_delays_binfmt;
/* copy all the process information */
if ((retval = copy_files(clone_flags, p)))
goto bad_fork_cleanup_security;
if ((retval = copy_fs(clone_flags, p)))
goto bad_fork_cleanup_files;
@@ -1267,14 +1258,10 @@ bad_fork_cleanup_fs:
exit_fs(p); /* blocking */
bad_fork_cleanup_files:
exit_files(p); /* blocking */
bad_fork_cleanup_security:
security_task_free(p);
-bad_fork_cleanup_policy:
-#ifdef CONFIG_NUMA
- mpol_free(p->mempolicy);
-#endif
bad_fork_cleanup_delays_binfmt:
delayacct_tsk_free(p);
notify_task_watchers(WATCH_TASK_FREE, 0, p);
if (p->binfmt)
module_put(p->binfmt->module);
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -932,14 +932,10 @@ fastcall NORET_TYPE void do_exit(long co
tsk->exit_code = code;
proc_exit_connector(tsk);
exit_notify(tsk);
exit_task_namespaces(tsk);
-#ifdef CONFIG_NUMA
- mpol_free(tsk->mempolicy);
- tsk->mempolicy = NULL;
-#endif
/*
* This must happen late, after the PID is not
* hashed anymore:
*/
if (unlikely(!list_empty(&tsk->pi_state_list)))
--
* [PATCH 6/9] Task Watchers v2: Register IRQ flag tracing task watcher
2006-11-03 4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
` (4 preceding siblings ...)
2006-11-03 4:23 ` [PATCH 5/9] Task Watchers v2: Register NUMA mempolicy " Matt Helsley
@ 2006-11-03 4:23 ` Matt Helsley
2006-11-03 4:23 ` [PATCH 7/9] Task Watchers v2: Register lockdep " Matt Helsley
` (3 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03 4:23 UTC (permalink / raw)
To: Linux-Kernel
Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton
[-- Attachment #1: task-watchers-register-trace-irqflags --]
[-- Type: text/plain, Size: 4284 bytes --]
Register an irq-flag-tracing task watcher instead of hooking into
copy_process().
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
kernel/fork.c | 19 -------------------
kernel/irq/handle.c | 24 ++++++++++++++++++++++++
2 files changed, 24 insertions(+), 19 deletions(-)
Benchmark results:
System: 4 x 1.7GHz ppc64 (Power 4+) processors, 30968600kB RAM, 2.6.19-rc2-mm2 kernel
Clone       Number of Children Cloned
            5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean        17826.5   18077.4   18160.1   18263.6   18343     18350.8
Dev         305.841   306.331   283.323   284.761   292.732   292.882
Err (%)     1.71565   1.69455   1.56014   1.55917   1.59588   1.59602
Fork        Number of Children Forked
            5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean        17813.5   18062.4   18140.5   18246.7   18237.8   18275.2
Dev         305.816   294.914   294.779   294.727   323.996   300.176
Err (%)     1.71677   1.63275   1.62498   1.61523   1.77651   1.64253
Kernbench:
Elapsed: 124.4s User: 439.787s System: 46.485s CPU: 390.3%
439.70user 46.43system 2:04.64elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.92user 46.38system 2:04.47elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.62system 2:04.44elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.83user 46.46system 2:04.29elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.73user 46.47system 2:04.12elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.83user 46.49system 2:04.10elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.42system 2:04.41elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.70user 46.64system 2:04.30elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.47system 2:04.76elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.47system 2:04.47elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1052,29 +1052,10 @@ static struct task_struct *copy_process(
p->tgid = current->tgid;
retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
if (retval < 0)
goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_TRACE_IRQFLAGS
- p->irq_events = 0;
-#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
- p->hardirqs_enabled = 1;
-#else
- p->hardirqs_enabled = 0;
-#endif
- p->hardirq_enable_ip = 0;
- p->hardirq_enable_event = 0;
- p->hardirq_disable_ip = _THIS_IP_;
- p->hardirq_disable_event = 0;
- p->softirqs_enabled = 1;
- p->softirq_enable_ip = _THIS_IP_;
- p->softirq_enable_event = 0;
- p->softirq_disable_ip = 0;
- p->softirq_disable_event = 0;
- p->hardirq_context = 0;
- p->softirq_context = 0;
-#endif
#ifdef CONFIG_LOCKDEP
p->lockdep_depth = 0; /* no locks held yet */
p->curr_chain_key = 0;
p->lockdep_recursion = 0;
#endif
Index: linux-2.6.19-rc2-mm2/kernel/irq/handle.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/irq/handle.c
+++ linux-2.6.19-rc2-mm2/kernel/irq/handle.c
@@ -13,10 +13,11 @@
#include <linux/irq.h>
#include <linux/module.h>
#include <linux/random.h>
#include <linux/interrupt.h>
#include <linux/kernel_stat.h>
+#include <linux/task_watchers.h>
#include "internals.h"
/**
* handle_bad_irq - handle spurious and unhandled irqs
@@ -266,6 +267,29 @@ void early_init_irq_lock_class(void)
for (i = 0; i < NR_IRQS; i++)
lockdep_set_class(&irq_desc[i].lock, &irq_desc_lock_class);
}
+static int init_task_trace_irqflags(unsigned long clone_flags,
+ struct task_struct *p)
+{
+ p->irq_events = 0;
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+ p->hardirqs_enabled = 1;
+#else
+ p->hardirqs_enabled = 0;
+#endif
+ p->hardirq_enable_ip = 0;
+ p->hardirq_enable_event = 0;
+ p->hardirq_disable_ip = _THIS_IP_;
+ p->hardirq_disable_event = 0;
+ p->softirqs_enabled = 1;
+ p->softirq_enable_ip = _THIS_IP_;
+ p->softirq_enable_event = 0;
+ p->softirq_disable_ip = 0;
+ p->softirq_disable_event = 0;
+ p->hardirq_context = 0;
+ p->softirq_context = 0;
+ return 0;
+}
+task_watcher_func(init, init_task_trace_irqflags);
#endif
--
* [PATCH 7/9] Task Watchers v2: Register lockdep task watcher
2006-11-03 4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
` (5 preceding siblings ...)
2006-11-03 4:23 ` [PATCH 6/9] Task Watchers v2: Register IRQ flag tracing " Matt Helsley
@ 2006-11-03 4:23 ` Matt Helsley
2006-11-03 4:23 ` [PATCH 8/9] Task Watchers v2: Register process keyrings " Matt Helsley
` (2 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03 4:23 UTC (permalink / raw)
To: Linux-Kernel
Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton
[-- Attachment #1: task-watchers-register-lockdep --]
[-- Type: text/plain, Size: 3307 bytes --]
Register a task watcher for lockdep instead of hooking into copy_process().
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
kernel/fork.c | 5 -----
kernel/lockdep.c | 9 +++++++++
2 files changed, 9 insertions(+), 5 deletions(-)
Benchmark results:
System: 4 x 1.7GHz ppc64 (Power 4+) processors, 30968600kB RAM, 2.6.19-rc2-mm2 kernel
Clone       Number of Children Cloned
            5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean        17808.2   18092.3   18215.5   18183.6   18310.8   18342.8
Dev         302.333   317.786   303.385   280.608   281.378   294.009
Err (%)     1.69772   1.75647   1.66553   1.5432    1.53668   1.60285
Fork        Number of Children Forked
            5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean        17821.8   18025.1   18112.5   18226     18217.4   18318
Dev         316.497   310.195   291.372   297.166   364.908   293.89
Err (%)     1.7759    1.7209    1.60868   1.63045   2.00307   1.60438
Kernbench:
Elapsed: 124.333s User: 439.787s System: 46.491s CPU: 390.7%
439.67user 46.42system 2:04.09elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.46system 2:04.17elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.75user 46.65system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.43system 2:04.54elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.71user 46.43system 2:04.56elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.71user 46.51system 2:04.45elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.86user 46.64system 2:04.69elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.87user 46.44system 2:04.05elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.87user 46.48system 2:04.63elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.45system 2:03.91elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1052,15 +1052,10 @@ static struct task_struct *copy_process(
p->tgid = current->tgid;
retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
if (retval < 0)
goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_LOCKDEP
- p->lockdep_depth = 0; /* no locks held yet */
- p->curr_chain_key = 0;
- p->lockdep_recursion = 0;
-#endif
#ifdef CONFIG_DEBUG_MUTEXES
p->blocked_on = NULL; /* not blocked yet */
#endif
Index: linux-2.6.19-rc2-mm2/kernel/lockdep.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/lockdep.c
+++ linux-2.6.19-rc2-mm2/kernel/lockdep.c
@@ -2556,10 +2556,19 @@ void __init lockdep_init(void)
INIT_LIST_HEAD(chainhash_table + i);
lockdep_initialized = 1;
}
+static int init_task_lockdep(unsigned long clone_flags, struct task_struct *p)
+{
+ p->lockdep_depth = 0; /* no locks held yet */
+ p->curr_chain_key = 0;
+ p->lockdep_recursion = 0;
+ return 0;
+}
+task_watcher_func(init, init_task_lockdep);
+
void __init lockdep_info(void)
{
printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
printk("... MAX_LOCKDEP_SUBCLASSES: %lu\n", MAX_LOCKDEP_SUBCLASSES);
--
* [PATCH 8/9] Task Watchers v2: Register process keyrings task watcher
From: Matt Helsley @ 2006-11-03 4:23 UTC (permalink / raw)
To: Linux-Kernel
Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton,
David Howells
[-- Attachment #1: task-watchers-register-keys --]
[-- Type: text/plain, Size: 12313 bytes --]
Make the keyring code use a task watcher to initialize and free per-task data.
NOTE:
We can't make copy_thread_group_keys() in copy_signal() a task watcher
because it needs the task's signal field (struct signal_struct), which
copy_signal() has not yet set up when the WATCH_TASK_INIT watchers run.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: David Howells <dhowells@redhat.com>
---
include/linux/key.h | 8 --------
kernel/exit.c | 2 --
kernel/fork.c | 6 +-----
kernel/sys.c | 8 --------
security/keys/process_keys.c | 19 ++++++++++++-------
5 files changed, 13 insertions(+), 30 deletions(-)
Benchmark results:
System: 4 x 1.7GHz ppc64 (Power 4+) processors, 30968600kB RAM, 2.6.19-rc2-mm2 kernel
Clone     Number of Children Cloned
          5000      7500      10000     12500     15000     17500
-------------------------------------------------------------------
Mean      17746.8   17923.4   18079.1   18128.9   18182.7   18140.9
Dev       305.931   297.937   287.602   289.916   290.541   278.494
Err (%)   1.72387   1.66228   1.5908    1.5992    1.59789   1.53517

Fork      Number of Children Forked
          5000      7500      10000     12500     15000     17500
-------------------------------------------------------------------
Mean      17678.6   17872.6   17975.1   18072.5   18166.1   18167.7
Dev       311.175   279.804   293.091   296.378   293.13    292.623
Err (%)   1.76017   1.56555   1.63054   1.63993   1.61361   1.61068
Kernbench:
Elapsed: 124.357s User: 439.753s System: 46.582s CPU: 390.6%
439.90user 46.56system 2:04.09elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.71user 46.48system 2:04.23elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.71system 2:04.77elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.67user 46.53system 2:04.31elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.80user 46.55system 2:04.10elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.54system 2:04.11elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.85user 46.79system 2:04.17elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.65user 46.50system 2:04.63elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.57user 46.55system 2:04.62elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.80user 46.61system 2:04.54elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
Index: linux-2.6.19-rc2-mm2/include/linux/key.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/key.h
+++ linux-2.6.19-rc2-mm2/include/linux/key.h
@@ -335,18 +335,14 @@ extern void keyring_replace_payload(stru
*/
extern struct key root_user_keyring, root_session_keyring;
extern int alloc_uid_keyring(struct user_struct *user,
struct task_struct *ctx);
extern void switch_uid_keyring(struct user_struct *new_user);
-extern int copy_keys(unsigned long clone_flags, struct task_struct *tsk);
extern int copy_thread_group_keys(struct task_struct *tsk);
-extern void exit_keys(struct task_struct *tsk);
extern void exit_thread_group_keys(struct signal_struct *tg);
extern int suid_keys(struct task_struct *tsk);
extern int exec_keys(struct task_struct *tsk);
-extern void key_fsuid_changed(struct task_struct *tsk);
-extern void key_fsgid_changed(struct task_struct *tsk);
extern void key_init(void);
#define __install_session_keyring(tsk, keyring) \
({ \
struct key *old_session = tsk->signal->session_keyring; \
@@ -365,18 +361,14 @@ extern void key_init(void);
#define key_ref_to_ptr(k) ({ NULL; })
#define is_key_possessed(k) 0
#define alloc_uid_keyring(u,c) 0
#define switch_uid_keyring(u) do { } while(0)
#define __install_session_keyring(t, k) ({ NULL; })
-#define copy_keys(f,t) 0
#define copy_thread_group_keys(t) 0
-#define exit_keys(t) do { } while(0)
#define exit_thread_group_keys(tg) do { } while(0)
#define suid_keys(t) do { } while(0)
#define exec_keys(t) do { } while(0)
-#define key_fsuid_changed(t) do { } while(0)
-#define key_fsgid_changed(t) do { } while(0)
#define key_init() do { } while(0)
/* Initial keyrings */
extern struct key root_user_keyring;
extern struct key root_session_keyring;
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1070,14 +1070,12 @@ static struct task_struct *copy_process(
goto bad_fork_cleanup_fs;
if ((retval = copy_signal(clone_flags, p)))
goto bad_fork_cleanup_sighand;
if ((retval = copy_mm(clone_flags, p)))
goto bad_fork_cleanup_signal;
- if ((retval = copy_keys(clone_flags, p)))
- goto bad_fork_cleanup_mm;
if ((retval = copy_namespaces(clone_flags, p)))
- goto bad_fork_cleanup_keys;
+ goto bad_fork_cleanup_mm;
retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
if (retval)
goto bad_fork_cleanup_namespaces;
p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
@@ -1219,12 +1217,10 @@ static struct task_struct *copy_process(
proc_fork_connector(p);
return p;
bad_fork_cleanup_namespaces:
exit_task_namespaces(p);
-bad_fork_cleanup_keys:
- exit_keys(p);
bad_fork_cleanup_mm:
if (p->mm)
mmput(p->mm);
bad_fork_cleanup_signal:
cleanup_signal(p);
Index: linux-2.6.19-rc2-mm2/security/keys/process_keys.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/security/keys/process_keys.c
+++ linux-2.6.19-rc2-mm2/security/keys/process_keys.c
@@ -15,10 +15,11 @@
#include <linux/slab.h>
#include <linux/keyctl.h>
#include <linux/fs.h>
#include <linux/err.h>
#include <linux/mutex.h>
+#include <linux/task_watchers.h>
#include <asm/uaccess.h>
#include "internal.h"
/* session keyring create vs join semaphore */
static DEFINE_MUTEX(key_session_mutex);
@@ -276,11 +277,11 @@ int copy_thread_group_keys(struct task_s
/*****************************************************************************/
/*
* copy the keys for fork
*/
-int copy_keys(unsigned long clone_flags, struct task_struct *tsk)
+static int copy_keys(unsigned long clone_flags, struct task_struct *tsk)
{
key_check(tsk->thread_keyring);
key_check(tsk->request_key_auth);
/* no thread keyring yet */
@@ -290,10 +291,11 @@ int copy_keys(unsigned long clone_flags,
key_get(tsk->request_key_auth);
return 0;
} /* end copy_keys() */
+task_watcher_func(init, copy_keys);
/*****************************************************************************/
/*
* dispose of thread group keys upon thread group destruction
*/
@@ -306,16 +308,17 @@ void exit_thread_group_keys(struct signa
/*****************************************************************************/
/*
* dispose of per-thread keys upon thread exit
*/
-void exit_keys(struct task_struct *tsk)
+static int exit_keys(unsigned long exit_code, struct task_struct *tsk)
{
key_put(tsk->thread_keyring);
key_put(tsk->request_key_auth);
-
+ return 0;
} /* end exit_keys() */
+task_watcher_func(free, exit_keys);
/*****************************************************************************/
/*
* deal with execve()
*/
@@ -356,35 +359,37 @@ int suid_keys(struct task_struct *tsk)
/*****************************************************************************/
/*
* the filesystem user ID changed
*/
-void key_fsuid_changed(struct task_struct *tsk)
+static int key_fsuid_changed(unsigned long ignored, struct task_struct *tsk)
{
/* update the ownership of the thread keyring */
if (tsk->thread_keyring) {
down_write(&tsk->thread_keyring->sem);
tsk->thread_keyring->uid = tsk->fsuid;
up_write(&tsk->thread_keyring->sem);
}
-
+ return 0;
} /* end key_fsuid_changed() */
+task_watcher_func(uid, key_fsuid_changed);
/*****************************************************************************/
/*
* the filesystem group ID changed
*/
-void key_fsgid_changed(struct task_struct *tsk)
+static int key_fsgid_changed(unsigned long ignored, struct task_struct *tsk)
{
/* update the ownership of the thread keyring */
if (tsk->thread_keyring) {
down_write(&tsk->thread_keyring->sem);
tsk->thread_keyring->gid = tsk->fsgid;
up_write(&tsk->thread_keyring->sem);
}
-
+ return 0;
} /* end key_fsgid_changed() */
+task_watcher_func(gid, key_fsgid_changed);
/*****************************************************************************/
/*
* search the process keyrings for the first matching key
* - we use the supplied match function to see if the description (or other
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -12,11 +12,10 @@
#include <linux/capability.h>
#include <linux/completion.h>
#include <linux/personality.h>
#include <linux/tty.h>
#include <linux/mnt_namespace.h>
-#include <linux/key.h>
#include <linux/security.h>
#include <linux/cpu.h>
#include <linux/acct.h>
#include <linux/tsacct_kern.h>
#include <linux/file.h>
@@ -919,11 +918,10 @@ fastcall NORET_TYPE void do_exit(long co
if (group_dead)
acct_process();
__exit_files(tsk);
__exit_fs(tsk);
exit_thread();
- exit_keys(tsk);
if (group_dead && tsk->signal->leader)
disassociate_ctty(1);
module_put(task_thread_info(tsk)->exec_domain->module);
Index: linux-2.6.19-rc2-mm2/kernel/sys.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/sys.c
+++ linux-2.6.19-rc2-mm2/kernel/sys.c
@@ -957,11 +957,10 @@ asmlinkage long sys_setregid(gid_t rgid,
(egid != (gid_t) -1 && egid != old_rgid))
current->sgid = new_egid;
current->fsgid = new_egid;
current->egid = new_egid;
current->gid = new_rgid;
- key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
@@ -993,11 +992,10 @@ asmlinkage long sys_setgid(gid_t gid)
current->egid = current->fsgid = gid;
}
else
return -EPERM;
- key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
@@ -1082,11 +1080,10 @@ asmlinkage long sys_setreuid(uid_t ruid,
if (ruid != (uid_t) -1 ||
(euid != (uid_t) -1 && euid != old_ruid))
current->suid = current->euid;
current->fsuid = current->euid;
- key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE);
}
@@ -1130,11 +1127,10 @@ asmlinkage long sys_setuid(uid_t uid)
smp_wmb();
}
current->fsuid = current->euid = uid;
current->suid = new_suid;
- key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID);
}
@@ -1179,11 +1175,10 @@ asmlinkage long sys_setresuid(uid_t ruid
}
current->fsuid = current->euid;
if (suid != (uid_t) -1)
current->suid = suid;
- key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES);
}
@@ -1232,11 +1227,10 @@ asmlinkage long sys_setresgid(gid_t rgid
if (rgid != (gid_t) -1)
current->gid = rgid;
if (sgid != (gid_t) -1)
current->sgid = sgid;
- key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
@@ -1274,11 +1268,10 @@ asmlinkage long sys_setfsuid(uid_t uid)
smp_wmb();
}
current->fsuid = uid;
}
- key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS);
@@ -1302,11 +1295,10 @@ asmlinkage long sys_setfsgid(gid_t gid)
if (gid != old_fsgid) {
current->mm->dumpable = suid_dumpable;
smp_wmb();
}
current->fsgid = gid;
- key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
}
return old_fsgid;
}
--
* [PATCH 9/9] Task Watchers v2: Register process events connector
From: Matt Helsley @ 2006-11-03 4:23 UTC (permalink / raw)
To: Linux-Kernel
Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton
[-- Attachment #1: task-watchers-register-procevents --]
[-- Type: text/plain, Size: 14005 bytes --]
Make the Process events connector use task watchers instead of hooking the
paths it's interested in.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
drivers/connector/cn_proc.c | 52 +++++++++++++++++++++++++++++++-------------
fs/exec.c | 1
include/linux/cn_proc.h | 21 -----------------
kernel/exit.c | 2 -
kernel/fork.c | 2 -
kernel/sys.c | 9 -------
6 files changed, 37 insertions(+), 50 deletions(-)
Benchmark results:
System: 4 x 1.7GHz ppc64 (Power 4+) processors, 30968600kB RAM, 2.6.19-rc2-mm2 kernel
Clone     Number of Children Cloned
          5000      7500      10000     12500     15000     17500
-------------------------------------------------------------------
Mean      17602.2   17876.7   17977.4   18075.5   18134.3   18151.5
Dev       291.294   376.373   277.882   288.971   278.25    276.3
Err (%)   1.65487   2.10539   1.54573   1.59869   1.53439   1.52219

Fork      Number of Children Forked
          5000      7500      10000     12500     15000     17500
-------------------------------------------------------------------
Mean      17691.1   17770.9   17932.6   17996     18096.4   18142.9
Dev       300.692   291.913   296.654   279.183   290.228   284.693
Err (%)   1.69968   1.64265   1.65428   1.55136   1.60379   1.56917
Kernbench:
Elapsed: 124.359s User: 439.756s System: 46.457s CPU: 390.3%
439.87user 46.42system 2:04.44elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.68user 46.42system 2:04.15elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.64system 2:04.40elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.81user 46.42system 2:03.92elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.39system 2:04.48elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.66user 46.41system 2:04.70elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.73user 46.59system 2:04.42elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.97user 46.46system 2:04.45elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.62user 46.40system 2:04.33elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.73user 46.42system 2:04.30elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
Index: linux-2.6.19-rc2-mm2/drivers/connector/cn_proc.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/drivers/connector/cn_proc.c
+++ linux-2.6.19-rc2-mm2/drivers/connector/cn_proc.c
@@ -25,10 +25,11 @@
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/ktime.h>
#include <linux/init.h>
#include <linux/connector.h>
+#include <linux/task_watchers.h>
#include <asm/atomic.h>
#include <linux/cn_proc.h>
#define CN_PROC_MSG_SIZE (sizeof(struct cn_msg) + sizeof(struct proc_event))
@@ -44,19 +45,20 @@ static inline void get_seq(__u32 *ts, in
*ts = get_cpu_var(proc_event_counts)++;
*cpu = smp_processor_id();
put_cpu_var(proc_event_counts);
}
-void proc_fork_connector(struct task_struct *task)
+static int proc_fork_connector(unsigned long clone_flags,
+ struct task_struct *task)
{
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE];
struct timespec ts;
if (atomic_read(&proc_event_num_listeners) < 1)
- return;
+ return 0;
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
@@ -70,21 +72,24 @@ void proc_fork_connector(struct task_str
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
/* If cn_netlink_send() failed, the data is not sent */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+ return 0;
}
+task_watcher_func(clone, proc_fork_connector);
-void proc_exec_connector(struct task_struct *task)
+static int proc_exec_connector(unsigned long ignore,
+ struct task_struct *task)
{
struct cn_msg *msg;
struct proc_event *ev;
struct timespec ts;
__u8 buffer[CN_PROC_MSG_SIZE];
if (atomic_read(&proc_event_num_listeners) < 1)
- return;
+ return 0;
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
@@ -95,21 +100,23 @@ void proc_exec_connector(struct task_str
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+ return 0;
}
+task_watcher_func(exec, proc_exec_connector);
-void proc_id_connector(struct task_struct *task, int which_id)
+static int process_change_id(unsigned long which_id, struct task_struct *task)
{
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE];
struct timespec ts;
if (atomic_read(&proc_event_num_listeners) < 1)
- return;
+ return 0;
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
ev->what = which_id;
ev->event_data.id.process_pid = task->pid;
@@ -119,47 +126,64 @@ void proc_id_connector(struct task_struc
ev->event_data.id.e.euid = task->euid;
} else if (which_id == PROC_EVENT_GID) {
ev->event_data.id.r.rgid = task->gid;
ev->event_data.id.e.egid = task->egid;
} else
- return;
+ return 0;
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
ev->timestamp_ns = timespec_to_ns(&ts);
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+ return 0;
+}
+
+static int proc_change_uid_connector(unsigned long ignore,
+ struct task_struct *task)
+{
+ return process_change_id(PROC_EVENT_UID, task);
+}
+task_watcher_func(uid, proc_change_uid_connector);
+
+static int proc_change_gid_connector(unsigned long ignore,
+ struct task_struct *task)
+{
+ return process_change_id(PROC_EVENT_GID, task);
}
+task_watcher_func(gid, proc_change_gid_connector);
-void proc_exit_connector(struct task_struct *task)
+static int proc_exit_connector(unsigned long code, struct task_struct *task)
{
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE];
struct timespec ts;
if (atomic_read(&proc_event_num_listeners) < 1)
- return;
+ return 0;
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
ev->timestamp_ns = timespec_to_ns(&ts);
ev->what = PROC_EVENT_EXIT;
ev->event_data.exit.process_pid = task->pid;
ev->event_data.exit.process_tgid = task->tgid;
- ev->event_data.exit.exit_code = task->exit_code;
+ ev->event_data.exit.exit_code = code;
ev->event_data.exit.exit_signal = task->exit_signal;
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+ return 0;
}
+task_watcher_func(exit, proc_exit_connector);
/*
* Send an acknowledgement message to userspace
*
* Use 0 for success, EFOO otherwise.
@@ -226,14 +250,12 @@ static void cn_proc_mcast_ctl(void *data
*/
static int __init cn_proc_init(void)
{
int err;
- if ((err = cn_add_callback(&cn_proc_event_id, "cn_proc",
- &cn_proc_mcast_ctl))) {
+ err = cn_add_callback(&cn_proc_event_id, "cn_proc", &cn_proc_mcast_ctl);
+ if (err)
printk(KERN_WARNING "cn_proc failed to register\n");
- return err;
- }
- return 0;
+ return err;
}
module_init(cn_proc_init);
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -40,11 +40,10 @@
#include <linux/mount.h>
#include <linux/profile.h>
#include <linux/rmap.h>
#include <linux/acct.h>
#include <linux/tsacct_kern.h>
-#include <linux/cn_proc.h>
#include <linux/delayacct.h>
#include <linux/taskstats_kern.h>
#include <linux/random.h>
#include <linux/task_watchers.h>
@@ -1212,11 +1211,10 @@ static struct task_struct *copy_process(
total_forks++;
spin_unlock(&current->sighand->siglock);
write_unlock_irq(&tasklist_lock);
notify_task_watchers(WATCH_TASK_CLONE, clone_flags, p);
- proc_fork_connector(p);
return p;
bad_fork_cleanup_namespaces:
exit_task_namespaces(p);
bad_fork_cleanup_mm:
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -30,11 +30,10 @@
#include <linux/taskstats_kern.h>
#include <linux/delayacct.h>
#include <linux/syscalls.h>
#include <linux/signal.h>
#include <linux/posix-timers.h>
-#include <linux/cn_proc.h>
#include <linux/mutex.h>
#include <linux/futex.h>
#include <linux/compat.h>
#include <linux/pipe_fs_i.h>
#include <linux/resource.h>
@@ -927,11 +926,10 @@ fastcall NORET_TYPE void do_exit(long co
module_put(task_thread_info(tsk)->exec_domain->module);
if (tsk->binfmt)
module_put(tsk->binfmt->module);
tsk->exit_code = code;
- proc_exit_connector(tsk);
exit_notify(tsk);
exit_task_namespaces(tsk);
/*
* This must happen late, after the PID is not
* hashed anymore:
Index: linux-2.6.19-rc2-mm2/kernel/sys.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/sys.c
+++ linux-2.6.19-rc2-mm2/kernel/sys.c
@@ -25,11 +25,10 @@
#include <linux/security.h>
#include <linux/dcookies.h>
#include <linux/suspend.h>
#include <linux/tty.h>
#include <linux/signal.h>
-#include <linux/cn_proc.h>
#include <linux/getcpu.h>
#include <linux/seccomp.h>
#include <linux/task_watchers.h>
#include <linux/compat.h>
@@ -957,11 +956,10 @@ asmlinkage long sys_setregid(gid_t rgid,
(egid != (gid_t) -1 && egid != old_rgid))
current->sgid = new_egid;
current->fsgid = new_egid;
current->egid = new_egid;
current->gid = new_rgid;
- proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
/*
@@ -992,11 +990,10 @@ asmlinkage long sys_setgid(gid_t gid)
current->egid = current->fsgid = gid;
}
else
return -EPERM;
- proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
static int set_user(uid_t new_ruid, int dumpclear)
@@ -1080,11 +1077,10 @@ asmlinkage long sys_setreuid(uid_t ruid,
if (ruid != (uid_t) -1 ||
(euid != (uid_t) -1 && euid != old_ruid))
current->suid = current->euid;
current->fsuid = current->euid;
- proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE);
}
@@ -1127,11 +1123,10 @@ asmlinkage long sys_setuid(uid_t uid)
smp_wmb();
}
current->fsuid = current->euid = uid;
current->suid = new_suid;
- proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID);
}
@@ -1175,11 +1170,10 @@ asmlinkage long sys_setresuid(uid_t ruid
}
current->fsuid = current->euid;
if (suid != (uid_t) -1)
current->suid = suid;
- proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES);
}
@@ -1227,11 +1221,10 @@ asmlinkage long sys_setresgid(gid_t rgid
if (rgid != (gid_t) -1)
current->gid = rgid;
if (sgid != (gid_t) -1)
current->sgid = sgid;
- proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
asmlinkage long sys_getresgid(gid_t __user *rgid, gid_t __user *egid, gid_t __user *sgid)
@@ -1268,11 +1261,10 @@ asmlinkage long sys_setfsuid(uid_t uid)
smp_wmb();
}
current->fsuid = uid;
}
- proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS);
return old_fsuid;
@@ -1295,11 +1287,10 @@ asmlinkage long sys_setfsgid(gid_t gid)
if (gid != old_fsgid) {
current->mm->dumpable = suid_dumpable;
smp_wmb();
}
current->fsgid = gid;
- proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
}
return old_fsgid;
}
Index: linux-2.6.19-rc2-mm2/fs/exec.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/fs/exec.c
+++ linux-2.6.19-rc2-mm2/fs/exec.c
@@ -1086,11 +1086,10 @@ int search_binary_handler(struct linux_b
fput(bprm->file);
bprm->file = NULL;
current->did_exec = 1;
notify_task_watchers(WATCH_TASK_EXEC, 0,
current);
- proc_exec_connector(current);
return retval;
}
read_lock(&binfmt_lock);
put_binfmt(fmt);
if (retval != -ENOEXEC || bprm->mm == NULL)
Index: linux-2.6.19-rc2-mm2/include/linux/cn_proc.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/cn_proc.h
+++ linux-2.6.19-rc2-mm2/include/linux/cn_proc.h
@@ -95,27 +95,6 @@ struct proc_event {
__u32 exit_code, exit_signal;
} exit;
} event_data;
};
-#ifdef __KERNEL__
-#ifdef CONFIG_PROC_EVENTS
-void proc_fork_connector(struct task_struct *task);
-void proc_exec_connector(struct task_struct *task);
-void proc_id_connector(struct task_struct *task, int which_id);
-void proc_exit_connector(struct task_struct *task);
-#else
-static inline void proc_fork_connector(struct task_struct *task)
-{}
-
-static inline void proc_exec_connector(struct task_struct *task)
-{}
-
-static inline void proc_id_connector(struct task_struct *task,
- int which_id)
-{}
-
-static inline void proc_exit_connector(struct task_struct *task)
-{}
-#endif /* CONFIG_PROC_EVENTS */
-#endif /* __KERNEL__ */
#endif /* CN_PROC_H */
--
* Re: [PATCH 0/9] Task Watchers v2: Introduction
From: Paul Jackson @ 2006-11-03 8:57 UTC (permalink / raw)
To: Matt Helsley
Cc: linux-kernel, jes, lse-tech, sekharan, hch, viro, sgrubb,
linux-audit, akpm
Matt wrote:
> Task watchers is primarily useful to existing kernel code as a means of making
> the code in fork and exit more readable.
I don't get it. The benchmark data isn't explained in plain English
what it means, that I could find, so I am just guessing. But looking
at the last (17500) column of the fork results, after applying patch
1/9, I see a number of 18565, and looking at that same column in patch
9/9, I see a number of 18142.
I guess that means a drop of (18565 - 18142) / 18565 == 2% in the fork
rate, to make the code "more readable".
And I'm not even sure it makes it more readable. Looks to me like another
layer of apparatus, which is one more thing to figure out before a reader
understands what is going on.
I'd gladly put in a few long days to improve the fork rate 2%, and I am
grateful to those who have already done so - whoever they are.
Somewhere I must have missed the memo explaining why this patch is a
good idea - sorry.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: [PATCH 0/9] Task Watchers v2: Introduction
From: Matt Helsley @ 2006-11-03 22:55 UTC (permalink / raw)
To: Paul Jackson
Cc: linux-kernel, jes, lse-tech, sekharan, hch, viro, sgrubb,
linux-audit, akpm
On Fri, 2006-11-03 at 00:57 -0800, Paul Jackson wrote:
> Matt wrote:
> > Task watchers is primarily useful to existing kernel code as a means of making
> > the code in fork and exit more readable.
>
> I don't get it. The benchmark data isn't explained in plain English
Sorry, there were no units in the per-patch fork and clone data. The units
there are tasks created per second. The kernbench units are in place
and should be fairly self-explanatory I think.
Here's what I did:
Measure the time it takes to fork N times. Retry 100 times. Try
different N. Try clone instead of fork to see how different the results
can be.
Then run kernbench.
Do the above after applying each patch. Then compare to the previous
patch (or unpatched source).
Run statistics on the numbers.
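
The fork-timing part of that procedure can be sketched in user space. This
is only an illustration, not the benchmark used for the numbers in this
series: the helper names measure_fork_rate() and mean_fork_rate() are made
up, and reaping each child inside the timed loop is an assumption, so the
figure it reports includes exit and wait cost as well as fork.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not part of the patch series): fork n children
 * that exit at once, reap each one, and return tasks created per second.
 * Returns -1.0 if n is not positive or if the clock, fork() or waitpid()
 * fails.
 */
double measure_fork_rate(int n)
{
	struct timespec start, end;
	double elapsed;
	int i;

	if (n <= 0 || clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1.0;

	for (i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid < 0)
			return -1.0;
		if (pid == 0)
			_exit(0);		/* child has nothing to do */
		if (waitpid(pid, NULL, 0) < 0)	/* reap so zombies do not pile up */
			return -1.0;
	}

	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1.0;

	elapsed = (double)(end.tv_sec - start.tv_sec) +
		  (double)(end.tv_nsec - start.tv_nsec) / 1e9;
	return elapsed > 0.0 ? (double)n / elapsed : -1.0;
}

/* Average the rate over several retries of the same n. */
double mean_fork_rate(int n, int retries)
{
	double sum = 0.0;
	int i;

	if (retries <= 0)
		return -1.0;
	for (i = 0; i < retries; i++) {
		double rate = measure_fork_rate(n);

		if (rate < 0.0)
			return -1.0;
		sum += rate;
	}
	return sum / retries;
}
```

Calling mean_fork_rate(5000, 100) from a small main() would correspond to
one column of the fork tables, under the assumptions above.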
> what it means, that I could find, so I am just guessing. But looking
> at the last (17500) column of the fork results, after applying patch
> 1/9, I see a number of 18565, and looking at that same column in patch
> 9/9, I see a number of 18142.
>
> I guess that means a drop of (18565 - 18142) / 18565 == 2% in the fork
> rate, to make the code "more readable".
Well, it's a worst-case scenario. Without the patches I've seen the
fork rate intermittently (once every 300 samples) drop to 16k forks/sec
-- a much bigger drop than 2%. I also ran the tests on Andrew's hotfix
patches for rc2-mm2 and got similar differences even though the patches
don't change the fork path. And finally, don't forget to compare that to
the error -- about +/-1.6%. So on an absolute worst-case workload we
could have a drop anywhere from 0.4 to 3.6%.
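
For reference, the arithmetic behind those percentages can be written out.
A minimal sketch: relative_drop_percent() and drop_bounds() are
hypothetical helpers, and simply adding and subtracting the error is the
same rough estimate used above, not a rigorous confidence interval.

```c
/* Relative drop, in percent, going from rate `before` to rate `after`. */
double relative_drop_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* A drop widened by a symmetric measurement error, both in percent. */
struct drop_range {
	double low;
	double high;
};

struct drop_range drop_bounds(double drop_percent, double err_percent)
{
	struct drop_range r = {
		drop_percent - err_percent,
		drop_percent + err_percent,
	};

	return r;
}
```

With the quoted figures, relative_drop_percent(18565, 18142) comes to
roughly 2.3, and drop_bounds(2.0, 1.6) gives the 0.4 to 3.6 range.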
To get a better idea of the normal impact of these patches I think you
have to look at benchmarks more like kernbench since it's not composed
entirely of fork calls. There the measurements are easily within the
error margins with or without the patches.
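
The kernbench summary line is the average of the ten runs listed under it,
which can be checked with a few lines of C. A sketch, assuming the
time(1)-style m:ss.ss elapsed fields shown in the patch 9/9 posting; the
function names are hypothetical.

```c
#include <stdio.h>

/*
 * Parse a time(1)-style elapsed field such as "2:04.44" into seconds.
 * Returns -1.0 on malformed input.
 */
double elapsed_to_seconds(const char *field)
{
	int minutes;
	double seconds;

	if (sscanf(field, "%d:%lf", &minutes, &seconds) != 2)
		return -1.0;
	return minutes * 60.0 + seconds;
}

/* Mean of `count` elapsed fields, in seconds, or -1.0 on error. */
double mean_elapsed(const char *const fields[], int count)
{
	double sum = 0.0;
	int i;

	if (count <= 0)
		return -1.0;
	for (i = 0; i < count; i++) {
		double s = elapsed_to_seconds(fields[i]);

		if (s < 0.0)
			return -1.0;
		sum += s;
	}
	return sum / count;
}

/* Elapsed fields copied from the ten kernbench runs in patch 9/9. */
const char *const patch9_elapsed[] = {
	"2:04.44", "2:04.15", "2:04.40", "2:03.92", "2:04.48",
	"2:04.70", "2:04.42", "2:04.45", "2:04.33", "2:04.30",
};
```

mean_elapsed(patch9_elapsed, 10) works out to 124.359 seconds, matching
the Elapsed figure in the 9/9 summary line.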
Unfortunately the differences I get always seem to be right around the
size of the error. I can't seem to get a benchmark to have an error of
1% or less. I'm open to suggestions of different benchmarks or how to
obtain tighter bounds on the measurements (e.g. /proc knobs to fiddle
with).
> And I'm not even sure it makes it more readable. Looks to me like another
> layer of apparatus, which is one more thing to figure out before a reader
> understands what is going on.
It's nice to see a module's init function with the rest of the module
and not cluttering up the kernel's module loading code. The use,
benefits, disadvantages, and even the implementation of task watchers
are similar. I could rename it (task_init(), task_exit(), etc.) to make
the similarity more apparent.
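
To make the analogy concrete, here is a user-space sketch of the idea. It
is not the kernel implementation: the patches build their tables in an ELF
section at link time, while this sketch registers callbacks at run time,
and every name in it (struct task, register_watcher(), notify_watchers()
and so on) is a hypothetical stand-in. Stopping at the first negative
return value is also an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel definitions. */
enum watch_event { WATCH_INIT, WATCH_EXIT, NUM_WATCH_EVENTS };

struct task {
	int pid;
	int lockdep_depth;
	int exit_code;
};

typedef int (*watcher_fn)(unsigned long val, struct task *t);

#define MAX_WATCHERS 8
static watcher_fn watchers[NUM_WATCH_EVENTS][MAX_WATCHERS];
static size_t nr_watchers[NUM_WATCH_EVENTS];

/* Add fn to the table for ev; returns -1 if the table is full. */
int register_watcher(enum watch_event ev, watcher_fn fn)
{
	if (nr_watchers[ev] >= MAX_WATCHERS)
		return -1;
	watchers[ev][nr_watchers[ev]++] = fn;
	return 0;
}

/* Call each watcher for ev in order; stop at the first negative return. */
int notify_watchers(enum watch_event ev, unsigned long val, struct task *t)
{
	size_t i;

	for (i = 0; i < nr_watchers[ev]; i++) {
		int ret = watchers[ev][i](val, t);

		if (ret < 0)
			return ret;
	}
	return 0;
}

/* Each subsystem keeps its per-task setup next to its own code. */
static int init_task_lockdep(unsigned long clone_flags, struct task *t)
{
	(void)clone_flags;
	t->lockdep_depth = 0;	/* no locks held yet */
	return 0;
}

static int record_exit_code(unsigned long code, struct task *t)
{
	t->exit_code = (int)code;
	return 0;
}

/* Usage: returns 0 if both watchers ran and updated the task. */
int demo(void)
{
	struct task t = { 1234, 3, 0 };

	if (register_watcher(WATCH_INIT, init_task_lockdep) < 0 ||
	    register_watcher(WATCH_EXIT, record_exit_code) < 0)
		return -1;
	if (notify_watchers(WATCH_INIT, 0, &t) < 0 || t.lockdep_depth != 0)
		return -1;
	if (notify_watchers(WATCH_EXIT, 9, &t) < 0 || t.exit_code != 9)
		return -1;
	return 0;
}
```

The point of the sketch is that the fork and exit paths only call
notify_watchers(); what happens per subsystem lives with that subsystem.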
> I'd gladly put in a few long days to improve the fork rate 2%, and I am
> grateful to those who have already done so - whoever they are.
I'm open to suggestions on how to improve the performance. :)
> Somewhere I must have missed the memo explaining why this patch is a
> good idea - sorry.
Well, it should make things look cleaner. It's also intended to be
useful in new code like containers and resource management -- pieces
many people don't want to pay attention to in those paths.
Cheers,
-Matt Helsley