public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/9] Task Watchers v2: Introduction
@ 2006-11-03  4:22 Matt Helsley
  2006-11-03  4:22 ` [PATCH 1/9] Task Watchers v2: Task watchers v2 Matt Helsley
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03  4:22 UTC
  To: Linux-Kernel
  Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
	Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton

This is version 2 of my Task Watchers patches.

Task watchers call registered functions whenever a task forks, execs, changes
its [re][ug]id, or exits.

Task watchers are primarily useful to existing kernel code as a means of making
the code in fork and exit more readable. Kernel code hooks into these paths by
marking a function as a task watcher, much like modules mark their init
functions with module_init(). This improves the readability of copy_process().

The first patch adds the basic infrastructure of task watchers: notification
function calls in the various paths and a table of function pointers to be
called. It uses an ELF section because parts of the table must be gathered
from all over the kernel code and using the linker is easier than resolving
and maintaining complex header interdependencies. An ELF table is also ideal
because its "readonly" nature means that neither locking nor list traversal is
required.
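
For readers unfamiliar with linker-gathered tables, here is a minimal user-space
sketch of the idea. It is not the code from patch 1: the demo_* names and the
"demo_watchers" section are hypothetical, and it assumes GCC or Clang with GNU
ld (or lld) on an ELF platform, where the linker provides __start_<name> and
__stop_<name> symbols for any section whose name is a valid C identifier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type, loosely modeled on a watcher function. */
typedef int (*demo_watcher_fn)(unsigned long val);

/*
 * Place a pointer to fn in the "demo_watchers" section. Each use of the
 * macro, in any source file, contributes one entry; the linker concatenates
 * them into a single array.
 */
#define demo_watcher_func(fn) \
	static demo_watcher_fn demo_watcher_ptr_##fn \
	__attribute__((used, section("demo_watchers"))) = fn

/* Section boundaries supplied by the linker (assumption: GNU ld or lld). */
extern demo_watcher_fn __start_demo_watchers[];
extern demo_watcher_fn __stop_demo_watchers[];

static unsigned long demo_sum;

static int add_val(unsigned long val)
{
	demo_sum += val;
	return 0;
}
demo_watcher_func(add_val);

static int add_twice(unsigned long val)
{
	demo_sum += 2 * val;
	return 0;
}
demo_watcher_func(add_twice);

/* Call every registered function; return how many were called. */
size_t demo_notify(unsigned long val)
{
	demo_watcher_fn *fn;
	size_t n = 0;

	/* No lock and no list: the table is fixed once the program is linked. */
	for (fn = __start_demo_watchers; fn < __stop_demo_watchers; fn++) {
		assert(*fn != NULL);
		(*fn)(val);
		n++;
	}
	return n;
}
```

Calling demo_notify(5) runs both registered functions. Registration costs one
line per function, and the caller needs no header that lists the watchers.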

Subsequent patches adapt existing parts of the kernel to use a task watcher
 -- typically in the fork, clone, and exit paths:

        FEATURE (notes)                               RELEVANT CONFIG VARIABLE
	-----------------------------------------------------------------------
	audit                                         [ CONFIG_AUDIT ...      ]
	semundo                                       [ CONFIG_SYSVIPC        ]
	cpusets                                       [ CONFIG_CPUSETS        ]
	mempolicy                                     [ CONFIG_NUMA           ]
	trace irqflags                                [ CONFIG_TRACE_IRQFLAGS ]
	lockdep                                       [ CONFIG_LOCKDEP        ]
	keys (for processes -- not for thread groups) [ CONFIG_KEYS           ]
	process events connector                      [ CONFIG_PROC_EVENTS    ]


TODO:
	Mark the task watcher table ELF section read-only. I've tried to "fix"
	the .lds files to do this with no success. I'd really appreciate help
	from folks familiar with writing linker scripts.

	I'm working on three more patches that add support for creating a task
	watcher from within a module using an ELF section. They haven't received
	as much attention since I've been focusing on measuring the performance
	impact of these patches.

Changes:
since v2 RFC:
	Updated to 2.6.19-rc2-mm2
	Compiled, booted, tested, and benchmarked
	Testing
		Booted with audit=1 profile=2
		Enabled profiling tools
		Enabled auditing
		Ran random syscall test
		IRQ trace and lockdep CONFIG=y not tested
	Benchmarks
		A clone benchmark (try to clone as fast as possible)
			Unrealistic. Shows incremental cost of one task watcher
		A fork benchmark (try to fork as fast as possible)
			Unrealistic. Shows incremental cost of one task watcher
		Kernbench
			Closer to realistic.
		Result summaries follow changelog
		See patches for details
		Fork and clone samples available on request (too large for email)
		Fork and clone benchmark sources will be posted as replies to 00
v2:
	Dropped use of notifier chains
	Dropped per-task watchers
		Can be implemented on top of this
		Still requires notifier chains
	Dropped taskstats conversion
		Parts of taskstats had to move away from the regions of
		copy_process() and do_exit() where task_watchers are notified
	Used linker script mechanism suggested by Al Viro
	Created one "list" of watchers per event as requested by Andrew Morton
		No need to multiplex a single function call
	Easier to static register/unregister watchers: 1 line of code
	val param now used for:
		WATCH_TASK_INIT:  clone_flags
		WATCH_TASK_CLONE: clone_flags
		WATCH_TASK_EXIT:  exit code
		WATCH_TASK_*:     <unused>
	Renamed notify_watchers() to notify_task_watchers()
	Replaced: if (err != 0) --> if (err)
	Added patches converting more "features" to use task watchers
	Added return code handling to WATCH_TASK_INIT
		Return code handling elsewhere didn't seem appropriate
		since there was generally no response necessary
	Fixed process keys free to handle failure in fork as originally coded
		in copy_process
	Added process keys code to watch for [er][ug]id changes

v1:
        Added ability to cause fork to fail with NOTIFY_STOP_MASK
        Added WARN_ON() when watchers cause WATCH_TASK_FREE to stop early
        Moved fork invocation
        Moved exec invocation
        Added current as argument to exec invocation
        Moved exit code assignment
        Added id change invocations
	(70 insertions)
v0:
	Based on Jes Sorensen's Task Notifiers patches (posted to LSE-Tech)


Benchmark result summaries (sorry, this part is 86 columns):
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone - Incremental worst-case costs measured in tasks/second and as a percentage of
	expected rate
		Patch
		1 	2 	3 	4 	5 	6 	7 	8 	9
--------------------------------------------------------------------------------------
Incremental
Cost (tasks/s)	-38.12 	12.5 	-84 	25.2 	-187.5 	-0.5834 -11.36 	-125.2 	-64.05
Cost Err	122.3 	17.84 	67.11 	61.03 	41.8 	34.64 	45.53 	58.28 	53.18
Cost (%)	-0.2 	0.07 	-0.5 	0.1 	-1 	-0.004 	-0.06 	-0.7 	-0.4
Cost Err (%)	0.7 	0.1 	0.4 	0.3 	0.2 	0.2 	0.2 	0.3 	0.3


Fork - Incremental worst-case costs measured in tasks/second and as a percentage of
	expected rate
		Patch
		1 	2 	3 	4 	5 	6 	7 	8 	9
--------------------------------------------------------------------------------------
Incremental
Cost (tasks/s)	-64.58 	-35.74 	-33.29 	-25.8 	-139.5 	-7.311 	-9.2 	-131.4 	-50.47
Cost Err	54.09 	27.58 	41.76 	42.47 	49.87 	60.94 	29.72 	39.7 	40.89
Cost (%)	-0.3 	-0.2 	-0.2 	-0.1 	-0.8 	-0.04 	-0.05 	-0.7 	-0.3
Cost Err (%)	0.3 	0.2 	0.2 	0.2 	0.3 	0.3 	0.2 	0.2 	0.2

Kernbench Measurements
Patch	  Elapsed(s) User(s)    System(s) CPU(%)
-	  124.406    439.947    46.615    390.700  <-- baseline 2.6.19-rc2-mm2
1	  124.353    439.935    46.334    390.400
2	  124.234    439.700    46.503    390.800
3	  124.248    439.830    46.258    390.700
4	  124.357    439.753    46.582    390.600
5	  124.333    439.787    46.491    390.700
6	  124.532    439.732    46.497    389.900
7	  124.359    439.756    46.457    390.300
8	  124.272    439.643    46.320    390.500
9	  124.400    439.787    46.485    390.300

Mean:	  124.349    439.787    46.454    390.490
Stddev:	  0.087641   0.095917   0.115309  0.272641

Kernbench - Incremental costs
Patch	  Elapsed(s)  User(s)     System(s)   CPU(%)
1	  -0.053      -0.012      -0.281      -0.3      
2	  -0.119      -0.235       0.169       0.4      
3	   0.014       0.130      -0.245      -0.1      
4	   0.109      -0.077       0.324      -0.1      
5	  -0.024       0.034      -0.091       0.1      
6	   0.199      -0.055       0.006      -0.8      
7	  -0.173       0.024      -0.040       0.4      
8	  -0.087      -0.113      -0.137       0.2      
9	   0.128       0.144       0.165      -0.2      

Mean:	   0.005875   -0.0185      0.018875   -0.0125   
Stddev:	   0.13094     0.12738     0.1877      0.39074

Andrew, please consider these patches for 2.6.20's -mm tree.

Cheers,
	-Matt Helsley

--


* [PATCH 1/9] Task Watchers v2: Task watchers v2
  2006-11-03  4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
@ 2006-11-03  4:22 ` Matt Helsley
  2006-11-03 13:22   ` Daniel Walker
  2006-11-03  4:22 ` [PATCH 2/9] Task Watchers v2: Register audit task watcher Matt Helsley
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Matt Helsley @ 2006-11-03  4:22 UTC
  To: Linux-Kernel
  Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
	Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton

[-- Attachment #1: task-watchers-v2 --]
[-- Type: text/plain, Size: 15855 bytes --]

Associate function calls with significant events in a task's lifetime much like
we handle kernel and module init/exit functions. This creates a table for each
of the following events in the task_watchers_table ELF section:

WATCH_TASK_INIT at the beginning of a fork/clone system call when the
new task struct first becomes available.

WATCH_TASK_CLONE just before returning successfully from a fork/clone.

WATCH_TASK_EXEC just before successfully returning from the exec
system call.

WATCH_TASK_UID every time a task's real or effective user id changes.

WATCH_TASK_GID every time a task's real or effective group id changes.

WATCH_TASK_EXIT at the beginning of do_exit when a task is exiting
for any reason. 

WATCH_TASK_FREE is called before critical task structures like
the mm_struct become inaccessible and the task is subsequently freed.

The next patch will add a debugfs interface for measuring fork and exit rates
which can be used to calculate the overhead of the task watcher infrastructure.

Subsequent patches will make use of task watchers to simplify fork, exit,
and many of the system calls that set [er][ug]ids.
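
As an illustration only, the per-event table layout and the "call all of the
watchers, report the first error" behaviour can be modeled in user space with
plain arrays. This is a sketch under stated assumptions: the EV_* events, the
watcher functions, and the first[] index array are all hypothetical stand-ins
for the linker-generated boundary symbols used in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of events, standing in for WATCH_TASK_*. */
enum { EV_INIT, EV_CLONE, EV_EXIT, NUM_EVENTS };

typedef int (*watcher_fn)(unsigned long val, void *task);

static int calls;

static int ok_watcher(unsigned long val, void *task)
{
	(void)val; (void)task;
	calls++;
	return 0;
}

static int fail_a(unsigned long val, void *task)
{
	(void)val; (void)task;
	calls++;
	return -12;
}

static int fail_b(unsigned long val, void *task)
{
	(void)val; (void)task;
	calls++;
	return -1;
}

/* One flat table, grouped by event, like the linker-built section. */
static const watcher_fn watchers[] = {
	ok_watcher, fail_a, fail_b,	/* EV_INIT  */
	ok_watcher,			/* EV_CLONE */
					/* EV_EXIT: no watchers */
};

/*
 * first[ev] is the index of the first watcher for ev, and first[ev + 1]
 * is one past the last, so the table needs NUM_EVENTS + 1 boundaries.
 */
static const size_t first[NUM_EVENTS + 1] = { 0, 3, 4, 4 };

int notify_watchers(unsigned int ev, unsigned long val, void *task)
{
	size_t i;
	int ret_err = 0, err;

	assert(ev < NUM_EVENTS);

	/* Call every watcher for this event; remember only the first error. */
	for (i = first[ev]; i < first[ev + 1]; i++) {
		err = watchers[i](val, task);
		if (err < 0 && ret_err == 0)
			ret_err = err;
	}
	return ret_err;
}
```

In this model, notifying EV_INIT calls all three watchers and returns the error
from fail_a, while an event with no watchers is a loop that never executes.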

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chandra S. Seetharaman <sekharan@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steve Grubb <sgrubb@redhat.com>
Cc: linux-audit@redhat.com
Cc: Paul Jackson <pj@sgi.com>
---
 fs/exec.c                         |    3 +++
 include/asm-generic/vmlinux.lds.h |   19 +++++++++++++++++++
 include/linux/task_watchers.h     |   31 +++++++++++++++++++++++++++++++
 kernel/Makefile                   |    2 +-
 kernel/exit.c                     |    3 +++
 kernel/fork.c                     |   15 +++++++++++----
 kernel/sys.c                      |    9 +++++++++
 kernel/task_watchers.c            |   37 +++++++++++++++++++++++++++++++++++++
 8 files changed, 114 insertions(+), 5 deletions(-)

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone	Number of Children Cloned
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	18058.4 	18323.3 	18465.9 	18439.5 	18574.5 	18566.3
Dev	325.705 	306.322 	316.464 	291.979 	287.531 	281.275
Err (%)	1.80362 	1.67176 	1.71378 	1.58345 	1.54799 	1.51498

Fork	Number of Children Forked
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	18074 		18199.8 	18399.7 	18482.5 	18504.6 	18565.5
Dev	331.876 	315.515 	302.402 	309.314 	300.937 	309.168
Err (%)	1.83621 	1.73361 	1.64351 	1.67356 	1.62628 	1.66528

Kernbench:
Elapsed: 124.353s User: 439.935s System: 46.334s CPU: 390.4%
440.61user 46.24system 2:04.35elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
440.27user 46.21system 2:04.81elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
440.78user 46.70system 2:04.39elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.91user 46.35system 2:04.31elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.80user 46.28system 2:04.39elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.67user 46.27system 2:04.13elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.63user 46.29system 2:04.01elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.49user 46.48system 2:04.67elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.63user 46.25system 2:04.34elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.56user 46.27system 2:04.13elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/kernel/sys.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/sys.c
+++ linux-2.6.19-rc2-mm2/kernel/sys.c
@@ -28,10 +28,11 @@
 #include <linux/tty.h>
 #include <linux/signal.h>
 #include <linux/cn_proc.h>
 #include <linux/getcpu.h>
 #include <linux/seccomp.h>
+#include <linux/task_watchers.h>
 
 #include <linux/compat.h>
 #include <linux/syscalls.h>
 #include <linux/kprobes.h>
 
@@ -958,10 +959,11 @@ asmlinkage long sys_setregid(gid_t rgid,
 	current->fsgid = new_egid;
 	current->egid = new_egid;
 	current->gid = new_rgid;
 	key_fsgid_changed(current);
 	proc_id_connector(current, PROC_EVENT_GID);
+	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
 
 /*
  * setgid() is implemented like SysV w/ SAVED_IDS 
@@ -993,10 +995,11 @@ asmlinkage long sys_setgid(gid_t gid)
 	else
 		return -EPERM;
 
 	key_fsgid_changed(current);
 	proc_id_connector(current, PROC_EVENT_GID);
+	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
   
 static int set_user(uid_t new_ruid, int dumpclear)
 {
@@ -1081,10 +1084,11 @@ asmlinkage long sys_setreuid(uid_t ruid,
 		current->suid = current->euid;
 	current->fsuid = current->euid;
 
 	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
+	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE);
 }
 
 
@@ -1128,10 +1132,11 @@ asmlinkage long sys_setuid(uid_t uid)
 	current->fsuid = current->euid = uid;
 	current->suid = new_suid;
 
 	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
+	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID);
 }
 
 
@@ -1176,10 +1181,11 @@ asmlinkage long sys_setresuid(uid_t ruid
 	if (suid != (uid_t) -1)
 		current->suid = suid;
 
 	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
+	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES);
 }
 
 asmlinkage long sys_getresuid(uid_t __user *ruid, uid_t __user *euid, uid_t __user *suid)
@@ -1228,10 +1234,11 @@ asmlinkage long sys_setresgid(gid_t rgid
 	if (sgid != (gid_t) -1)
 		current->sgid = sgid;
 
 	key_fsgid_changed(current);
 	proc_id_connector(current, PROC_EVENT_GID);
+	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
 
 asmlinkage long sys_getresgid(gid_t __user *rgid, gid_t __user *egid, gid_t __user *sgid)
 {
@@ -1269,10 +1276,11 @@ asmlinkage long sys_setfsuid(uid_t uid)
 		current->fsuid = uid;
 	}
 
 	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
+	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS);
 
 	return old_fsuid;
 }
@@ -1296,10 +1304,11 @@ asmlinkage long sys_setfsgid(gid_t gid)
 			smp_wmb();
 		}
 		current->fsgid = gid;
 		key_fsgid_changed(current);
 		proc_id_connector(current, PROC_EVENT_GID);
+		notify_task_watchers(WATCH_TASK_GID, 0, current);
 	}
 	return old_fsgid;
 }
 
 asmlinkage long sys_times(struct tms __user * tbuf)
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -40,10 +40,11 @@
 #include <linux/compat.h>
 #include <linux/pipe_fs_i.h>
 #include <linux/audit.h> /* for audit_free() */
 #include <linux/resource.h>
 #include <linux/blkdev.h>
+#include <linux/task_watchers.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include <asm/pgtable.h>
 #include <asm/mmu_context.h>
@@ -885,10 +886,11 @@ fastcall NORET_TYPE void do_exit(long co
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		schedule();
 	}
 
 	tsk->flags |= PF_EXITING;
+	notify_task_watchers(WATCH_TASK_EXIT, code, tsk);
 
 	if (unlikely(in_atomic()))
 		printk(KERN_INFO "note: %s[%d] exited with preempt_count %d\n",
 				current->comm, current->pid,
 				preempt_count());
@@ -916,10 +918,11 @@ fastcall NORET_TYPE void do_exit(long co
 		audit_free(tsk);
 	taskstats_exit_send(tsk, tidstats, group_dead, mycpu);
 	taskstats_exit_free(tidstats);
 
 	exit_mm(tsk);
+	notify_task_watchers(WATCH_TASK_FREE, code, tsk);
 
 	if (group_dead)
 		acct_process();
 	exit_sem(tsk);
 	__exit_files(tsk);
Index: linux-2.6.19-rc2-mm2/fs/exec.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/fs/exec.c
+++ linux-2.6.19-rc2-mm2/fs/exec.c
@@ -48,10 +48,11 @@
 #include <linux/syscalls.h>
 #include <linux/rmap.h>
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
 #include <linux/audit.h>
+#include <linux/task_watchers.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
 
 #ifdef CONFIG_KMOD
@@ -1083,10 +1084,12 @@ int search_binary_handler(struct linux_b
 				allow_write_access(bprm->file);
 				if (bprm->file)
 					fput(bprm->file);
 				bprm->file = NULL;
 				current->did_exec = 1;
+				notify_task_watchers(WATCH_TASK_EXEC, 0,
+						     current);
 				proc_exec_connector(current);
 				return retval;
 			}
 			read_lock(&binfmt_lock);
 			put_binfmt(fmt);
Index: linux-2.6.19-rc2-mm2/include/linux/task_watchers.h
===================================================================
--- /dev/null
+++ linux-2.6.19-rc2-mm2/include/linux/task_watchers.h
@@ -0,0 +1,31 @@
+#ifndef _TASK_WATCHERS_H
+#define _TASK_WATCHERS_H
+#include <linux/sched.h>
+
+#define WATCH_TASK_INIT  0
+#define WATCH_TASK_CLONE 1
+#define WATCH_TASK_EXEC  2
+#define WATCH_TASK_UID   3
+#define WATCH_TASK_GID   4
+#define WATCH_TASK_EXIT  5
+#define WATCH_TASK_FREE  6
+#define NUM_WATCH_TASK_EVENTS 7
+
+#ifndef MODULE
+typedef int (*task_watcher_fn)(unsigned long, struct task_struct*);
+
+/*
+ * Watch for events occurring within a task and call the supplied function
+ * when (and only when) the given event happens.
+ * Only non-modular kernel code may register functions as task_watchers.
+ */
+#define task_watcher_func(ev, fn) \
+static task_watcher_fn __task_watcher_##ev##_##fn __attribute_used__ \
+	__attribute__ ((__section__ (".task_watchers." #ev))) = fn
+#else
+#error "task_watcher_func() macro may not be used in modules."
+#endif
+
+extern int notify_task_watchers(unsigned int ev_idx, unsigned long val,
+				struct task_struct *tsk);
+#endif /*  _TASK_WATCHERS_H */
Index: linux-2.6.19-rc2-mm2/kernel/task_watchers.c
===================================================================
--- /dev/null
+++ linux-2.6.19-rc2-mm2/kernel/task_watchers.c
@@ -0,0 +1,37 @@
+#include <linux/task_watchers.h>
+
+/* Defined in include/asm-generic/vmlinux.lds.h */
+extern const task_watcher_fn __start_task_watchers_init[],
+		__start_task_watchers_clone[], __start_task_watchers_exec[],
+		__start_task_watchers_uid[], __start_task_watchers_gid[],
+		__start_task_watchers_exit[], __start_task_watchers_free[],
+		__stop_task_watchers_free[];
+
+/*
+ *  Tables of ptrs to the first watcher func for WATCH_TASK_*
+ */
+static const task_watcher_fn *twtable[] = {
+	__start_task_watchers_init,
+	__start_task_watchers_clone,
+	__start_task_watchers_exec,
+	__start_task_watchers_uid,
+	__start_task_watchers_gid,
+	__start_task_watchers_exit,
+	__start_task_watchers_free,
+	__stop_task_watchers_free,
+};
+
+int notify_task_watchers(unsigned int ev, unsigned long val,
+			 struct task_struct *tsk)
+{
+	const task_watcher_fn *tw_call;
+	int ret_err = 0, err;
+
+	/* Call all of the watchers, report the first error */
+	for (tw_call = twtable[ev]; tw_call < twtable[ev + 1]; tw_call++) {
+		err = (*tw_call)(val, tsk);
+		if (unlikely((err < 0) && (ret_err == 0)))
+			ret_err = err;
+	}
+	return ret_err;
+}
Index: linux-2.6.19-rc2-mm2/kernel/Makefile
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/Makefile
+++ linux-2.6.19-rc2-mm2/kernel/Makefile
@@ -6,11 +6,11 @@ obj-y     = sched.o fork.o exec_domain.o
 	    exit.o itimer.o time.o softirq.o resource.o \
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
 	    rcupdate.o extable.o params.o posix-timers.o \
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
-	    hrtimer.o rwsem.o latency.o nsproxy.o srcu.o
+	    hrtimer.o rwsem.o latency.o nsproxy.o srcu.o task_watchers.o
 
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
 obj-$(CONFIG_LOCKDEP) += lockdep.o
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -46,10 +46,11 @@
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
 #include <linux/delayacct.h>
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
+#include <linux/task_watchers.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -1045,10 +1046,18 @@ static struct task_struct *copy_process(
 	do_posix_clock_monotonic_gettime(&p->start_time);
 	p->security = NULL;
 	p->io_context = NULL;
 	p->io_wait = NULL;
 	p->audit_context = NULL;
+
+	p->tgid = p->pid;
+	if (clone_flags & CLONE_THREAD)
+		p->tgid = current->tgid;
+
+	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
+	if (retval < 0)
+		goto bad_fork_cleanup_delays_binfmt;
 	cpuset_fork(p);
 #ifdef CONFIG_NUMA
  	p->mempolicy = mpol_copy(p->mempolicy);
  	if (IS_ERR(p->mempolicy)) {
  		retval = PTR_ERR(p->mempolicy);
@@ -1084,14 +1093,10 @@ static struct task_struct *copy_process(
 
 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
 
-	p->tgid = p->pid;
-	if (clone_flags & CLONE_THREAD)
-		p->tgid = current->tgid;
-
 	if ((retval = security_task_alloc(p)))
 		goto bad_fork_cleanup_policy;
 	if ((retval = audit_alloc(p)))
 		goto bad_fork_cleanup_security;
 	/* copy all the process information */
@@ -1248,10 +1253,11 @@ static struct task_struct *copy_process(
 	}
 
 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
+	notify_task_watchers(WATCH_TASK_CLONE, clone_flags, p);
 	proc_fork_connector(p);
 	return p;
 
 bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
@@ -1280,10 +1286,11 @@ bad_fork_cleanup_policy:
 bad_fork_cleanup_cpuset:
 #endif
 	cpuset_exit(p);
 bad_fork_cleanup_delays_binfmt:
 	delayacct_tsk_free(p);
+	notify_task_watchers(WATCH_TASK_FREE, 0, p);
 	if (p->binfmt)
 		module_put(p->binfmt->module);
 bad_fork_cleanup_put_domain:
 	module_put(task_thread_info(p)->exec_domain->module);
 bad_fork_cleanup_count:
Index: linux-2.6.19-rc2-mm2/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/asm-generic/vmlinux.lds.h
+++ linux-2.6.19-rc2-mm2/include/asm-generic/vmlinux.lds.h
@@ -42,10 +42,29 @@
 		VMLINUX_SYMBOL(__start_rio_route_ops) = .;		\
 		*(.rio_route_ops)					\
 		VMLINUX_SYMBOL(__end_rio_route_ops) = .;		\
 	}								\
 									\
+	.task_watchers_table : AT(ADDR(.task_watchers_table) - LOAD_OFFSET) { \
+		*(.task_watchers_table)					\
+		VMLINUX_SYMBOL(__start_task_watchers_init) = .;		\
+		*(.task_watchers.init)					\
+		VMLINUX_SYMBOL(__start_task_watchers_clone) = .;	\
+		*(.task_watchers.clone)					\
+		VMLINUX_SYMBOL(__start_task_watchers_exec) = .;		\
+		*(.task_watchers.exec)					\
+		VMLINUX_SYMBOL(__start_task_watchers_uid) = .;		\
+		*(.task_watchers.uid)					\
+		VMLINUX_SYMBOL(__start_task_watchers_gid) = .;		\
+		*(.task_watchers.gid)					\
+		VMLINUX_SYMBOL(__start_task_watchers_exit) = .;		\
+		*(.task_watchers.exit)					\
+		VMLINUX_SYMBOL(__start_task_watchers_free) = .;		\
+		*(.task_watchers.free)					\
+		VMLINUX_SYMBOL(__stop_task_watchers_free) = .;		\
+	}								\
+									\
 	/* Kernel symbol table: Normal symbols */			\
 	__ksymtab         : AT(ADDR(__ksymtab) - LOAD_OFFSET) {		\
 		VMLINUX_SYMBOL(__start___ksymtab) = .;			\
 		*(__ksymtab)						\
 		VMLINUX_SYMBOL(__stop___ksymtab) = .;			\

--


* [PATCH 2/9] Task Watchers v2: Register audit task watcher
  2006-11-03  4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
  2006-11-03  4:22 ` [PATCH 1/9] Task Watchers v2: Task watchers v2 Matt Helsley
@ 2006-11-03  4:22 ` Matt Helsley
  2006-11-03  4:23 ` [PATCH 3/9] Task Watchers v2: Register semundo " Matt Helsley
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03  4:22 UTC
  To: Linux-Kernel
  Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
	Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton

[-- Attachment #1: task-watchers-register-audit --]
[-- Type: text/plain, Size: 8662 bytes --]

Change audit to register a task watcher function rather than modify
the copy_process() and do_exit() paths directly.

Removes an unlikely() hint from kernel/exit.c:
	if (unlikely(tsk->audit_context))
		audit_free(tsk);
This use of unlikely() is an artifact of audit_free()'s former invocation from
__put_task_struct() (commit: fa84cb935d4ec601528f5e2f0d5d31e7876a5044).
Clearly in the __put_task_struct() path it would be called much more frequently
than in do_exit() and hence the use of unlikely() there was justified. However,
in the new location the hint most likely offers no measurable performance
benefit.
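
For context, unlikely() is only a branch-prediction hint built on the
compiler's __builtin_expect(); removing it does not change behaviour. The
user-space sketch below shows the two forms side by side. The demo_* names are
hypothetical and the structure is a stand-in, not the kernel's task_struct.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel's hint macros; requires GCC or Clang. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the one field of interest. */
struct demo_task {
	void *audit_context;
};

static int frees;

static void demo_audit_free(struct demo_task *t)
{
	assert(t != NULL);
	t->audit_context = NULL;
	frees++;
}

/* Hinted form: tells the compiler the branch is rarely taken. */
void demo_exit_hinted(struct demo_task *t)
{
	if (unlikely(t->audit_context))
		demo_audit_free(t);
}

/* Plain form: identical results, only the generated code layout may differ. */
void demo_exit_plain(struct demo_task *t)
{
	if (t->audit_context)
		demo_audit_free(t);
}
```

Both functions free the context exactly when it is set. Any difference would
show up only in code layout and timing, which is why the hint matters less on a
path that runs once per task exit.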

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steve Grubb <sgrubb@redhat.com>
Cc: linux-audit@redhat.com
---
 include/linux/audit.h |    4 ----
 kernel/auditsc.c      |   10 +++++++---
 kernel/exit.c         |    3 ---
 kernel/fork.c         |    7 +------
 4 files changed, 8 insertions(+), 16 deletions(-)

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone	Number of Children Cloned
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	18053.2 	18361.2 	18474.4 	18462 		18594.7 	18557.4
Dev	315.856 	316.881 	318.787 	312.425 	304.193 	291.819
Err (%)	1.74958 	1.72582 	1.72557 	1.69226 	1.63592 	1.57252

Fork	Number of Children Forked
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	18008	 	18186		18400.6 	18433.1 	18481.1 	18502.8
Dev	305.299 	309.41	 	315.108 	298.683 	310.504 	338.734
Err (%)	1.69536 	1.70136 	1.71248 	1.62036 	1.68011 	1.83071

Kernbench:
Elapsed: 124.234s User: 439.7s System: 46.503s CPU: 390.8%
439.67user 46.48system 2:04.11elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.46system 2:03.71elapsed 393%CPU (0avgtext+0avgdata 0maxresident)k
439.62user 46.47system 2:04.54elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.68user 46.64system 2:04.13elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.62user 46.46system 2:04.13elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.50system 2:04.35elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.49system 2:04.39elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.66user 46.61system 2:04.17elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.46system 2:04.57elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.46system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/kernel/auditsc.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/auditsc.c
+++ linux-2.6.19-rc2-mm2/kernel/auditsc.c
@@ -63,10 +63,11 @@
 #include <linux/list.h>
 #include <linux/tty.h>
 #include <linux/selinux.h>
 #include <linux/binfmts.h>
 #include <linux/syscalls.h>
+#include <linux/task_watchers.h>
 
 #include "audit.h"
 
 extern struct list_head audit_filter_list[];
 
@@ -677,11 +678,11 @@ static inline struct audit_context *audi
  * Filter on the task information and allocate a per-task audit context
  * if necessary.  Doing so turns on system call auditing for the
  * specified task.  This is called from copy_process, so no lock is
  * needed.
  */
-int audit_alloc(struct task_struct *tsk)
+static int audit_alloc(unsigned long val, struct task_struct *tsk)
 {
 	struct audit_context *context;
 	enum audit_state     state;
 
 	if (likely(!audit_enabled))
@@ -703,10 +704,11 @@ int audit_alloc(struct task_struct *tsk)
 
 	tsk->audit_context  = context;
 	set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
 	return 0;
 }
+task_watcher_func(init, audit_alloc);
 
 static inline void audit_free_context(struct audit_context *context)
 {
 	struct audit_context *previous;
 	int		     count = 0;
@@ -1035,28 +1037,30 @@ static void audit_log_exit(struct audit_
  * audit_free - free a per-task audit context
  * @tsk: task whose audit context block to free
  *
  * Called from copy_process and do_exit
  */
-void audit_free(struct task_struct *tsk)
+static int audit_free(unsigned long val, struct task_struct *tsk)
 {
 	struct audit_context *context;
 
 	context = audit_get_context(tsk, 0, 0);
 	if (likely(!context))
-		return;
+		return 0;
 
 	/* Check for system calls that do not go through the exit
 	 * function (e.g., exit_group), then free context block. 
 	 * We use GFP_ATOMIC here because we might be doing this 
 	 * in the context of the idle thread */
 	/* that can happen only if we are called from do_exit() */
 	if (context->in_syscall && context->auditable)
 		audit_log_exit(context, tsk);
 
 	audit_free_context(context);
+	return 0;
 }
+task_watcher_func(free, audit_free);
 
 /**
  * audit_syscall_entry - fill in an audit record at syscall entry
  * @tsk: task being audited
  * @arch: architecture type
Index: linux-2.6.19-rc2-mm2/include/linux/audit.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/audit.h
+++ linux-2.6.19-rc2-mm2/include/linux/audit.h
@@ -332,12 +332,10 @@ struct mqstat;
 extern int __init audit_register_class(int class, unsigned *list);
 extern int audit_classify_syscall(int abi, unsigned syscall);
 #ifdef CONFIG_AUDITSYSCALL
 /* These are defined in auditsc.c */
 				/* Public API */
-extern int  audit_alloc(struct task_struct *task);
-extern void audit_free(struct task_struct *task);
 extern void audit_syscall_entry(int arch,
 				int major, unsigned long a0, unsigned long a1,
 				unsigned long a2, unsigned long a3);
 extern void audit_syscall_exit(int failed, long return_code);
 extern void __audit_getname(const char *name);
@@ -432,12 +430,10 @@ static inline int audit_mq_getsetattr(mq
 		return __audit_mq_getsetattr(mqdes, mqstat);
 	return 0;
 }
 extern int audit_n_rules;
 #else
-#define audit_alloc(t) ({ 0; })
-#define audit_free(t) do { ; } while (0)
 #define audit_syscall_entry(ta,a,b,c,d,e) do { ; } while (0)
 #define audit_syscall_exit(f,r) do { ; } while (0)
 #define audit_dummy_context() 1
 #define audit_getname(n) do { ; } while (0)
 #define audit_putname(n) do { ; } while (0)
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -37,11 +37,10 @@
 #include <linux/jiffies.h>
 #include <linux/futex.h>
 #include <linux/rcupdate.h>
 #include <linux/ptrace.h>
 #include <linux/mount.h>
-#include <linux/audit.h>
 #include <linux/profile.h>
 #include <linux/rmap.h>
 #include <linux/acct.h>
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
@@ -1095,15 +1094,13 @@ static struct task_struct *copy_process(
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
 
 	if ((retval = security_task_alloc(p)))
 		goto bad_fork_cleanup_policy;
-	if ((retval = audit_alloc(p)))
-		goto bad_fork_cleanup_security;
 	/* copy all the process information */
 	if ((retval = copy_semundo(clone_flags, p)))
-		goto bad_fork_cleanup_audit;
+		goto bad_fork_cleanup_security;
 	if ((retval = copy_files(clone_flags, p)))
 		goto bad_fork_cleanup_semundo;
 	if ((retval = copy_fs(clone_flags, p)))
 		goto bad_fork_cleanup_files;
 	if ((retval = copy_sighand(clone_flags, p)))
@@ -1274,12 +1271,10 @@ bad_fork_cleanup_fs:
 	exit_fs(p); /* blocking */
 bad_fork_cleanup_files:
 	exit_files(p); /* blocking */
 bad_fork_cleanup_semundo:
 	exit_sem(p);
-bad_fork_cleanup_audit:
-	audit_free(p);
 bad_fork_cleanup_security:
 	security_task_free(p);
 bad_fork_cleanup_policy:
 #ifdef CONFIG_NUMA
 	mpol_free(p->mempolicy);
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -37,11 +37,10 @@
 #include <linux/cn_proc.h>
 #include <linux/mutex.h>
 #include <linux/futex.h>
 #include <linux/compat.h>
 #include <linux/pipe_fs_i.h>
-#include <linux/audit.h> /* for audit_free() */
 #include <linux/resource.h>
 #include <linux/blkdev.h>
 #include <linux/task_watchers.h>
 
 #include <asm/uaccess.h>
@@ -912,12 +911,10 @@ fastcall NORET_TYPE void do_exit(long co
 		exit_robust_list(tsk);
 #if defined(CONFIG_FUTEX) && defined(CONFIG_COMPAT)
 	if (unlikely(tsk->compat_robust_list))
 		compat_exit_robust_list(tsk);
 #endif
-	if (unlikely(tsk->audit_context))
-		audit_free(tsk);
 	taskstats_exit_send(tsk, tidstats, group_dead, mycpu);
 	taskstats_exit_free(tidstats);
 
 	exit_mm(tsk);
 	notify_task_watchers(WATCH_TASK_FREE, code, tsk);

--


* [PATCH 3/9] Task Watchers v2: Register semundo task watcher
  2006-11-03  4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
  2006-11-03  4:22 ` [PATCH 1/9] Task Watchers v2: Task watchers v2 Matt Helsley
  2006-11-03  4:22 ` [PATCH 2/9] Task Watchers v2: Register audit task watcher Matt Helsley
@ 2006-11-03  4:23 ` Matt Helsley
  2006-11-03  4:23 ` [PATCH 4/9] Task Watchers v2: Register cpuset " Matt Helsley
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03  4:23 UTC (permalink / raw)
  To: Linux-Kernel
  Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
	Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton

[-- Attachment #1: task-watchers-register-semundo --]
[-- Type: text/plain, Size: 7115 bytes --]

Make the semaphore undo code use a task watcher instead of hooking into
copy_process() and do_exit() directly.

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
 include/linux/sem.h |   17 -----------------
 ipc/sem.c           |   12 ++++++++----
 kernel/exit.c       |    3 ---
 kernel/fork.c       |    6 +-----
 4 files changed, 9 insertions(+), 29 deletions(-)
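
For readers unfamiliar with the pattern, here is a minimal userspace sketch of
what the conversion above amounts to. It is an illustration only, not the code
from patch 1: plain arrays stand in for the linker-built table, and every name
ending in _sketch (plus the SKETCH_CLONE_SYSVSEM bit value) is hypothetical.
Stopping at the first failing watcher is also an assumption of the sketch. It
shows why exit_sem() gains an ignored argument and an int return value: every
watcher has to share one function-pointer signature.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct. */
struct task_sketch {
	int has_undo_list;
};

/* One signature for all watchers: an event-specific value plus the task. */
typedef int (*watcher_fn)(unsigned long val, struct task_sketch *tsk);

#define SKETCH_CLONE_SYSVSEM 0x1UL /* placeholder bit, not the kernel's value */

/* Init-time watcher: val carries the clone flags. */
static int copy_semundo_sketch(unsigned long clone_flags,
			       struct task_sketch *tsk)
{
	tsk->has_undo_list = (clone_flags & SKETCH_CLONE_SYSVSEM) ? 1 : 0;
	return 0;
}

/* Free-time watcher: formerly "void f(task)", now ignores val, returns 0. */
static int exit_sem_sketch(unsigned long ignored, struct task_sketch *tsk)
{
	(void)ignored;
	tsk->has_undo_list = 0;
	return 0;
}

/* A watcher that always fails, to show how an error reaches the caller. */
static int failing_watcher_sketch(unsigned long val, struct task_sketch *tsk)
{
	(void)val;
	(void)tsk;
	return -ENOMEM;
}

/* Call each watcher in table order; assumption: stop at the first error. */
static int notify_sketch(const watcher_fn *table, size_t n,
			 unsigned long val, struct task_sketch *tsk)
{
	size_t i;

	assert(table != NULL && tsk != NULL);
	for (i = 0; i < n; i++) {
		int err = table[i](val, tsk);

		if (err)
			return err;
	}
	return 0;
}
```

A caller would build one table per event (init, free) and pass the clone flags
or the exit code as val, much as copy_process() and do_exit() pass them to
notify_task_watchers() in the hunks below.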

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone	Number of Children Cloned
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	17960.5 	18169.3 	18408.2 	18479.9 	18515.6 	18465.4
Dev	305.381 	314.209 	292.395 	284.992 	299.331 	295.311
Err (%)	1.70029 	1.72934 	1.5884 		1.54217 	1.61664 	1.59927

Fork	Number of Children Forked
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	18050.2 	18141.4 	18316.2 	18386.2 	18441.9 	18476.2
Dev	295.68	 	312.922 	296.962 	298.81 		300.985 	294.046
Err (%)	1.63809 	1.72491 	1.62131 	1.62519 	1.63207 	1.59149
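
Note on reading the tables: the Err (%) row appears to be Dev divided by Mean,
expressed as a percentage. The sketch below recomputes it so the relationship
can be checked against the Clone row above. The function names are
hypothetical and the interpretation is inferred from the numbers, not stated
anywhere in this posting.

```c
#include <stddef.h>

/* Relative spread as a percentage: 100 * dev / mean. */
static double err_percent(double mean, double dev)
{
	return 100.0 * dev / mean;
}

/* Largest |computed - reported| Err (%) over n table columns. */
static double max_err_mismatch(const double *mean, const double *dev,
			       const double *reported, size_t n)
{
	double worst = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		double diff = err_percent(mean[i], dev[i]) - reported[i];

		if (diff < 0.0)
			diff = -diff;
		if (diff > worst)
			worst = diff;
	}
	return worst;
}
```

For the first Clone column, err_percent(17960.5, 305.381) comes out at about
1.70029, which matches the table to the digits printed.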

Kernbench:
Elapsed: 124.272s User: 439.643s System: 46.32s CPU: 390.5%
439.64user 46.25system 2:04.46elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.70user 46.27system 2:04.04elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.64user 46.31system 2:04.18elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.49user 46.27system 2:04.41elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.55user 46.47system 2:04.32elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.29system 2:04.63elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.61user 46.31system 2:04.09elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.68user 46.31system 2:04.02elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.49system 2:04.59elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.59user 46.23system 2:03.98elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/ipc/sem.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/ipc/sem.c
+++ linux-2.6.19-rc2-mm2/ipc/sem.c
@@ -81,10 +81,11 @@
 #include <linux/audit.h>
 #include <linux/capability.h>
 #include <linux/seq_file.h>
 #include <linux/mutex.h>
 #include <linux/nsproxy.h>
+#include <linux/task_watchers.h>
 
 #include <asm/uaccess.h>
 #include "util.h"
 
 #define sem_ids(ns)	(*((ns)->ids[IPC_SEM_IDS]))
@@ -1288,11 +1289,11 @@ asmlinkage long sys_semop (int semid, st
  * See the notes above unlock_semundo() regarding the spin_lock_init()
  * in this code.  Initialize the undo_list->lock here instead of get_undo_list()
  * because of the reasoning in the comment above unlock_semundo.
  */
 
-int copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
+static int copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
 {
 	struct sem_undo_list *undo_list;
 	int error;
 
 	if (clone_flags & CLONE_SYSVSEM) {
@@ -1304,10 +1305,11 @@ int copy_semundo(unsigned long clone_fla
 	} else 
 		tsk->sysvsem.undo_list = NULL;
 
 	return 0;
 }
+task_watcher_func(init, copy_semundo);
 
 /*
  * add semadj values to semaphores, free undo structures.
  * undo structures are not freed when semaphore arrays are destroyed
  * so some of them may be out of date.
@@ -1317,22 +1319,22 @@ int copy_semundo(unsigned long clone_fla
  * should we queue up and wait until we can do so legally?
  * The original implementation attempted to do this (queue and wait).
  * The current implementation does not do so. The POSIX standard
  * and SVID should be consulted to determine what behavior is mandated.
  */
-void exit_sem(struct task_struct *tsk)
+static int exit_sem(unsigned long ignored, struct task_struct *tsk)
 {
 	struct sem_undo_list *undo_list;
 	struct sem_undo *u, **up;
 	struct ipc_namespace *ns;
 
 	undo_list = tsk->sysvsem.undo_list;
 	if (!undo_list)
-		return;
+		return 0;
 
 	if (!atomic_dec_and_test(&undo_list->refcnt))
-		return;
+		return 0;
 
 	ns = tsk->nsproxy->ipc_ns;
 	/* There's no need to hold the semundo list lock, as current
          * is the last task exiting for this undo list.
 	 */
@@ -1395,11 +1397,13 @@ found:
 		update_queue(sma);
 next_entry:
 		sem_unlock(sma);
 	}
 	kfree(undo_list);
+	return 0;
 }
+task_watcher_func(free, exit_sem);
 
 #ifdef CONFIG_PROC_FS
 static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
 {
 	struct sem_array *sma = it;
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -46,12 +46,10 @@
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include <asm/pgtable.h>
 #include <asm/mmu_context.h>
 
-extern void sem_exit (void);
-
 static void exit_mm(struct task_struct * tsk);
 
 static void __unhash_process(struct task_struct *p)
 {
 	nr_threads--;
@@ -919,11 +917,10 @@ fastcall NORET_TYPE void do_exit(long co
 	exit_mm(tsk);
 	notify_task_watchers(WATCH_TASK_FREE, code, tsk);
 
 	if (group_dead)
 		acct_process();
-	exit_sem(tsk);
 	__exit_files(tsk);
 	__exit_fs(tsk);
 	exit_thread();
 	cpuset_exit(tsk);
 	exit_keys(tsk);
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1095,14 +1095,12 @@ static struct task_struct *copy_process(
 #endif
 
 	if ((retval = security_task_alloc(p)))
 		goto bad_fork_cleanup_policy;
 	/* copy all the process information */
-	if ((retval = copy_semundo(clone_flags, p)))
-		goto bad_fork_cleanup_security;
 	if ((retval = copy_files(clone_flags, p)))
-		goto bad_fork_cleanup_semundo;
+		goto bad_fork_cleanup_security;
 	if ((retval = copy_fs(clone_flags, p)))
 		goto bad_fork_cleanup_files;
 	if ((retval = copy_sighand(clone_flags, p)))
 		goto bad_fork_cleanup_fs;
 	if ((retval = copy_signal(clone_flags, p)))
@@ -1269,12 +1267,10 @@ bad_fork_cleanup_sighand:
 	__cleanup_sighand(p->sighand);
 bad_fork_cleanup_fs:
 	exit_fs(p); /* blocking */
 bad_fork_cleanup_files:
 	exit_files(p); /* blocking */
-bad_fork_cleanup_semundo:
-	exit_sem(p);
 bad_fork_cleanup_security:
 	security_task_free(p);
 bad_fork_cleanup_policy:
 #ifdef CONFIG_NUMA
 	mpol_free(p->mempolicy);
Index: linux-2.6.19-rc2-mm2/include/linux/sem.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/sem.h
+++ linux-2.6.19-rc2-mm2/include/linux/sem.h
@@ -136,25 +136,8 @@ struct sem_undo_list {
 
 struct sysv_sem {
 	struct sem_undo_list *undo_list;
 };
 
-#ifdef CONFIG_SYSVIPC
-
-extern int copy_semundo(unsigned long clone_flags, struct task_struct *tsk);
-extern void exit_sem(struct task_struct *tsk);
-
-#else
-static inline int copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
-{
-	return 0;
-}
-
-static inline void exit_sem(struct task_struct *tsk)
-{
-	return;
-}
-#endif
-
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_SEM_H */

--


* [PATCH 4/9] Task Watchers v2: Register cpuset task watcher
  2006-11-03  4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
                   ` (2 preceding siblings ...)
  2006-11-03  4:23 ` [PATCH 3/9] Task Watchers v2: Register semundo " Matt Helsley
@ 2006-11-03  4:23 ` Matt Helsley
  2006-11-03  4:23 ` [PATCH 5/9] Task Watchers v2: Register NUMA mempolicy " Matt Helsley
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03  4:23 UTC (permalink / raw)
  To: Linux-Kernel
  Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
	Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton

[-- Attachment #1: task-watchers-register-cpusets --]
[-- Type: text/plain, Size: 7461 bytes --]

Register a task watcher for cpusets instead of hooking into
copy_process() and do_exit() directly.

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
---
 include/linux/cpuset.h |    4 ----
 kernel/cpuset.c        |    7 +++++--
 kernel/exit.c          |    2 --
 kernel/fork.c          |    6 +-----
 4 files changed, 6 insertions(+), 13 deletions(-)
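
The fork.c hunks below mostly retarget goto statements, because removing a
setup step from copy_process() also removes its cleanup label. A minimal
userspace sketch of that unwinding pattern follows. The three stages, the
struct and all names are hypothetical; only the shape (labels in reverse order
of setup, each falling through to the next) mirrors copy_process().

```c
#include <errno.h>

/* Records which hypothetical stages are currently set up. */
struct setup_state {
	int a_ready;
	int b_ready;
	int c_ready;
};

/* Stand-in for a setup call that may fail. */
static int step_sketch(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* fail_at: 0 means no failure, 1..3 selects the stage that fails. */
static int setup_sketch(struct setup_state *s, int fail_at)
{
	int retval;

	retval = step_sketch(fail_at == 1);
	if (retval)
		goto out;
	s->a_ready = 1;

	retval = step_sketch(fail_at == 2);
	if (retval)
		goto cleanup_a;
	s->b_ready = 1;

	retval = step_sketch(fail_at == 3);
	if (retval)
		goto cleanup_b;
	s->c_ready = 1;
	return 0;

cleanup_b:
	s->b_ready = 0;
	/* fall through: labels run in reverse order of setup */
cleanup_a:
	s->a_ready = 0;
out:
	return retval;
}
```

If stage b moved out of this function, cleanup_b would disappear and stage c
would have to jump to cleanup_a instead. That is the kind of retargeting the
bad_fork_cleanup_* changes perform.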

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone	Number of Children Cloned
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	18023.8 	18243.8 	18485.1 	18422.9 	18469.4 	18505.1
Dev	317.163 	297.266 	298.965 	288.518 	294.607 	290.491
Err (%)	1.75969 	1.6294 		1.61733 	1.56608 	1.59511 	1.56979

Fork	Number of Children Forked
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	17950.9 	18149.7 	18283 		18409.3 	18414.1 	18450.3
Dev	310.206 	300.925 	297.458 	290.673 	298.75	 	301.009
Err (%)	1.72808 	1.65802 	1.62696 	1.57895 	1.6224 		1.63146

Kernbench:
Elapsed: 124.248s User: 439.83s System: 46.258s CPU: 390.7%
439.80user 46.26system 2:04.53elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.20system 2:04.29elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.80user 46.42system 2:04.37elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.88user 46.16system 2:04.36elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.21system 2:03.72elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.93user 46.21system 2:03.90elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.88user 46.25system 2:04.67elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.38system 2:04.31elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.90user 46.25system 2:04.09elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.24system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
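
The Kernbench summary line appears to be the arithmetic mean of the ten runs
listed under it. The sketch below shows how the Elapsed figure can be
recomputed from the M:SS.ss values. The helper names are hypothetical and the
interpretation is inferred from the numbers rather than stated in this posting.

```c
#include <stddef.h>

/* Convert a time(1)-style "M:SS.ss" elapsed value to seconds. */
static double elapsed_seconds(int minutes, double seconds)
{
	return 60.0 * (double)minutes + seconds;
}

/* Arithmetic mean of n samples; returns 0.0 for an empty set. */
static double mean_of(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	if (n == 0)
		return 0.0;
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}
```

Feeding in the ten elapsed values above (2:04.53 through 2:04.24) gives about
124.248 seconds, in line with the "Elapsed: 124.248s" summary.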

Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -28,11 +28,10 @@
 #include <linux/mman.h>
 #include <linux/fs.h>
 #include <linux/nsproxy.h>
 #include <linux/capability.h>
 #include <linux/cpu.h>
-#include <linux/cpuset.h>
 #include <linux/security.h>
 #include <linux/swap.h>
 #include <linux/syscalls.h>
 #include <linux/jiffies.h>
 #include <linux/futex.h>
@@ -1053,17 +1052,16 @@ static struct task_struct *copy_process(
 		p->tgid = current->tgid;
 
 	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
 	if (retval < 0)
 		goto bad_fork_cleanup_delays_binfmt;
-	cpuset_fork(p);
 #ifdef CONFIG_NUMA
  	p->mempolicy = mpol_copy(p->mempolicy);
  	if (IS_ERR(p->mempolicy)) {
  		retval = PTR_ERR(p->mempolicy);
  		p->mempolicy = NULL;
- 		goto bad_fork_cleanup_cpuset;
+ 		goto bad_fork_cleanup_delays_binfmt;
  	}
 	mpol_fix_fork_child_flag(p);
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
@@ -1272,13 +1270,11 @@ bad_fork_cleanup_files:
 bad_fork_cleanup_security:
 	security_task_free(p);
 bad_fork_cleanup_policy:
 #ifdef CONFIG_NUMA
 	mpol_free(p->mempolicy);
-bad_fork_cleanup_cpuset:
 #endif
-	cpuset_exit(p);
 bad_fork_cleanup_delays_binfmt:
 	delayacct_tsk_free(p);
 	notify_task_watchers(WATCH_TASK_FREE, 0, p);
 	if (p->binfmt)
 		module_put(p->binfmt->module);
Index: linux-2.6.19-rc2-mm2/kernel/cpuset.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/cpuset.c
+++ linux-2.6.19-rc2-mm2/kernel/cpuset.c
@@ -47,10 +47,11 @@
 #include <linux/stat.h>
 #include <linux/string.h>
 #include <linux/time.h>
 #include <linux/backing-dev.h>
 #include <linux/sort.h>
+#include <linux/task_watchers.h>
 
 #include <asm/uaccess.h>
 #include <asm/atomic.h>
 #include <linux/mutex.h>
 
@@ -2172,17 +2173,18 @@ void __init cpuset_init_smp(void)
  *
  * At the point that cpuset_fork() is called, 'current' is the parent
  * task, and the passed argument 'child' points to the child task.
  **/
 
-void cpuset_fork(struct task_struct *child)
+static void cpuset_fork(unsigned long clone_flags, struct task_struct *child)
 {
 	task_lock(current);
 	child->cpuset = current->cpuset;
 	atomic_inc(&child->cpuset->count);
 	task_unlock(current);
 }
+task_watcher_func(init, cpuset_fork);
 
 /**
  * cpuset_exit - detach cpuset from exiting task
  * @tsk: pointer to task_struct of exiting process
  *
@@ -2239,11 +2241,11 @@ void cpuset_fork(struct task_struct *chi
  *    to NULL here, and check in cpuset_update_task_memory_state()
  *    for a NULL pointer.  This hack avoids that NULL check, for no
  *    cost (other than this way too long comment ;).
  **/
 
-void cpuset_exit(struct task_struct *tsk)
+static void cpuset_exit(unsigned long exit_code, struct task_struct *tsk)
 {
 	struct cpuset *cs;
 
 	cs = tsk->cpuset;
 	tsk->cpuset = &top_cpuset;	/* the_top_cpuset_hack - see above */
@@ -2258,10 +2260,11 @@ void cpuset_exit(struct task_struct *tsk
 		cpuset_release_agent(pathbuf);
 	} else {
 		atomic_dec(&cs->count);
 	}
 }
+task_watcher_func(free, cpuset_exit);
 
 /**
  * cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
  * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
  *
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -28,11 +28,10 @@
 #include <linux/mount.h>
 #include <linux/proc_fs.h>
 #include <linux/mempolicy.h>
 #include <linux/taskstats_kern.h>
 #include <linux/delayacct.h>
-#include <linux/cpuset.h>
 #include <linux/syscalls.h>
 #include <linux/signal.h>
 #include <linux/posix-timers.h>
 #include <linux/cn_proc.h>
 #include <linux/mutex.h>
@@ -920,11 +919,10 @@ fastcall NORET_TYPE void do_exit(long co
 	if (group_dead)
 		acct_process();
 	__exit_files(tsk);
 	__exit_fs(tsk);
 	exit_thread();
-	cpuset_exit(tsk);
 	exit_keys(tsk);
 
 	if (group_dead && tsk->signal->leader)
 		disassociate_ctty(1);
 
Index: linux-2.6.19-rc2-mm2/include/linux/cpuset.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/cpuset.h
+++ linux-2.6.19-rc2-mm2/include/linux/cpuset.h
@@ -17,12 +17,10 @@
 extern int number_of_cpusets;	/* How many cpusets are defined in system? */
 
 extern int cpuset_init_early(void);
 extern int cpuset_init(void);
 extern void cpuset_init_smp(void);
-extern void cpuset_fork(struct task_struct *p);
-extern void cpuset_exit(struct task_struct *p);
 extern cpumask_t cpuset_cpus_allowed(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
 #define cpuset_current_mems_allowed (current->mems_allowed)
 void cpuset_init_current_mems_allowed(void);
 void cpuset_update_task_memory_state(void);
@@ -69,12 +67,10 @@ extern void cpuset_track_online_nodes(vo
 #else /* !CONFIG_CPUSETS */
 
 static inline int cpuset_init_early(void) { return 0; }
 static inline int cpuset_init(void) { return 0; }
 static inline void cpuset_init_smp(void) {}
-static inline void cpuset_fork(struct task_struct *p) {}
-static inline void cpuset_exit(struct task_struct *p) {}
 
 static inline cpumask_t cpuset_cpus_allowed(struct task_struct *p)
 {
 	return cpu_possible_map;
 }

--


* [PATCH 5/9] Task Watchers v2: Register NUMA mempolicy task watcher
  2006-11-03  4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
                   ` (3 preceding siblings ...)
  2006-11-03  4:23 ` [PATCH 4/9] Task Watchers v2: Register cpuset " Matt Helsley
@ 2006-11-03  4:23 ` Matt Helsley
  2006-11-03  4:23 ` [PATCH 6/9] Task Watchers v2: Register IRQ flag tracing " Matt Helsley
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03  4:23 UTC (permalink / raw)
  To: Linux-Kernel
  Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
	Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton

[-- Attachment #1: task-watchers-register-numa-mempolicy --]
[-- Type: text/plain, Size: 5458 bytes --]

Register a NUMA mempolicy task watcher instead of hooking into
copy_process() and do_exit() directly.

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
 kernel/exit.c  |    4 ----
 kernel/fork.c  |   15 +--------------
 mm/mempolicy.c |   24 ++++++++++++++++++++++++
 3 files changed, 25 insertions(+), 18 deletions(-)
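
The new init watcher in this patch relies on the error-pointer idiom
(IS_ERR()/PTR_ERR()) to turn a failed mpol_copy() into a negative return
value. Below is a userspace sketch of that idiom for readers who have not seen
it. The helpers are re-implemented here under hypothetical *_sketch names, the
4095 bound is an assumption of the sketch, and the pointer/integer casts are
only valid on platforms where small negative values never collide with real
addresses.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095 /* assumed bound on errno magnitudes */

/* Encode a negative errno value in a pointer. */
static void *err_ptr_sketch(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static long ptr_err_sketch(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr falls in the top SKETCH_MAX_ERRNO addresses. */
static int is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct policy_sketch {
	int mode;
};

struct task_sketch {
	struct policy_sketch *policy;
};

/* Copy a policy; returns an error pointer on (simulated) failure. */
static struct policy_sketch *policy_copy_sketch(const struct policy_sketch *old,
						int simulate_oom)
{
	struct policy_sketch *new;

	if (simulate_oom)
		return err_ptr_sketch(-ENOMEM);
	new = malloc(sizeof(*new));
	if (!new)
		return err_ptr_sketch(-ENOMEM);
	*new = *old;
	return new;
}

/* Same shape as the init watcher: copy, check, clear on error. */
static int init_task_policy_sketch(struct task_sketch *tsk, int simulate_oom)
{
	tsk->policy = policy_copy_sketch(tsk->policy, simulate_oom);
	if (is_err_sketch(tsk->policy)) {
		int retval = (int)ptr_err_sketch(tsk->policy);

		tsk->policy = NULL;
		return retval;
	}
	return 0;
}
```

Clearing the pointer before returning matters: the free path may then run on
the same task and must not see an encoded error value where it expects a
policy.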

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone	Number of Children Cloned
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	17836.3 	18085.2 	18220.4 	18225 		18319	 	18339
Dev	302.801 	314.617 	303.079 	293.46 		287.267 	294.819
Err (%)	1.69767 	1.73963 	1.6634	 	1.6102	 	1.56814 	1.60761

Fork	Number of Children Forked
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	17896.2 	17990 		18100.6 	18242.3 	18244 		18346.9
Dev	301.64	 	285.698 	295.646 	304.361 	299.472 	287.153
Err (%)	1.6855 		1.58809 	1.63335 	1.66844 	1.64148 	1.56513

Kernbench:
Elapsed: 124.532s User: 439.732s System: 46.497s CPU: 389.9%
439.71user 46.48system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.42system 2:05.10elapsed 388%CPU (0avgtext+0avgdata 0maxresident)k
439.74user 46.44system 2:04.60elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.75user 46.64system 2:04.74elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.61user 46.45system 2:05.36elapsed 387%CPU (0avgtext+0avgdata 0maxresident)k
439.60user 46.43system 2:04.33elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.47system 2:04.34elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.87user 46.45system 2:04.10elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.71system 2:04.58elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.48system 2:03.93elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/mm/mempolicy.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/mm/mempolicy.c
+++ linux-2.6.19-rc2-mm2/mm/mempolicy.c
@@ -87,10 +87,11 @@
 #include <linux/seq_file.h>
 #include <linux/proc_fs.h>
 #include <linux/migrate.h>
 #include <linux/rmap.h>
 #include <linux/security.h>
+#include <linux/task_watchers.h>
 
 #include <asm/tlbflush.h>
 #include <asm/uaccess.h>
 
 /* Internal flags */
@@ -1333,10 +1334,33 @@ struct mempolicy *__mpol_copy(struct mem
 		}
 	}
 	return new;
 }
 
+static int init_task_mempolicy(unsigned long clone_flags,
+			       struct task_struct *tsk)
+{
+ 	tsk->mempolicy = mpol_copy(tsk->mempolicy);
+ 	if (IS_ERR(tsk->mempolicy)) {
+		int retval;
+
+ 		retval = PTR_ERR(tsk->mempolicy);
+ 		tsk->mempolicy = NULL;
+		return retval;
+ 	}
+	mpol_fix_fork_child_flag(tsk);
+	return 0;
+}
+task_watcher_func(init, init_task_mempolicy);
+
+static int free_task_mempolicy(unsigned long ignored, struct task_struct *tsk)
+{
+	mpol_free(tsk->mempolicy);
+	tsk->mempolicy = NULL;
+}
+task_watcher_func(free, free_task_mempolicy);
+
 /* Slow path of a mempolicy comparison */
 int __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 {
 	if (!a || !b)
 		return 0;
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1052,19 +1052,10 @@ static struct task_struct *copy_process(
 		p->tgid = current->tgid;
 
 	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
 	if (retval < 0)
 		goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_NUMA
- 	p->mempolicy = mpol_copy(p->mempolicy);
- 	if (IS_ERR(p->mempolicy)) {
- 		retval = PTR_ERR(p->mempolicy);
- 		p->mempolicy = NULL;
- 		goto bad_fork_cleanup_delays_binfmt;
- 	}
-	mpol_fix_fork_child_flag(p);
-#endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
 #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
 	p->hardirqs_enabled = 1;
 #else
@@ -1091,11 +1082,11 @@ static struct task_struct *copy_process(
 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
 
 	if ((retval = security_task_alloc(p)))
-		goto bad_fork_cleanup_policy;
+		goto bad_fork_cleanup_delays_binfmt;
 	/* copy all the process information */
 	if ((retval = copy_files(clone_flags, p)))
 		goto bad_fork_cleanup_security;
 	if ((retval = copy_fs(clone_flags, p)))
 		goto bad_fork_cleanup_files;
@@ -1267,14 +1258,10 @@ bad_fork_cleanup_fs:
 	exit_fs(p); /* blocking */
 bad_fork_cleanup_files:
 	exit_files(p); /* blocking */
 bad_fork_cleanup_security:
 	security_task_free(p);
-bad_fork_cleanup_policy:
-#ifdef CONFIG_NUMA
-	mpol_free(p->mempolicy);
-#endif
 bad_fork_cleanup_delays_binfmt:
 	delayacct_tsk_free(p);
 	notify_task_watchers(WATCH_TASK_FREE, 0, p);
 	if (p->binfmt)
 		module_put(p->binfmt->module);
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -932,14 +932,10 @@ fastcall NORET_TYPE void do_exit(long co
 
 	tsk->exit_code = code;
 	proc_exit_connector(tsk);
 	exit_notify(tsk);
 	exit_task_namespaces(tsk);
-#ifdef CONFIG_NUMA
-	mpol_free(tsk->mempolicy);
-	tsk->mempolicy = NULL;
-#endif
 	/*
 	 * This must happen late, after the PID is not
 	 * hashed anymore:
 	 */
 	if (unlikely(!list_empty(&tsk->pi_state_list)))

--


* [PATCH 6/9] Task Watchers v2: Register IRQ flag tracing task watcher
  2006-11-03  4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
                   ` (4 preceding siblings ...)
  2006-11-03  4:23 ` [PATCH 5/9] Task Watchers v2: Register NUMA mempolicy " Matt Helsley
@ 2006-11-03  4:23 ` Matt Helsley
  2006-11-03  4:23 ` [PATCH 7/9] Task Watchers v2: Register lockdep " Matt Helsley
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03  4:23 UTC (permalink / raw)
  To: Linux-Kernel
  Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
	Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton

[-- Attachment #1: task-watchers-register-trace-irqflags --]
[-- Type: text/plain, Size: 4284 bytes --]

Register an irq-flag-tracing task watcher instead of hooking into
copy_process().

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
 kernel/fork.c       |   19 -------------------
 kernel/irq/handle.c |   24 ++++++++++++++++++++++++
 2 files changed, 24 insertions(+), 19 deletions(-)

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone	Number of Children Cloned
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	17826.5 	18077.4 	18160.1 	18263.6 	18343	 	18350.8
Dev	305.841 	306.331 	283.323 	284.761 	292.732 	292.882
Err (%)	1.71565 	1.69455 	1.56014 	1.55917 	1.59588 	1.59602

Fork	Number of Children Forked
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	17813.5 	18062.4 	18140.5 	18246.7 	18237.8 	18275.2
Dev	305.816 	294.914 	294.779 	294.727 	323.996 	300.176
Err (%)	1.71677 	1.63275 	1.62498 	1.61523 	1.77651 	1.64253

Kernbench:
Elapsed: 124.4s User: 439.787s System: 46.485s CPU: 390.3%
439.70user 46.43system 2:04.64elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.92user 46.38system 2:04.47elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.62system 2:04.44elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.83user 46.46system 2:04.29elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.73user 46.47system 2:04.12elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.83user 46.49system 2:04.10elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.42system 2:04.41elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.70user 46.64system 2:04.30elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.47system 2:04.76elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.47system 2:04.47elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1052,29 +1052,10 @@ static struct task_struct *copy_process(
 		p->tgid = current->tgid;
 
 	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
 	if (retval < 0)
 		goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_TRACE_IRQFLAGS
-	p->irq_events = 0;
-#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
-	p->hardirqs_enabled = 1;
-#else
-	p->hardirqs_enabled = 0;
-#endif
-	p->hardirq_enable_ip = 0;
-	p->hardirq_enable_event = 0;
-	p->hardirq_disable_ip = _THIS_IP_;
-	p->hardirq_disable_event = 0;
-	p->softirqs_enabled = 1;
-	p->softirq_enable_ip = _THIS_IP_;
-	p->softirq_enable_event = 0;
-	p->softirq_disable_ip = 0;
-	p->softirq_disable_event = 0;
-	p->hardirq_context = 0;
-	p->softirq_context = 0;
-#endif
 #ifdef CONFIG_LOCKDEP
 	p->lockdep_depth = 0; /* no locks held yet */
 	p->curr_chain_key = 0;
 	p->lockdep_recursion = 0;
 #endif
Index: linux-2.6.19-rc2-mm2/kernel/irq/handle.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/irq/handle.c
+++ linux-2.6.19-rc2-mm2/kernel/irq/handle.c
@@ -13,10 +13,11 @@
 #include <linux/irq.h>
 #include <linux/module.h>
 #include <linux/random.h>
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
+#include <linux/task_watchers.h>
 
 #include "internals.h"
 
 /**
  * handle_bad_irq - handle spurious and unhandled irqs
@@ -266,6 +267,29 @@ void early_init_irq_lock_class(void)
 
 	for (i = 0; i < NR_IRQS; i++)
 		lockdep_set_class(&irq_desc[i].lock, &irq_desc_lock_class);
 }
 
+static int init_task_trace_irqflags(unsigned long clone_flags,
+				    struct task_struct *p)
+{
+	p->irq_events = 0;
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+	p->hardirqs_enabled = 1;
+#else
+	p->hardirqs_enabled = 0;
+#endif
+	p->hardirq_enable_ip = 0;
+	p->hardirq_enable_event = 0;
+	p->hardirq_disable_ip = _THIS_IP_;
+	p->hardirq_disable_event = 0;
+	p->softirqs_enabled = 1;
+	p->softirq_enable_ip = _THIS_IP_;
+	p->softirq_enable_event = 0;
+	p->softirq_disable_ip = 0;
+	p->softirq_disable_event = 0;
+	p->hardirq_context = 0;
+	p->softirq_context = 0;
+	return 0;
+}
+task_watcher_func(init, init_task_trace_irqflags);
 #endif

--


* [PATCH 7/9] Task Watchers v2: Register lockdep task watcher
  2006-11-03  4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
                   ` (5 preceding siblings ...)
  2006-11-03  4:23 ` [PATCH 6/9] Task Watchers v2: Register IRQ flag tracing " Matt Helsley
@ 2006-11-03  4:23 ` Matt Helsley
  2006-11-03  4:23 ` [PATCH 8/9] Task Watchers v2: Register process keyrings " Matt Helsley
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03  4:23 UTC (permalink / raw)
  To: Linux-Kernel
  Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
	Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton

[-- Attachment #1: task-watchers-register-lockdep --]
[-- Type: text/plain, Size: 3307 bytes --]

Register a task watcher for lockdep instead of hooking into copy_process().

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
 kernel/fork.c    |    5 -----
 kernel/lockdep.c |    9 +++++++++
 2 files changed, 9 insertions(+), 5 deletions(-)

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone	Number of Children Cloned
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	17808.2 	18092.3 	18215.5 	18183.6 	18310.8 	18342.8
Dev	302.333 	317.786 	303.385 	280.608 	281.378 	294.009
Err (%)	1.69772 	1.75647 	1.66553 	1.5432	 	1.53668 	1.60285

Fork	Number of Children Forked
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	17821.8 	18025.1 	18112.5 	18226	 	18217.4 	18318
Dev	316.497 	310.195 	291.372 	297.166 	364.908 	293.89
Err (%)	1.7759 		1.7209 		1.60868 	1.63045 	2.00307 	1.60438

Kernbench:
Elapsed: 124.333s User: 439.787s System: 46.491s CPU: 390.7%
439.67user 46.42system 2:04.09elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.46system 2:04.17elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.75user 46.65system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.43system 2:04.54elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.71user 46.43system 2:04.56elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.71user 46.51system 2:04.45elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.86user 46.64system 2:04.69elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.87user 46.44system 2:04.05elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.87user 46.48system 2:04.63elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.45system 2:03.91elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1052,15 +1052,10 @@ static struct task_struct *copy_process(
 		p->tgid = current->tgid;
 
 	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
 	if (retval < 0)
 		goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_LOCKDEP
-	p->lockdep_depth = 0; /* no locks held yet */
-	p->curr_chain_key = 0;
-	p->lockdep_recursion = 0;
-#endif
 
 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
 
Index: linux-2.6.19-rc2-mm2/kernel/lockdep.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/lockdep.c
+++ linux-2.6.19-rc2-mm2/kernel/lockdep.c
@@ -2556,10 +2556,19 @@ void __init lockdep_init(void)
 		INIT_LIST_HEAD(chainhash_table + i);
 
 	lockdep_initialized = 1;
 }
 
+static int init_task_lockdep(unsigned long clone_flags, struct task_struct *p)
+{
+	p->lockdep_depth = 0; /* no locks held yet */
+	p->curr_chain_key = 0;
+	p->lockdep_recursion = 0;
+	return 0;
+}
+task_watcher_func(init, init_task_lockdep);
+
 void __init lockdep_info(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
 	printk("... MAX_LOCKDEP_SUBCLASSES:    %lu\n", MAX_LOCKDEP_SUBCLASSES);

--


* [PATCH 8/9] Task Watchers v2: Register process keyrings task watcher
  2006-11-03  4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
                   ` (6 preceding siblings ...)
  2006-11-03  4:23 ` [PATCH 7/9] Task Watchers v2: Register lockdep " Matt Helsley
@ 2006-11-03  4:23 ` Matt Helsley
  2006-11-03  4:23 ` [PATCH 9/9] Task Watchers v2: Register process events connector Matt Helsley
  2006-11-03  8:57 ` [PATCH 0/9] Task Watchers v2: Introduction Paul Jackson
  9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03  4:23 UTC (permalink / raw)
  To: Linux-Kernel
  Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
	Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton,
	David Howells

[-- Attachment #1: task-watchers-register-keys --]
[-- Type: text/plain, Size: 12313 bytes --]

Make the keyring code use a task watcher to initialize and free per-task data.

NOTE:
We can't make copy_thread_group_keys() in copy_signal() a task watcher because
it needs the task's signal field (struct signal_struct).

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: David Howells <dhowells@redhat.com>
---
 include/linux/key.h          |    8 --------
 kernel/exit.c                |    2 --
 kernel/fork.c                |    6 +-----
 kernel/sys.c                 |    8 --------
 security/keys/process_keys.c |   19 ++++++++++++-------
 5 files changed, 13 insertions(+), 30 deletions(-)

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone	Number of Children Cloned
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	17746.8 	17923.4 	18079.1 	18128.9 	18182.7 	18140.9
Dev	305.931 	297.937 	287.602 	289.916 	290.541 	278.494
Err (%)	1.72387 	1.66228 	1.5908	 	1.5992 		1.59789 	1.53517

Fork	Number of Children Forked
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	17678.6 	17872.6 	17975.1 	18072.5 	18166.1 	18167.7
Dev	311.175 	279.804 	293.091 	296.378 	293.13	 	292.623
Err (%)	1.76017 	1.56555 	1.63054 	1.63993 	1.61361 	1.61068

Kernbench:
Elapsed: 124.357s User: 439.753s System: 46.582s CPU: 390.6%
439.90user 46.56system 2:04.09elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.71user 46.48system 2:04.23elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.71system 2:04.77elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.67user 46.53system 2:04.31elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.80user 46.55system 2:04.10elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.54system 2:04.11elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.85user 46.79system 2:04.17elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.65user 46.50system 2:04.63elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.57user 46.55system 2:04.62elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.80user 46.61system 2:04.54elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/include/linux/key.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/key.h
+++ linux-2.6.19-rc2-mm2/include/linux/key.h
@@ -335,18 +335,14 @@ extern void keyring_replace_payload(stru
  */
 extern struct key root_user_keyring, root_session_keyring;
 extern int alloc_uid_keyring(struct user_struct *user,
 			     struct task_struct *ctx);
 extern void switch_uid_keyring(struct user_struct *new_user);
-extern int copy_keys(unsigned long clone_flags, struct task_struct *tsk);
 extern int copy_thread_group_keys(struct task_struct *tsk);
-extern void exit_keys(struct task_struct *tsk);
 extern void exit_thread_group_keys(struct signal_struct *tg);
 extern int suid_keys(struct task_struct *tsk);
 extern int exec_keys(struct task_struct *tsk);
-extern void key_fsuid_changed(struct task_struct *tsk);
-extern void key_fsgid_changed(struct task_struct *tsk);
 extern void key_init(void);
 
 #define __install_session_keyring(tsk, keyring)			\
 ({								\
 	struct key *old_session = tsk->signal->session_keyring;	\
@@ -365,18 +361,14 @@ extern void key_init(void);
 #define key_ref_to_ptr(k)		({ NULL; })
 #define is_key_possessed(k)		0
 #define alloc_uid_keyring(u,c)		0
 #define switch_uid_keyring(u)		do { } while(0)
 #define __install_session_keyring(t, k)	({ NULL; })
-#define copy_keys(f,t)			0
 #define copy_thread_group_keys(t)	0
-#define exit_keys(t)			do { } while(0)
 #define exit_thread_group_keys(tg)	do { } while(0)
 #define suid_keys(t)			do { } while(0)
 #define exec_keys(t)			do { } while(0)
-#define key_fsuid_changed(t)		do { } while(0)
-#define key_fsgid_changed(t)		do { } while(0)
 #define key_init()			do { } while(0)
 
 /* Initial keyrings */
 extern struct key root_user_keyring;
 extern struct key root_session_keyring;
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1070,14 +1070,12 @@ static struct task_struct *copy_process(
 		goto bad_fork_cleanup_fs;
 	if ((retval = copy_signal(clone_flags, p)))
 		goto bad_fork_cleanup_sighand;
 	if ((retval = copy_mm(clone_flags, p)))
 		goto bad_fork_cleanup_signal;
-	if ((retval = copy_keys(clone_flags, p)))
-		goto bad_fork_cleanup_mm;
 	if ((retval = copy_namespaces(clone_flags, p)))
-		goto bad_fork_cleanup_keys;
+		goto bad_fork_cleanup_mm;
 	retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
 	if (retval)
 		goto bad_fork_cleanup_namespaces;
 
 	p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
@@ -1219,12 +1217,10 @@ static struct task_struct *copy_process(
 	proc_fork_connector(p);
 	return p;
 
 bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
-bad_fork_cleanup_keys:
-	exit_keys(p);
 bad_fork_cleanup_mm:
 	if (p->mm)
 		mmput(p->mm);
 bad_fork_cleanup_signal:
 	cleanup_signal(p);
Index: linux-2.6.19-rc2-mm2/security/keys/process_keys.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/security/keys/process_keys.c
+++ linux-2.6.19-rc2-mm2/security/keys/process_keys.c
@@ -15,10 +15,11 @@
 #include <linux/slab.h>
 #include <linux/keyctl.h>
 #include <linux/fs.h>
 #include <linux/err.h>
 #include <linux/mutex.h>
+#include <linux/task_watchers.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
 /* session keyring create vs join semaphore */
 static DEFINE_MUTEX(key_session_mutex);
@@ -276,11 +277,11 @@ int copy_thread_group_keys(struct task_s
 
 /*****************************************************************************/
 /*
  * copy the keys for fork
  */
-int copy_keys(unsigned long clone_flags, struct task_struct *tsk)
+static int copy_keys(unsigned long clone_flags, struct task_struct *tsk)
 {
 	key_check(tsk->thread_keyring);
 	key_check(tsk->request_key_auth);
 
 	/* no thread keyring yet */
@@ -290,10 +291,11 @@ int copy_keys(unsigned long clone_flags,
 	key_get(tsk->request_key_auth);
 
 	return 0;
 
 } /* end copy_keys() */
+task_watcher_func(init, copy_keys);
 
 /*****************************************************************************/
 /*
  * dispose of thread group keys upon thread group destruction
  */
@@ -306,16 +308,17 @@ void exit_thread_group_keys(struct signa
 
 /*****************************************************************************/
 /*
  * dispose of per-thread keys upon thread exit
  */
-void exit_keys(struct task_struct *tsk)
+static int exit_keys(unsigned long exit_code, struct task_struct *tsk)
 {
 	key_put(tsk->thread_keyring);
 	key_put(tsk->request_key_auth);
-
+	return 0;
 } /* end exit_keys() */
+task_watcher_func(free, exit_keys);
 
 /*****************************************************************************/
 /*
  * deal with execve()
  */
@@ -356,35 +359,37 @@ int suid_keys(struct task_struct *tsk)
 
 /*****************************************************************************/
 /*
  * the filesystem user ID changed
  */
-void key_fsuid_changed(struct task_struct *tsk)
+static int key_fsuid_changed(unsigned long ignored, struct task_struct *tsk)
 {
 	/* update the ownership of the thread keyring */
 	if (tsk->thread_keyring) {
 		down_write(&tsk->thread_keyring->sem);
 		tsk->thread_keyring->uid = tsk->fsuid;
 		up_write(&tsk->thread_keyring->sem);
 	}
-
+	return 0;
 } /* end key_fsuid_changed() */
+task_watcher_func(uid, key_fsuid_changed);
 
 /*****************************************************************************/
 /*
  * the filesystem group ID changed
  */
-void key_fsgid_changed(struct task_struct *tsk)
+static int key_fsgid_changed(unsigned long ignored, struct task_struct *tsk)
 {
 	/* update the ownership of the thread keyring */
 	if (tsk->thread_keyring) {
 		down_write(&tsk->thread_keyring->sem);
 		tsk->thread_keyring->gid = tsk->fsgid;
 		up_write(&tsk->thread_keyring->sem);
 	}
-
+	return 0;
 } /* end key_fsgid_changed() */
+task_watcher_func(gid, key_fsgid_changed);
 
 /*****************************************************************************/
 /*
  * search the process keyrings for the first matching key
  * - we use the supplied match function to see if the description (or other
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -12,11 +12,10 @@
 #include <linux/capability.h>
 #include <linux/completion.h>
 #include <linux/personality.h>
 #include <linux/tty.h>
 #include <linux/mnt_namespace.h>
-#include <linux/key.h>
 #include <linux/security.h>
 #include <linux/cpu.h>
 #include <linux/acct.h>
 #include <linux/tsacct_kern.h>
 #include <linux/file.h>
@@ -919,11 +918,10 @@ fastcall NORET_TYPE void do_exit(long co
 	if (group_dead)
 		acct_process();
 	__exit_files(tsk);
 	__exit_fs(tsk);
 	exit_thread();
-	exit_keys(tsk);
 
 	if (group_dead && tsk->signal->leader)
 		disassociate_ctty(1);
 
 	module_put(task_thread_info(tsk)->exec_domain->module);
Index: linux-2.6.19-rc2-mm2/kernel/sys.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/sys.c
+++ linux-2.6.19-rc2-mm2/kernel/sys.c
@@ -957,11 +957,10 @@ asmlinkage long sys_setregid(gid_t rgid,
 	    (egid != (gid_t) -1 && egid != old_rgid))
 		current->sgid = new_egid;
 	current->fsgid = new_egid;
 	current->egid = new_egid;
 	current->gid = new_rgid;
-	key_fsgid_changed(current);
 	proc_id_connector(current, PROC_EVENT_GID);
 	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
 
@@ -993,11 +992,10 @@ asmlinkage long sys_setgid(gid_t gid)
 		current->egid = current->fsgid = gid;
 	}
 	else
 		return -EPERM;
 
-	key_fsgid_changed(current);
 	proc_id_connector(current, PROC_EVENT_GID);
 	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
   
@@ -1082,11 +1080,10 @@ asmlinkage long sys_setreuid(uid_t ruid,
 	if (ruid != (uid_t) -1 ||
 	    (euid != (uid_t) -1 && euid != old_ruid))
 		current->suid = current->euid;
 	current->fsuid = current->euid;
 
-	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
 	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE);
 }
@@ -1130,11 +1127,10 @@ asmlinkage long sys_setuid(uid_t uid)
 		smp_wmb();
 	}
 	current->fsuid = current->euid = uid;
 	current->suid = new_suid;
 
-	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
 	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID);
 }
@@ -1179,11 +1175,10 @@ asmlinkage long sys_setresuid(uid_t ruid
 	}
 	current->fsuid = current->euid;
 	if (suid != (uid_t) -1)
 		current->suid = suid;
 
-	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
 	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES);
 }
@@ -1232,11 +1227,10 @@ asmlinkage long sys_setresgid(gid_t rgid
 	if (rgid != (gid_t) -1)
 		current->gid = rgid;
 	if (sgid != (gid_t) -1)
 		current->sgid = sgid;
 
-	key_fsgid_changed(current);
 	proc_id_connector(current, PROC_EVENT_GID);
 	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
 
@@ -1274,11 +1268,10 @@ asmlinkage long sys_setfsuid(uid_t uid)
 			smp_wmb();
 		}
 		current->fsuid = uid;
 	}
 
-	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
 	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS);
 
@@ -1302,11 +1295,10 @@ asmlinkage long sys_setfsgid(gid_t gid)
 		if (gid != old_fsgid) {
 			current->mm->dumpable = suid_dumpable;
 			smp_wmb();
 		}
 		current->fsgid = gid;
-		key_fsgid_changed(current);
 		proc_id_connector(current, PROC_EVENT_GID);
 		notify_task_watchers(WATCH_TASK_GID, 0, current);
 	}
 	return old_fsgid;
 }

--


* [PATCH 9/9] Task Watchers v2: Register process events connector
  2006-11-03  4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
                   ` (7 preceding siblings ...)
  2006-11-03  4:23 ` [PATCH 8/9] Task Watchers v2: Register process keyrings " Matt Helsley
@ 2006-11-03  4:23 ` Matt Helsley
  2006-11-03  8:57 ` [PATCH 0/9] Task Watchers v2: Introduction Paul Jackson
  9 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03  4:23 UTC (permalink / raw)
  To: Linux-Kernel
  Cc: Jes Sorensen, LSE-Tech, Chandra S Seetharaman, Christoph Hellwig,
	Al Viro, Steve Grubb, linux-audit, Paul Jackson, Andrew Morton

[-- Attachment #1: task-watchers-register-procevents --]
[-- Type: text/plain, Size: 14005 bytes --]

Make the Process events connector use task watchers instead of hooking the
paths it's interested in.

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
 drivers/connector/cn_proc.c |   52 +++++++++++++++++++++++++++++++-------------
 fs/exec.c                   |    1 
 include/linux/cn_proc.h     |   21 -----------------
 kernel/exit.c               |    2 -
 kernel/fork.c               |    2 -
 kernel/sys.c                |    9 -------
 6 files changed, 37 insertions(+), 50 deletions(-)

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone	Number of Children Cloned
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	17602.2 	17876.7 	17977.4 	18075.5 	18134.3 	18151.5
Dev	291.294 	376.373 	277.882 	288.971 	278.25	 	276.3
Err (%)	1.65487 	2.10539 	1.54573 	1.59869 	1.53439 	1.52219

Fork	Number of Children Forked
	5000		7500		10000		12500		15000		17500
	---------------------------------------------------------------------------------------
Mean	17691.1 	17770.9 	17932.6 	17996 		18096.4 	18142.9
Dev	300.692 	291.913 	296.654 	279.183 	290.228 	284.693
Err (%)	1.69968 	1.64265 	1.65428 	1.55136 	1.60379 	1.56917

Kernbench:
Elapsed: 124.359s User: 439.756s System: 46.457s CPU: 390.3%
439.87user 46.42system 2:04.44elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.68user 46.42system 2:04.15elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.64system 2:04.40elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.81user 46.42system 2:03.92elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.39system 2:04.48elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.66user 46.41system 2:04.70elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.73user 46.59system 2:04.42elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.97user 46.46system 2:04.45elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.62user 46.40system 2:04.33elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.73user 46.42system 2:04.30elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/drivers/connector/cn_proc.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/drivers/connector/cn_proc.c
+++ linux-2.6.19-rc2-mm2/drivers/connector/cn_proc.c
@@ -25,10 +25,11 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/ktime.h>
 #include <linux/init.h>
 #include <linux/connector.h>
+#include <linux/task_watchers.h>
 #include <asm/atomic.h>
 
 #include <linux/cn_proc.h>
 
 #define CN_PROC_MSG_SIZE (sizeof(struct cn_msg) + sizeof(struct proc_event))
@@ -44,19 +45,20 @@ static inline void get_seq(__u32 *ts, in
 	*ts = get_cpu_var(proc_event_counts)++;
 	*cpu = smp_processor_id();
 	put_cpu_var(proc_event_counts);
 }
 
-void proc_fork_connector(struct task_struct *task)
+static int proc_fork_connector(unsigned long clone_flags,
+			       struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
 	__u8 buffer[CN_PROC_MSG_SIZE];
 	struct timespec ts;
 
 	if (atomic_read(&proc_event_num_listeners) < 1)
-		return;
+		return 0;
 
 	msg = (struct cn_msg*)buffer;
 	ev = (struct proc_event*)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
@@ -70,21 +72,24 @@ void proc_fork_connector(struct task_str
 	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
 	msg->ack = 0; /* not used */
 	msg->len = sizeof(*ev);
 	/*  If cn_netlink_send() failed, the data is not sent */
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+	return 0;
 }
+task_watcher_func(clone, proc_fork_connector);
 
-void proc_exec_connector(struct task_struct *task)
+static int proc_exec_connector(unsigned long ignore,
+			       struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
 	struct timespec ts;
 	__u8 buffer[CN_PROC_MSG_SIZE];
 
 	if (atomic_read(&proc_event_num_listeners) < 1)
-		return;
+		return 0;
 
 	msg = (struct cn_msg*)buffer;
 	ev = (struct proc_event*)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
@@ -95,21 +100,23 @@ void proc_exec_connector(struct task_str
 
 	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
 	msg->ack = 0; /* not used */
 	msg->len = sizeof(*ev);
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+	return 0;
 }
+task_watcher_func(exec, proc_exec_connector);
 
-void proc_id_connector(struct task_struct *task, int which_id)
+static int process_change_id(unsigned long which_id, struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
 	__u8 buffer[CN_PROC_MSG_SIZE];
 	struct timespec ts;
 
 	if (atomic_read(&proc_event_num_listeners) < 1)
-		return;
+		return 0;
 
 	msg = (struct cn_msg*)buffer;
 	ev = (struct proc_event*)msg->data;
 	ev->what = which_id;
 	ev->event_data.id.process_pid = task->pid;
@@ -119,47 +126,64 @@ void proc_id_connector(struct task_struc
 	 	ev->event_data.id.e.euid = task->euid;
 	} else if (which_id == PROC_EVENT_GID) {
 	   	ev->event_data.id.r.rgid = task->gid;
 	   	ev->event_data.id.e.egid = task->egid;
 	} else
-	     	return;
+	     	return 0;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	ev->timestamp_ns = timespec_to_ns(&ts);
 
 	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
 	msg->ack = 0; /* not used */
 	msg->len = sizeof(*ev);
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+	return 0;
+}
+
+static int proc_change_uid_connector(unsigned long ignore,
+				     struct task_struct *task)
+{
+	return process_change_id(PROC_EVENT_UID, task);
+}
+task_watcher_func(uid, proc_change_uid_connector);
+
+static int proc_change_gid_connector(unsigned long ignore,
+				     struct task_struct *task)
+{
+	return process_change_id(PROC_EVENT_GID, task);
 }
+task_watcher_func(gid, proc_change_gid_connector);
 
-void proc_exit_connector(struct task_struct *task)
+static int proc_exit_connector(unsigned long code, struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
 	__u8 buffer[CN_PROC_MSG_SIZE];
 	struct timespec ts;
 
 	if (atomic_read(&proc_event_num_listeners) < 1)
-		return;
+		return 0;
 
 	msg = (struct cn_msg*)buffer;
 	ev = (struct proc_event*)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	ev->timestamp_ns = timespec_to_ns(&ts);
 	ev->what = PROC_EVENT_EXIT;
 	ev->event_data.exit.process_pid = task->pid;
 	ev->event_data.exit.process_tgid = task->tgid;
-	ev->event_data.exit.exit_code = task->exit_code;
+	ev->event_data.exit.exit_code = code;
 	ev->event_data.exit.exit_signal = task->exit_signal;
 
 	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
 	msg->ack = 0; /* not used */
 	msg->len = sizeof(*ev);
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+	return 0;
 }
+task_watcher_func(exit, proc_exit_connector);
 
 /*
  * Send an acknowledgement message to userspace
  *
  * Use 0 for success, EFOO otherwise.
@@ -226,14 +250,12 @@ static void cn_proc_mcast_ctl(void *data
  */
 static int __init cn_proc_init(void)
 {
 	int err;
 
-	if ((err = cn_add_callback(&cn_proc_event_id, "cn_proc",
-	 			   &cn_proc_mcast_ctl))) {
+	err = cn_add_callback(&cn_proc_event_id, "cn_proc", &cn_proc_mcast_ctl);
+	if (err)
 		printk(KERN_WARNING "cn_proc failed to register\n");
-		return err;
-	}
-	return 0;
+	return err;
 }
 
 module_init(cn_proc_init);
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -40,11 +40,10 @@
 #include <linux/mount.h>
 #include <linux/profile.h>
 #include <linux/rmap.h>
 #include <linux/acct.h>
 #include <linux/tsacct_kern.h>
-#include <linux/cn_proc.h>
 #include <linux/delayacct.h>
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
 #include <linux/task_watchers.h>
 
@@ -1212,11 +1211,10 @@ static struct task_struct *copy_process(
 
 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
 	notify_task_watchers(WATCH_TASK_CLONE, clone_flags, p);
-	proc_fork_connector(p);
 	return p;
 
 bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
 bad_fork_cleanup_mm:
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -30,11 +30,10 @@
 #include <linux/taskstats_kern.h>
 #include <linux/delayacct.h>
 #include <linux/syscalls.h>
 #include <linux/signal.h>
 #include <linux/posix-timers.h>
-#include <linux/cn_proc.h>
 #include <linux/mutex.h>
 #include <linux/futex.h>
 #include <linux/compat.h>
 #include <linux/pipe_fs_i.h>
 #include <linux/resource.h>
@@ -927,11 +926,10 @@ fastcall NORET_TYPE void do_exit(long co
 	module_put(task_thread_info(tsk)->exec_domain->module);
 	if (tsk->binfmt)
 		module_put(tsk->binfmt->module);
 
 	tsk->exit_code = code;
-	proc_exit_connector(tsk);
 	exit_notify(tsk);
 	exit_task_namespaces(tsk);
 	/*
 	 * This must happen late, after the PID is not
 	 * hashed anymore:
Index: linux-2.6.19-rc2-mm2/kernel/sys.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/sys.c
+++ linux-2.6.19-rc2-mm2/kernel/sys.c
@@ -25,11 +25,10 @@
 #include <linux/security.h>
 #include <linux/dcookies.h>
 #include <linux/suspend.h>
 #include <linux/tty.h>
 #include <linux/signal.h>
-#include <linux/cn_proc.h>
 #include <linux/getcpu.h>
 #include <linux/seccomp.h>
 #include <linux/task_watchers.h>
 
 #include <linux/compat.h>
@@ -957,11 +956,10 @@ asmlinkage long sys_setregid(gid_t rgid,
 	    (egid != (gid_t) -1 && egid != old_rgid))
 		current->sgid = new_egid;
 	current->fsgid = new_egid;
 	current->egid = new_egid;
 	current->gid = new_rgid;
-	proc_id_connector(current, PROC_EVENT_GID);
 	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
 
 /*
@@ -992,11 +990,10 @@ asmlinkage long sys_setgid(gid_t gid)
 		current->egid = current->fsgid = gid;
 	}
 	else
 		return -EPERM;
 
-	proc_id_connector(current, PROC_EVENT_GID);
 	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
   
 static int set_user(uid_t new_ruid, int dumpclear)
@@ -1080,11 +1077,10 @@ asmlinkage long sys_setreuid(uid_t ruid,
 	if (ruid != (uid_t) -1 ||
 	    (euid != (uid_t) -1 && euid != old_ruid))
 		current->suid = current->euid;
 	current->fsuid = current->euid;
 
-	proc_id_connector(current, PROC_EVENT_UID);
 	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE);
 }
 
@@ -1127,11 +1123,10 @@ asmlinkage long sys_setuid(uid_t uid)
 		smp_wmb();
 	}
 	current->fsuid = current->euid = uid;
 	current->suid = new_suid;
 
-	proc_id_connector(current, PROC_EVENT_UID);
 	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID);
 }
 
@@ -1175,11 +1170,10 @@ asmlinkage long sys_setresuid(uid_t ruid
 	}
 	current->fsuid = current->euid;
 	if (suid != (uid_t) -1)
 		current->suid = suid;
 
-	proc_id_connector(current, PROC_EVENT_UID);
 	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES);
 }
 
@@ -1227,11 +1221,10 @@ asmlinkage long sys_setresgid(gid_t rgid
 	if (rgid != (gid_t) -1)
 		current->gid = rgid;
 	if (sgid != (gid_t) -1)
 		current->sgid = sgid;
 
-	proc_id_connector(current, PROC_EVENT_GID);
 	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
 
 asmlinkage long sys_getresgid(gid_t __user *rgid, gid_t __user *egid, gid_t __user *sgid)
@@ -1268,11 +1261,10 @@ asmlinkage long sys_setfsuid(uid_t uid)
 			smp_wmb();
 		}
 		current->fsuid = uid;
 	}
 
-	proc_id_connector(current, PROC_EVENT_UID);
 	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS);
 
 	return old_fsuid;
@@ -1295,11 +1287,10 @@ asmlinkage long sys_setfsgid(gid_t gid)
 		if (gid != old_fsgid) {
 			current->mm->dumpable = suid_dumpable;
 			smp_wmb();
 		}
 		current->fsgid = gid;
-		proc_id_connector(current, PROC_EVENT_GID);
 		notify_task_watchers(WATCH_TASK_GID, 0, current);
 	}
 	return old_fsgid;
 }
 
Index: linux-2.6.19-rc2-mm2/fs/exec.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/fs/exec.c
+++ linux-2.6.19-rc2-mm2/fs/exec.c
@@ -1086,11 +1086,10 @@ int search_binary_handler(struct linux_b
 					fput(bprm->file);
 				bprm->file = NULL;
 				current->did_exec = 1;
 				notify_task_watchers(WATCH_TASK_EXEC, 0,
 						     current);
-				proc_exec_connector(current);
 				return retval;
 			}
 			read_lock(&binfmt_lock);
 			put_binfmt(fmt);
 			if (retval != -ENOEXEC || bprm->mm == NULL)
Index: linux-2.6.19-rc2-mm2/include/linux/cn_proc.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/cn_proc.h
+++ linux-2.6.19-rc2-mm2/include/linux/cn_proc.h
@@ -95,27 +95,6 @@ struct proc_event {
 			__u32 exit_code, exit_signal;
 		} exit;
 	} event_data;
 };
 
-#ifdef __KERNEL__
-#ifdef CONFIG_PROC_EVENTS
-void proc_fork_connector(struct task_struct *task);
-void proc_exec_connector(struct task_struct *task);
-void proc_id_connector(struct task_struct *task, int which_id);
-void proc_exit_connector(struct task_struct *task);
-#else
-static inline void proc_fork_connector(struct task_struct *task)
-{}
-
-static inline void proc_exec_connector(struct task_struct *task)
-{}
-
-static inline void proc_id_connector(struct task_struct *task,
-				     int which_id)
-{}
-
-static inline void proc_exit_connector(struct task_struct *task)
-{}
-#endif	/* CONFIG_PROC_EVENTS */
-#endif	/* __KERNEL__ */
 #endif	/* CN_PROC_H */

--


* Re: [PATCH 0/9] Task Watchers v2: Introduction
  2006-11-03  4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
                   ` (8 preceding siblings ...)
  2006-11-03  4:23 ` [PATCH 9/9] Task Watchers v2: Register process events connector Matt Helsley
@ 2006-11-03  8:57 ` Paul Jackson
  2006-11-03 22:55   ` Matt Helsley
  9 siblings, 1 reply; 16+ messages in thread
From: Paul Jackson @ 2006-11-03  8:57 UTC (permalink / raw)
  To: Matt Helsley
  Cc: linux-kernel, jes, lse-tech, sekharan, hch, viro, sgrubb,
	linux-audit, akpm

Matt wrote:
> Task watchers is primarily useful to existing kernel code as a means of making
> the code in fork and exit more readable.

I don't get it.  The benchmark data isn't explained in plain English
what it means, that I could find, so I am just guessing.  But looking
at the last (17500) column of the fork results, after applying patch
1/9, I see a number of 18565, and looking at that same column in patch
9/9, I see a number of 18142.

I guess that means a drop of ((18565 - 18142) / 18565) == 2% in the fork
rate, to make the code "more readable".

And I'm not even sure it makes it more readable.  Looks to me like another
layer of apparatus, which is one more thing to figure out before a reader
understands what is going on.

I'd gladly put in a few long days to improve the fork rate 2%, and I am
grateful to those who have already done so - whoever they are.

Somewhere I must have missed the memo explaining why this patch is a
good idea - sorry.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: [PATCH 1/9] Task Watchers v2: Task watchers v2
  2006-11-03  4:22 ` [PATCH 1/9] Task Watchers v2: Task watchers v2 Matt Helsley
@ 2006-11-03 13:22   ` Daniel Walker
  2006-11-04  0:43     ` Matt Helsley
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Walker @ 2006-11-03 13:22 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Linux-Kernel, Jes Sorensen, LSE-Tech, Chandra S Seetharaman,
	Christoph Hellwig, Al Viro, Steve Grubb, linux-audit,
	Paul Jackson, Andrew Morton

On Thu, 2006-11-02 at 20:22 -0800, Matt Helsley wrote:
> +/*
> + * Watch for events occuring within a task and call the supplied function
> + * when (and only when) the given event happens.
> + * Only non-modular kernel code may register functions as task_watchers.
> + */
> +#define task_watcher_func(ev, fn) \
> +static task_watcher_fn __task_watcher_##ev##_##fn __attribute_used__ \
> +       __attribute__ ((__section__ (".task_watchers." #ev))) = fn
> +#else
> +#error "task_watcher() macro may not be used in modules."
> +#endif

You should make this TASK_WATCHER_FUNC() or even just TASK_WATCHER(). It
looks a little goofy in the code that uses it.

Looking at it now could you do something like,

static int __task_watcher_init 
audit_alloc(unsigned long val, struct task_struct *tsk)

Instead of a macro? Might be a little less invasive.

Daniel 




* Re: [PATCH 0/9] Task Watchers v2: Introduction
  2006-11-03  8:57 ` [PATCH 0/9] Task Watchers v2: Introduction Paul Jackson
@ 2006-11-03 22:55   ` Matt Helsley
  0 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-03 22:55 UTC (permalink / raw)
  To: Paul Jackson
  Cc: linux-kernel, jes, lse-tech, sekharan, hch, viro, sgrubb,
	linux-audit, akpm

On Fri, 2006-11-03 at 00:57 -0800, Paul Jackson wrote:
> Matt wrote:
> > Task watchers is primarily useful to existing kernel code as a means of making
> > the code in fork and exit more readable.
> 
> I don't get it.  The benchmark data isn't explained in plain English

Sorry, there were no units in the per-patch fork and clone data. Units
there are in tasks created per second. The kernbench units are in place
and should be fairly self-explanatory I think.

Here's what I did:

Measure the time it takes to fork N times. Retry 100 times. Try
different N. Try clone instead of fork to see how different the results
can be.

Then run kernbench.

Do the above after applying each patch. Then compare to the previous
patch (or unpatched source).

Run statistics on the numbers.
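
A minimal userspace sketch of the measurement loop described above, assuming
a POSIX system. The function name fork_rate() is hypothetical and this is not
the benchmark program that produced the posted numbers; it only shows what
"tasks created per second" means for one sample of N forks.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Fork n children that exit immediately, reap each one, and return the
 * observed rate in tasks created per second, or -1.0 on failure.
 * (Hypothetical helper, for illustration only.)
 */
static double fork_rate(int n)
{
	struct timespec start, end;
	double elapsed;
	int i;

	if (n <= 0 || clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1.0;

	for (i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid < 0)
			return -1.0;
		if (pid == 0)
			_exit(0);	/* child: nothing to do */
		if (waitpid(pid, NULL, 0) < 0)
			return -1.0;
	}

	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1.0;

	elapsed = (double)(end.tv_sec - start.tv_sec) +
		  (double)(end.tv_nsec - start.tv_nsec) / 1e9;
	return elapsed > 0.0 ? (double)n / elapsed : -1.0;
}
```

Calling fork_rate(N) 100 times for each N and keeping every sample would give
the raw data for the mean and error estimates discussed below.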

> what it means, that I could find, so I am just guessing.  But looking
> at the last (17500) column of the fork results, after applying patch
> 1/9, I see a number of 18565, and looking at that same column in patch
> 9/9, I see a number of 18142.
> 
> I guess that means a drop of (18565 - 18142) / 18565 ~= 2% in the fork
> rate, to make the code "more readable".

	Well, it's a worst-case scenario. Without the patches I've seen the
fork rate intermittently (once every 300 samples) drop to 16k forks/sec
-- a much bigger drop than 2%. I also ran the tests on Andrew's hotfix
patches for rc2-mm2 and got similar differences even though the patches
don't change the fork path. And finally, don't forget to compare that to
the error -- about +/-1.6%. So on an absolute worst-case workload we
could have a drop anywhere from 0.4 to 3.6%.
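
The arithmetic behind those figures can be written out as a small sketch. The
helper names relative_drop() and with_error() are hypothetical; the inputs are
the two fork rates quoted above and the +/-1.6% error.

```c
/* Fractional drop from `before` to `after`; 0.02 means 2%. */
static double relative_drop(double before, double after)
{
	return (before - after) / before;
}

/* Lower and upper bound of a value with a symmetric error. */
struct range {
	double lo;
	double hi;
};

static struct range with_error(double value, double err)
{
	struct range r = { value - err, value + err };

	return r;
}
```

relative_drop(18565, 18142) comes to roughly 0.023, which rounds to the 2%
quoted above, and with_error(2.0, 1.6) spans 0.4 to 3.6 (in percent).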

	To get a better idea of the normal impact of these patches I think you
have to look at benchmarks more like kernbench since it's not composed
entirely of fork calls. There the measurements are easily within the
error margins with or without the patches.

	Unfortunately the differences I get always seem to be right around the
size of the error. I can't seem to get a benchmark to have an error of
1% or less. I'm open to suggestions of different benchmarks or how to
obtain tighter bounds on the measurements (e.g. /proc knobs to fiddle
with).

> And I'm not even sure it makes it more readable.  Looks to me like another
> layer of apparatus, which is one more thing to figure out before a reader
> understands what is going on.

	It's nice to see a module's init function with the rest of the module
and not cluttering up the kernel's module loading code. The use,
benefits, disadvantages, and even the implementation of task watchers
are similar. I could rename it (task_init(), task_exit(), etc.) to make
the similarity more apparent.

> I'd gladly put in a few long days to improve the fork rate 2%, and I am
> grateful to those who have already done so - whoever they are.

I'm open to suggestions on how to improve the performance. :)

> Somewhere I must have missed the memo explaining why this patch is a
> good idea - sorry.

	Well, it should make things look cleaner. It's also intended to be
useful in new code like containers and resource management -- pieces
many people don't want to pay attention to in those paths.

Cheers,
	-Matt Helsley



* Re: [PATCH 1/9] Task Watchers v2: Task watchers v2
  2006-11-03 13:22   ` Daniel Walker
@ 2006-11-04  0:43     ` Matt Helsley
  2006-11-04  1:13       ` Daniel Walker
  0 siblings, 1 reply; 16+ messages in thread
From: Matt Helsley @ 2006-11-04  0:43 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Linux-Kernel, Jes Sorensen, LSE-Tech, Chandra S Seetharaman,
	Christoph Hellwig, Al Viro, Steve Grubb, linux-audit,
	Paul Jackson, Andrew Morton

On Fri, 2006-11-03 at 08:22 -0500, Daniel Walker wrote:
> On Thu, 2006-11-02 at 20:22 -0800, Matt Helsley wrote:
> > +/*
> > + * Watch for events occurring within a task and call the supplied function
> > + * when (and only when) the given event happens.
> > + * Only non-modular kernel code may register functions as task_watchers.
> > + */
> > +#define task_watcher_func(ev, fn) \
> > +static task_watcher_fn __task_watcher_##ev##_##fn __attribute_used__ \
> > +       __attribute__ ((__section__ (".task_watchers." #ev))) = fn
> > +#else
> > +#error "task_watcher() macro may not be used in modules."
> > +#endif
> 
> You should make this TASK_WATCHER_FUNC() or even just TASK_WATCHER(). It
> looks a little goofy in the code that uses it.

I can certainly change this. In my defense I didn't capitalize it
because very similar macros in init.h were not capitalized. For example:

#define core_initcall(fn)               __define_initcall("1",fn)
#define postcore_initcall(fn)           __define_initcall("2",fn)
#define arch_initcall(fn)               __define_initcall("3",fn)
#define subsys_initcall(fn)             __define_initcall("4",fn)
#define fs_initcall(fn)                 __define_initcall("5",fn)
#define device_initcall(fn)             __define_initcall("6",fn)
#define late_initcall(fn)               __define_initcall("7",fn)

setup_param, early_param, module_init, etc. do not use all-caps. And I'm
sure that's not all.

All of these declare variables and assign them attributes and values.

> Looking at it now could you do something like,
> 
> static int __task_watcher_init 
> audit_alloc(unsigned long val, struct task_struct *tsk)
> 
> Instead of a macro? Might be a little less invasive.

	I like your suggestion. However, I don't see how such a macro could be
made to replace the current macro.

	I need to be able to call every init function during task
initialization. The current macro creates and initializes a function
pointer in an array in the special ELF section. This allows the
notify_task_watchers function to traverse the array and make calls to
the init functions.
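
The mechanism in that paragraph can be sketched in userspace C, assuming GCC
or Clang and GNU ld on an ELF target. This is not the patch code: the section
name task_watchers_init, the macro TASK_WATCHER_INIT, the two demo watchers,
and struct task are hypothetical stand-ins, and the sketch relies on the
__start_/__stop_ symbols that GNU ld generates for sections whose names are
valid C identifiers rather than on a kernel linker script.

```c
#include <stddef.h>

struct task;	/* hypothetical stand-in for struct task_struct */

typedef int (*task_watcher_fn)(unsigned long val, struct task *tsk);

/* Emit one pointer to fn into the "task_watchers_init" section. */
#define TASK_WATCHER_INIT(fn) \
	static task_watcher_fn task_watcher_init_##fn \
	__attribute__((used, section("task_watchers_init"))) = fn

/* Section bounds, generated by GNU ld because the name is a C identifier. */
extern task_watcher_fn __start_task_watchers_init[];
extern task_watcher_fn __stop_task_watchers_init[];

static int watcher_calls;

static int audit_alloc_demo(unsigned long val, struct task *tsk)
{
	(void)val;
	(void)tsk;
	watcher_calls++;
	return 0;
}
TASK_WATCHER_INIT(audit_alloc_demo);

static int cpuset_fork_demo(unsigned long val, struct task *tsk)
{
	(void)val;
	(void)tsk;
	watcher_calls++;
	return 0;
}
TASK_WATCHER_INIT(cpuset_fork_demo);

/* Walk the pointer table and call each entry; return how many ran. */
static int notify_task_watchers(unsigned long val, struct task *tsk)
{
	task_watcher_fn *fn;
	int called = 0;

	for (fn = __start_task_watchers_init;
	     fn < __stop_task_watchers_init; fn++) {
		(*fn)(val, tsk);
		called++;
	}
	return called;
}
```

The table is assembled at link time, so the walk needs no locking and no list
traversal. The pointer variables exist only to populate the section and are
never referenced by name.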

	I use the name of the function and event to name and initialize the
function pointer. I don't see any way to get the name of the function
without taking a parameter. This also means it would have to be
initialized after the function was declared or defined.

	I considered placing the function code in the ELF section. However I
don't know of any gcc or linker functions that would allow me to iterate
over all of the functions in an ELF section and call them from fork,
exec, exit, etc. I've even looked through the docs and googled.

	I considered doing symbol lookups. Part of the problem is knowing the
names I need to look up. Furthermore, I think doing symbol lookups for
each call would be a lot slower. I could create a dynamically-allocated
array and put the lookup results there. However that's more code and
more memory...

	However, your suggestion could put all of the functions near each
other. That locality could improve performance. So I'll try adding
__task_watcher_<event> macros but I can't see a way to make them work as
you suggested.

Cheers,
	-Matt Helsley



* Re: [PATCH 1/9] Task Watchers v2: Task watchers v2
  2006-11-04  0:43     ` Matt Helsley
@ 2006-11-04  1:13       ` Daniel Walker
  2006-11-05  0:12         ` Matt Helsley
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Walker @ 2006-11-04  1:13 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Linux-Kernel, Jes Sorensen, LSE-Tech, Chandra S Seetharaman,
	Christoph Hellwig, Al Viro, Steve Grubb, linux-audit,
	Paul Jackson, Andrew Morton

On Fri, 2006-11-03 at 16:43 -0800, Matt Helsley wrote:

> I can certainly change this. In my defense I didn't capitalize it
> because very similar macros in init.h were not capitalized. For example:
> 
> #define core_initcall(fn)               __define_initcall("1",fn)
> #define postcore_initcall(fn)           __define_initcall("2",fn)
> #define arch_initcall(fn)               __define_initcall("3",fn)
> #define subsys_initcall(fn)             __define_initcall("4",fn)
> #define fs_initcall(fn)                 __define_initcall("5",fn)
> #define device_initcall(fn)             __define_initcall("6",fn)
> #define late_initcall(fn)               __define_initcall("7",fn)
> 
> setup_param, early_param, module_init, etc. do not use all-caps. And I'm
> sure that's not all.

True .. It's not mandatory. The reason that I mentioned it is because it
looked like a function was being called outside a function block, which
looks odd to me. I think I overlook the initcall functions because I see
them so often I know what they are.

> All of these declare variables and assign them attributes and values.
> 
> > Looking at it now could you do something like,
> > 
> > static int __task_watcher_init 
> > audit_alloc(unsigned long val, struct task_struct *tsk)
> > 
> > Instead of a macro? Might be a little less invasive.
> 
> 	I like your suggestion. However, I don't see how such a macro could be
> made to replace the current macro.
> 
> 	I need to be able to call every init function during task
> initialization. The current macro creates and initializes a function
> pointer in an array in the special ELF section. This allows the
> notify_task_watchers function to traverse the array and make calls to
> the init functions.


You get an "A" for research. I didn't notice you actually declare a
variable inside the macro. I thought it was only setting a section
attribute. You're right, I don't see how you could call the functions in
the section without the variable declared. ( besides that's exactly how
the initcalls work. )

Daniel



* Re: [PATCH 1/9] Task Watchers v2: Task watchers v2
  2006-11-04  1:13       ` Daniel Walker
@ 2006-11-05  0:12         ` Matt Helsley
  0 siblings, 0 replies; 16+ messages in thread
From: Matt Helsley @ 2006-11-05  0:12 UTC (permalink / raw)
  To: dwalker
  Cc: Linux-Kernel, Jes Sorensen, LSE-Tech, Chandra S Seetharaman,
	Christoph Hellwig, Al Viro, Steve Grubb, linux-audit,
	Paul Jackson, Andrew Morton

On Fri, 2006-11-03 at 17:13 -0800, Daniel Walker wrote:
> On Fri, 2006-11-03 at 16:43 -0800, Matt Helsley wrote:
> 
> > I can certainly change this. In my defense I didn't capitalize it
> > because very similar macros in init.h were not capitalized. For example:
> > 
> > #define core_initcall(fn)               __define_initcall("1",fn)
> > #define postcore_initcall(fn)           __define_initcall("2",fn)
> > #define arch_initcall(fn)               __define_initcall("3",fn)
> > #define subsys_initcall(fn)             __define_initcall("4",fn)
> > #define fs_initcall(fn)                 __define_initcall("5",fn)
> > #define device_initcall(fn)             __define_initcall("6",fn)
> > #define late_initcall(fn)               __define_initcall("7",fn)
> > 
> > setup_param, early_param, module_init, etc. do not use all-caps. And I'm
> > sure that's not all.
> 
> True .. It's not mandatory. The reason that I mentioned it is because it
> looked like a function was being called outside a function block, which
> looks odd to me. I think I overlook the initcall functions because I see
> them so often I know what they are.

This is a good point -- it does look odd. I'm considering:

DEFINE_TASK_INITCALL(audit_alloc);

With others like:

DEFINE_TASK_EXITCALL()
DEFINE_TASK_CLONECALL()
etc.

That resembles other macros which create variables. Though I'm not sure
this pattern is appropriate because these variables should not be used by
name.
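
One way the proposed names could be layered over a single generic macro is
sketched below, assuming GCC or Clang and GNU ld on an ELF target. Only
DEFINE_TASK_INITCALL and DEFINE_TASK_EXITCALL come from the proposal above;
DEFINE_TASK_CALL, the task_calls_<event> section names, struct task, and the
demo functions are hypothetical.

```c
#include <stddef.h>

struct task;	/* hypothetical stand-in for struct task_struct */

typedef int (*task_call_fn)(unsigned long val, struct task *tsk);

/* Generic helper: one section per event, named task_calls_<ev>. */
#define DEFINE_TASK_CALL(ev, fn) \
	static task_call_fn task_call_##ev##_##fn \
	__attribute__((used, section("task_calls_" #ev))) = fn

#define DEFINE_TASK_INITCALL(fn)	DEFINE_TASK_CALL(init, fn)
#define DEFINE_TASK_EXITCALL(fn)	DEFINE_TASK_CALL(exit, fn)

/* Section bounds generated by GNU ld for each event's section. */
extern task_call_fn __start_task_calls_init[], __stop_task_calls_init[];
extern task_call_fn __start_task_calls_exit[], __stop_task_calls_exit[];

static int live_tasks;

static int demo_alloc(unsigned long val, struct task *tsk)
{
	(void)val;
	(void)tsk;
	live_tasks++;
	return 0;
}
DEFINE_TASK_INITCALL(demo_alloc);

static int demo_free(unsigned long val, struct task *tsk)
{
	(void)val;
	(void)tsk;
	live_tasks--;
	return 0;
}
DEFINE_TASK_EXITCALL(demo_free);

/* Call every function pointer in [start, stop). */
static void run_task_calls(task_call_fn *start, task_call_fn *stop,
			   struct task *tsk)
{
	for (; start < stop; start++)
		(*start)(0, tsk);
}

static void task_init_event(struct task *tsk)
{
	run_task_calls(__start_task_calls_init, __stop_task_calls_init, tsk);
}

static void task_exit_event(struct task *tsk)
{
	run_task_calls(__start_task_calls_exit, __stop_task_calls_exit, tsk);
}
```

The generated variables are reached only through their section, which is the
sense in which they "should not be used by name".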

Seems that no matter what something about it is going to be unusual. :)

> > All of these declare variables and assign them attributes and values.
> > 
> > > Looking at it now could you do something like,
> > > 
> > > static int __task_watcher_init 
> > > audit_alloc(unsigned long val, struct task_struct *tsk)
> > > 
> > > Instead of a macro? Might be a little less invasive.
> > 
> > 	I like your suggestion. However, I don't see how such a macro could be
> > made to replace the current macro.
> > 
> > 	I need to be able to call every init function during task
> > initialization. The current macro creates and initializes a function
> > pointer in an array in the special ELF section. This allows the
> > notify_task_watchers function to traverse the array and make calls to
> > the init functions.
> 
> 
> You get an "A" for research. I didn't notice you actually declare a

Thanks!

<snip>

Cheers,
	-Matt Helsley



Thread overview: 16+ messages
2006-11-03  4:22 [PATCH 0/9] Task Watchers v2: Introduction Matt Helsley
2006-11-03  4:22 ` [PATCH 1/9] Task Watchers v2: Task watchers v2 Matt Helsley
2006-11-03 13:22   ` Daniel Walker
2006-11-04  0:43     ` Matt Helsley
2006-11-04  1:13       ` Daniel Walker
2006-11-05  0:12         ` Matt Helsley
2006-11-03  4:22 ` [PATCH 2/9] Task Watchers v2: Register audit task watcher Matt Helsley
2006-11-03  4:23 ` [PATCH 3/9] Task Watchers v2: Register semundo " Matt Helsley
2006-11-03  4:23 ` [PATCH 4/9] Task Watchers v2: Register cpuset " Matt Helsley
2006-11-03  4:23 ` [PATCH 5/9] Task Watchers v2: Register NUMA mempolicy " Matt Helsley
2006-11-03  4:23 ` [PATCH 6/9] Task Watchers v2: Register IRQ flag tracing " Matt Helsley
2006-11-03  4:23 ` [PATCH 7/9] Task Watchers v2: Register lockdep " Matt Helsley
2006-11-03  4:23 ` [PATCH 8/9] Task Watchers v2: Register process keyrings " Matt Helsley
2006-11-03  4:23 ` [PATCH 9/9] Task Watchers v2: Register process events connector Matt Helsley
2006-11-03  8:57 ` [PATCH 0/9] Task Watchers v2: Introduction Paul Jackson
2006-11-03 22:55   ` Matt Helsley
