* [PATCH 00/10] Introduction
@ 2006-12-15 0:07 Matt Helsley
2006-12-15 0:07 ` Task watchers v2 Matt Helsley
` (9 more replies)
0 siblings, 10 replies; 19+ messages in thread
From: Matt Helsley @ 2006-12-15 0:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux-Kernel, Jes Sorensen, Christoph Hellwig, Al Viro,
Steve Grubb, linux-audit, Paul Jackson
This is version 2 of my Task Watchers patches with performance enhancements.
Task watchers call registered functions whenever a task forks, execs, changes
its [re][ug]id, or exits.
Task watchers are primarily useful to existing kernel code as a means of making
the fork and exit paths more readable. Kernel code hooks these paths by marking
a function as a task watcher, much like modules mark their init functions with
module_init(). This improves the readability of copy_process().
The first patch adds the basic infrastructure of task watchers: notification
calls in the various paths and a table of function pointers to be called. It
uses an ELF section because the table's entries must be gathered from all over
the kernel, and using the linker is easier than resolving and maintaining
complex header interdependencies. Furthermore, a linked list proved to have a
much higher impact on the size of the patches and was deemed unacceptable
overhead. An ELF table is also ideal because its read-only nature means that
no locking or list traversal is required.
Subsequent patches adapt existing parts of the kernel to use a task watcher
-- typically in the fork, clone, and exit paths:
FEATURE (notes) RELEVANT CONFIG VARIABLE
-----------------------------------------------------------------------
audit [ CONFIG_AUDIT ... ]
semundo [ CONFIG_SYSVIPC ]
cpusets [ CONFIG_CPUSETS ]
mempolicy [ CONFIG_NUMA ]
trace irqflags [ CONFIG_TRACE_IRQFLAGS ]
lockdep [ CONFIG_LOCKDEP ]
keys (for processes -- not for thread groups) [ CONFIG_KEYS ]
process events connector [ CONFIG_PROC_EVENTS ]
TODO:
Mark the task watcher table ELF section read-only. I've tried to "fix"
the .lds files to do this with no success. I'd really appreciate help
from folks familiar with writing linker scripts.
I'm working on three more patches that add support for creating a task
watcher from within a module using an ELF section. They haven't received
as much attention since I've been focusing on measuring the performance
impact of these patches.
Changes:
since v2 ():
Added ELF section annotations to the functions handling the events
Added section annotation to the lookup table in kernel/task_watchers.c
Added prefetch hints to the function pointer array walk
Renamed the macros (better?)
Retested the patches
Reduced noise in test results (0.6 - 1%, 2+% previously)
With the last prefetch patch I was able to measure a performance increase in
the range of 0.4 to 2.8%. I sampled 100 times and took the mean for each patch.
Since the numbers seemed to be a source of confusion last time, I've tried to
simplify them here:
Patch Mean (forks/second)
0 6925.16 (baseline)
1 7170.81 task watchers
2 7100.34 audit
3 7114.47 semundo
4 7185.7 cpusets
5 7121.41 numa-mempolicy
6 7070.82 irqflags
7 7012.61 lockdep
8 7116.54 keys
9 7116.35 procevents
12 7109.52 prefetch
----------------------------------------------------
7109.52 - 6925.16 = +184 forks/second (+2.6%)
So the patch series now actually improves performance a little.
All the numbers from the tests are available if anyone wishes to analyze them
independently.
Please consider for inclusion in -mm.
Cheers,
-Matt Helsley
* Task watchers v2
2006-12-15 0:07 [PATCH 00/10] Introduction Matt Helsley
@ 2006-12-15 0:07 ` Matt Helsley
2006-12-15 8:34 ` Christoph Hellwig
2006-12-18 5:44 ` Zhang, Yanmin
2006-12-15 0:07 ` Register audit task watcher Matt Helsley
` (8 subsequent siblings)
9 siblings, 2 replies; 19+ messages in thread
From: Matt Helsley @ 2006-12-15 0:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux-Kernel, Jes Sorensen, Christoph Hellwig, Al Viro,
Steve Grubb, linux-audit, Paul Jackson
[-- Attachment #1: task-watchers-v2 --]
[-- Type: text/plain, Size: 14780 bytes --]
Associate function calls with significant events in a task's lifetime, much like
we handle kernel and module init/exit functions. This creates a table for each
of the following events in the task_watchers_table ELF section:
WATCH_TASK_INIT at the beginning of a fork/clone system call when the
new task struct first becomes available.
WATCH_TASK_CLONE just before returning successfully from a fork/clone.
WATCH_TASK_EXEC just before successfully returning from the exec
system call.
WATCH_TASK_UID every time a task's real or effective user id changes.
WATCH_TASK_GID every time a task's real or effective group id changes.
WATCH_TASK_EXIT at the beginning of do_exit when a task is exiting
for any reason.
WATCH_TASK_FREE is called before critical task structures like
the mm_struct become inaccessible and the task is subsequently freed.
The next patch will add a debugfs interface for measuring fork and exit rates
which can be used to calculate the overhead of the task watcher infrastructure.
Subsequent patches will make use of task watchers to simplify fork, exit,
and many of the system calls that set [er][ug]ids.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steve Grubb <sgrubb@redhat.com>
Cc: linux-audit@redhat.com
Cc: Paul Jackson <pj@sgi.com>
---
fs/exec.c | 3 ++
include/asm-generic/vmlinux.lds.h | 26 +++++++++++++++++++++
include/linux/init.h | 47 ++++++++++++++++++++++++++++++++++++++
kernel/Makefile | 2 -
kernel/exit.c | 3 ++
kernel/fork.c | 15 ++++++++----
kernel/sys.c | 9 +++++++
kernel/task_watchers.c | 41 +++++++++++++++++++++++++++++++++
8 files changed, 141 insertions(+), 5 deletions(-)
Index: linux-2.6.19/kernel/sys.c
===================================================================
--- linux-2.6.19.orig/kernel/sys.c
+++ linux-2.6.19/kernel/sys.c
@@ -27,10 +27,11 @@
#include <linux/suspend.h>
#include <linux/tty.h>
#include <linux/signal.h>
#include <linux/cn_proc.h>
#include <linux/getcpu.h>
+#include <linux/init.h>
#include <linux/compat.h>
#include <linux/syscalls.h>
#include <linux/kprobes.h>
@@ -957,10 +958,11 @@ asmlinkage long sys_setregid(gid_t rgid,
current->fsgid = new_egid;
current->egid = new_egid;
current->gid = new_rgid;
key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
+ notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
/*
* setgid() is implemented like SysV w/ SAVED_IDS
@@ -992,10 +994,11 @@ asmlinkage long sys_setgid(gid_t gid)
else
return -EPERM;
key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
+ notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
static int set_user(uid_t new_ruid, int dumpclear)
{
@@ -1080,10 +1083,11 @@ asmlinkage long sys_setreuid(uid_t ruid,
current->suid = current->euid;
current->fsuid = current->euid;
key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
+ notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE);
}
@@ -1127,10 +1131,11 @@ asmlinkage long sys_setuid(uid_t uid)
current->fsuid = current->euid = uid;
current->suid = new_suid;
key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
+ notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID);
}
@@ -1175,10 +1180,11 @@ asmlinkage long sys_setresuid(uid_t ruid
if (suid != (uid_t) -1)
current->suid = suid;
key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
+ notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES);
}
asmlinkage long sys_getresuid(uid_t __user *ruid, uid_t __user *euid, uid_t __user *suid)
@@ -1227,10 +1233,11 @@ asmlinkage long sys_setresgid(gid_t rgid
if (sgid != (gid_t) -1)
current->sgid = sgid;
key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
+ notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
asmlinkage long sys_getresgid(gid_t __user *rgid, gid_t __user *egid, gid_t __user *sgid)
{
@@ -1268,10 +1275,11 @@ asmlinkage long sys_setfsuid(uid_t uid)
current->fsuid = uid;
}
key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
+ notify_task_watchers(WATCH_TASK_UID, 0, current);
security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS);
return old_fsuid;
}
@@ -1295,10 +1303,11 @@ asmlinkage long sys_setfsgid(gid_t gid)
smp_wmb();
}
current->fsgid = gid;
key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
+ notify_task_watchers(WATCH_TASK_GID, 0, current);
}
return old_fsgid;
}
asmlinkage long sys_times(struct tms __user * tbuf)
Index: linux-2.6.19/kernel/exit.c
===================================================================
--- linux-2.6.19.orig/kernel/exit.c
+++ linux-2.6.19/kernel/exit.c
@@ -6,10 +6,11 @@
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/interrupt.h>
#include <linux/smp_lock.h>
+#include <linux/init.h>
#include <linux/module.h>
#include <linux/capability.h>
#include <linux/completion.h>
#include <linux/personality.h>
#include <linux/tty.h>
@@ -882,10 +883,11 @@ fastcall NORET_TYPE void do_exit(long co
set_current_state(TASK_UNINTERRUPTIBLE);
schedule();
}
tsk->flags |= PF_EXITING;
+ notify_task_watchers(WATCH_TASK_EXIT, code, tsk);
if (unlikely(in_atomic()))
printk(KERN_INFO "note: %s[%d] exited with preempt_count %d\n",
current->comm, current->pid,
preempt_count());
@@ -913,10 +915,11 @@ fastcall NORET_TYPE void do_exit(long co
audit_free(tsk);
taskstats_exit_send(tsk, tidstats, group_dead, mycpu);
taskstats_exit_free(tidstats);
exit_mm(tsk);
+ notify_task_watchers(WATCH_TASK_FREE, code, tsk);
if (group_dead)
acct_process();
exit_sem(tsk);
__exit_files(tsk);
Index: linux-2.6.19/fs/exec.c
===================================================================
--- linux-2.6.19.orig/fs/exec.c
+++ linux-2.6.19/fs/exec.c
@@ -47,10 +47,11 @@
#include <linux/syscalls.h>
#include <linux/rmap.h>
#include <linux/tsacct_kern.h>
#include <linux/cn_proc.h>
#include <linux/audit.h>
+#include <linux/init.h>
#include <asm/uaccess.h>
#include <asm/mmu_context.h>
#ifdef CONFIG_KMOD
@@ -1082,10 +1083,12 @@ int search_binary_handler(struct linux_b
allow_write_access(bprm->file);
if (bprm->file)
fput(bprm->file);
bprm->file = NULL;
current->did_exec = 1;
+ notify_task_watchers(WATCH_TASK_EXEC, 0,
+ current);
proc_exec_connector(current);
return retval;
}
read_lock(&binfmt_lock);
put_binfmt(fmt);
Index: linux-2.6.19/kernel/task_watchers.c
===================================================================
--- /dev/null
+++ linux-2.6.19/kernel/task_watchers.c
@@ -0,0 +1,41 @@
+#include <linux/init.h>
+
+/* Defined in include/asm-generic/vmlinux.lds.h */
+extern const task_watcher_fn __start_task_init[],
+ __start_task_clone[], __start_task_exec[],
+ __start_task_uid[], __start_task_gid[],
+ __start_task_exit[], __start_task_free[],
+ __stop_task_free[];
+
+/*
+ * Tables of ptrs to the first watcher func for WATCH_TASK_*
+ */
+static const task_watcher_fn __attribute__((__section__(".task.table"))) \
+ *twtable[] = {
+ __start_task_init,
+ __start_task_clone,
+ __start_task_exec,
+ __start_task_uid,
+ __start_task_gid,
+ __start_task_exit,
+ __start_task_free,
+ __stop_task_free,
+};
+
+int notify_task_watchers(unsigned int ev, unsigned long val,
+ struct task_struct *tsk)
+{
+ const task_watcher_fn *tw_call, *tw_end;
+ int ret_err = 0, err;
+
+ tw_call = twtable[ev];
+ tw_end = twtable[ev + 1];
+
+ /* Call all of the watchers, report the first error */
+ for (; tw_call < tw_end; tw_call++) {
+ err = (*tw_call)(val, tsk);
+ if (unlikely((err < 0) && (ret_err == 0)))
+ ret_err = err;
+ }
+ return ret_err;
+}
Index: linux-2.6.19/kernel/Makefile
===================================================================
--- linux-2.6.19.orig/kernel/Makefile
+++ linux-2.6.19/kernel/Makefile
@@ -6,11 +6,11 @@ obj-y = sched.o fork.o exec_domain.o
exit.o itimer.o time.o softirq.o resource.o \
sysctl.o capability.o ptrace.o timer.o user.o \
signal.o sys.o kmod.o workqueue.o pid.o \
rcupdate.o extable.o params.o posix-timers.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
- hrtimer.o rwsem.o latency.o nsproxy.o srcu.o
+ hrtimer.o rwsem.o latency.o nsproxy.o srcu.o task_watchers.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-y += time/
obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
obj-$(CONFIG_LOCKDEP) += lockdep.o
Index: linux-2.6.19/kernel/fork.c
===================================================================
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -46,10 +46,11 @@
#include <linux/tsacct_kern.h>
#include <linux/cn_proc.h>
#include <linux/delayacct.h>
#include <linux/taskstats_kern.h>
#include <linux/random.h>
+#include <linux/init.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -1052,10 +1053,18 @@ static struct task_struct *copy_process(
do_posix_clock_monotonic_gettime(&p->start_time);
p->security = NULL;
p->io_context = NULL;
p->io_wait = NULL;
p->audit_context = NULL;
+
+ p->tgid = p->pid;
+ if (clone_flags & CLONE_THREAD)
+ p->tgid = current->tgid;
+
+ retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
+ if (retval < 0)
+ goto bad_fork_cleanup_delays_binfmt;
cpuset_fork(p);
#ifdef CONFIG_NUMA
p->mempolicy = mpol_copy(p->mempolicy);
if (IS_ERR(p->mempolicy)) {
retval = PTR_ERR(p->mempolicy);
@@ -1091,14 +1100,10 @@ static struct task_struct *copy_process(
#ifdef CONFIG_DEBUG_MUTEXES
p->blocked_on = NULL; /* not blocked yet */
#endif
- p->tgid = p->pid;
- if (clone_flags & CLONE_THREAD)
- p->tgid = current->tgid;
-
if ((retval = security_task_alloc(p)))
goto bad_fork_cleanup_policy;
if ((retval = audit_alloc(p)))
goto bad_fork_cleanup_security;
/* copy all the process information */
@@ -1255,10 +1260,11 @@ static struct task_struct *copy_process(
}
total_forks++;
spin_unlock(¤t->sighand->siglock);
write_unlock_irq(&tasklist_lock);
+ notify_task_watchers(WATCH_TASK_CLONE, clone_flags, p);
proc_fork_connector(p);
return p;
bad_fork_cleanup_namespaces:
exit_task_namespaces(p);
@@ -1287,10 +1293,11 @@ bad_fork_cleanup_policy:
bad_fork_cleanup_cpuset:
#endif
cpuset_exit(p);
bad_fork_cleanup_delays_binfmt:
delayacct_tsk_free(p);
+ notify_task_watchers(WATCH_TASK_FREE, 0, p);
if (p->binfmt)
module_put(p->binfmt->module);
bad_fork_cleanup_put_domain:
module_put(task_thread_info(p)->exec_domain->module);
bad_fork_cleanup_count:
Index: linux-2.6.19/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6.19.orig/include/asm-generic/vmlinux.lds.h
+++ linux-2.6.19/include/asm-generic/vmlinux.lds.h
@@ -42,10 +42,36 @@
VMLINUX_SYMBOL(__start_rio_route_ops) = .; \
*(.rio_route_ops) \
VMLINUX_SYMBOL(__end_rio_route_ops) = .; \
} \
\
+ .task : AT(ADDR(.task) - LOAD_OFFSET) { \
+ *(.task.table) \
+ VMLINUX_SYMBOL(__start_task_init) = .; \
+ *(.task.INIT) \
+ VMLINUX_SYMBOL(__start_task_clone) = .; \
+ *(.task.CLONE) \
+ VMLINUX_SYMBOL(__start_task_exec) = .; \
+ *(.task.EXEC) \
+ VMLINUX_SYMBOL(__start_task_uid) = .; \
+ *(.task.UID) \
+ VMLINUX_SYMBOL(__start_task_gid) = .; \
+ *(.task.GID) \
+ VMLINUX_SYMBOL(__start_task_exit) = .; \
+ *(.task.EXIT) \
+ VMLINUX_SYMBOL(__start_task_free) = .; \
+ *(.task.FREE) \
+ VMLINUX_SYMBOL(__stop_task_free) = .; \
+ *(.task.function.FREE) \
+ *(.task.function.INIT) \
+ *(.task.function.CLONE) \
+ *(.task.function.EXEC) \
+ *(.task.function.UID) \
+ *(.task.function.GID) \
+ *(.task.function.EXIT) \
+ } \
+ \
/* Kernel symbol table: Normal symbols */ \
__ksymtab : AT(ADDR(__ksymtab) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab) = .; \
*(__ksymtab) \
VMLINUX_SYMBOL(__stop___ksymtab) = .; \
Index: linux-2.6.19/include/linux/init.h
===================================================================
--- linux-2.6.19.orig/include/linux/init.h
+++ linux-2.6.19/include/linux/init.h
@@ -292,6 +292,53 @@ void __init parse_early_param(void);
#define __exit_p(x) x
#else
#define __exit_p(x) NULL
#endif
+#define WATCH_TASK_INIT 0
+#define WATCH_TASK_CLONE 1
+#define WATCH_TASK_EXEC 2
+#define WATCH_TASK_UID 3
+#define WATCH_TASK_GID 4
+#define WATCH_TASK_EXIT 5
+#define WATCH_TASK_FREE 6
+#define NUM_WATCH_TASK_EVENTS 7
+
+#ifndef __ASSEMBLY__
+#ifndef MODULE
+struct task_struct; /* avoid including sched.h */
+
+typedef int (*task_watcher_fn)(unsigned long, struct task_struct*);
+extern int notify_task_watchers(unsigned int ev_idx, unsigned long val,
+ struct task_struct *tsk);
+
+/*
+ * Watch for events occurring within a task and call the supplied function
+ * when (and only when) the event happens.
+ * Only non-modular kernel code may register functions as task_watchers.
+ */
+#define __task_func(ev, fn) \
+static task_watcher_fn __task_##ev##_##fn __attribute_used__ \
+ __attribute__ ((__section__ (".task." #ev))) = fn
+
+#define DEFINE_TASK_INITCALL(fn) __task_func(INIT, fn)
+#define DEFINE_TASK_CLONECALL(fn) __task_func(CLONE, fn)
+#define DEFINE_TASK_EXECCALL(fn) __task_func(EXEC, fn)
+#define DEFINE_TASK_UIDCALL(fn) __task_func(UID, fn)
+#define DEFINE_TASK_GIDCALL(fn) __task_func(GID, fn)
+#define DEFINE_TASK_EXITCALL(fn) __task_func(EXIT, fn)
+#define DEFINE_TASK_FREECALL(fn) __task_func(FREE, fn)
+
+#define __task_func_section(sect) \
+ __attribute__((__section__(".task.function." #sect)))
+
+#define __task_init __task_func_section(INIT)
+#define __task_clone __task_func_section(CLONE)
+#define __task_exec __task_func_section(EXEC)
+#define __task_uid __task_func_section(UID)
+#define __task_gid __task_func_section(GID)
+#define __task_exit __task_func_section(EXIT)
+#define __task_free __task_func_section(FREE)
+#endif /* ndef MODULE */
+#endif /* ndef __ASSEMBLY__ */
+
#endif /* _LINUX_INIT_H */
--
* Register audit task watcher
2006-12-15 0:07 [PATCH 00/10] Introduction Matt Helsley
2006-12-15 0:07 ` Task watchers v2 Matt Helsley
@ 2006-12-15 0:07 ` Matt Helsley
2006-12-15 0:07 ` Register semundo " Matt Helsley
` (7 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Matt Helsley @ 2006-12-15 0:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux-Kernel, Jes Sorensen, Christoph Hellwig, Al Viro,
Steve Grubb, linux-audit, Paul Jackson
[-- Attachment #1: task-watchers-register-audit --]
[-- Type: text/plain, Size: 6668 bytes --]
Change audit to register a task watcher function rather than modify
the copy_process() and do_exit() paths directly.
Removes an unlikely() hint from kernel/exit.c:
if (unlikely(tsk->audit_context))
audit_free(tsk);
This use of unlikely() is an artifact of audit_free()'s former invocation from
__put_task_struct() (commit fa84cb935d4ec601528f5e2f0d5d31e7876a5044). In the
__put_task_struct() path it would be called much more frequently than in
do_exit(), so the hint was justified there. In the new location, however, it
most likely offers no measurable performance benefit.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steve Grubb <sgrubb@redhat.com>
Cc: linux-audit@redhat.com
---
include/linux/audit.h | 4 ----
kernel/auditsc.c | 9 ++++++---
kernel/exit.c | 3 ---
kernel/fork.c | 7 +------
4 files changed, 7 insertions(+), 16 deletions(-)
Index: linux-2.6.19/kernel/auditsc.c
===================================================================
--- linux-2.6.19.orig/kernel/auditsc.c
+++ linux-2.6.19/kernel/auditsc.c
@@ -677,11 +677,11 @@ static inline struct audit_context *audi
* Filter on the task information and allocate a per-task audit context
* if necessary. Doing so turns on system call auditing for the
* specified task. This is called from copy_process, so no lock is
* needed.
*/
-int audit_alloc(struct task_struct *tsk)
+static int __task_init audit_alloc(unsigned long val, struct task_struct *tsk)
{
struct audit_context *context;
enum audit_state state;
if (likely(!audit_enabled))
@@ -703,10 +703,11 @@ int audit_alloc(struct task_struct *tsk)
tsk->audit_context = context;
set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
return 0;
}
+DEFINE_TASK_INITCALL(audit_alloc);
static inline void audit_free_context(struct audit_context *context)
{
struct audit_context *previous;
int count = 0;
@@ -1033,28 +1034,30 @@ static void audit_log_exit(struct audit_
* audit_free - free a per-task audit context
* @tsk: task whose audit context block to free
*
* Called from copy_process and do_exit
*/
-void audit_free(struct task_struct *tsk)
+static int __task_free audit_free(unsigned long val, struct task_struct *tsk)
{
struct audit_context *context;
context = audit_get_context(tsk, 0, 0);
if (likely(!context))
- return;
+ return 0;
/* Check for system calls that do not go through the exit
* function (e.g., exit_group), then free context block.
* We use GFP_ATOMIC here because we might be doing this
* in the context of the idle thread */
/* that can happen only if we are called from do_exit() */
if (context->in_syscall && context->auditable)
audit_log_exit(context, tsk);
audit_free_context(context);
+ return 0;
}
+DEFINE_TASK_FREECALL(audit_free);
/**
* audit_syscall_entry - fill in an audit record at syscall entry
* @tsk: task being audited
* @arch: architecture type
Index: linux-2.6.19/include/linux/audit.h
===================================================================
--- linux-2.6.19.orig/include/linux/audit.h
+++ linux-2.6.19/include/linux/audit.h
@@ -332,12 +332,10 @@ struct mqstat;
extern int __init audit_register_class(int class, unsigned *list);
extern int audit_classify_syscall(int abi, unsigned syscall);
#ifdef CONFIG_AUDITSYSCALL
/* These are defined in auditsc.c */
/* Public API */
-extern int audit_alloc(struct task_struct *task);
-extern void audit_free(struct task_struct *task);
extern void audit_syscall_entry(int arch,
int major, unsigned long a0, unsigned long a1,
unsigned long a2, unsigned long a3);
extern void audit_syscall_exit(int failed, long return_code);
extern void __audit_getname(const char *name);
@@ -432,12 +430,10 @@ static inline int audit_mq_getsetattr(mq
return __audit_mq_getsetattr(mqdes, mqstat);
return 0;
}
extern int audit_n_rules;
#else
-#define audit_alloc(t) ({ 0; })
-#define audit_free(t) do { ; } while (0)
#define audit_syscall_entry(ta,a,b,c,d,e) do { ; } while (0)
#define audit_syscall_exit(f,r) do { ; } while (0)
#define audit_dummy_context() 1
#define audit_getname(n) do { ; } while (0)
#define audit_putname(n) do { ; } while (0)
Index: linux-2.6.19/kernel/fork.c
===================================================================
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -37,11 +37,10 @@
#include <linux/jiffies.h>
#include <linux/futex.h>
#include <linux/rcupdate.h>
#include <linux/ptrace.h>
#include <linux/mount.h>
-#include <linux/audit.h>
#include <linux/profile.h>
#include <linux/rmap.h>
#include <linux/acct.h>
#include <linux/tsacct_kern.h>
#include <linux/cn_proc.h>
@@ -1102,15 +1101,13 @@ static struct task_struct *copy_process(
p->blocked_on = NULL; /* not blocked yet */
#endif
if ((retval = security_task_alloc(p)))
goto bad_fork_cleanup_policy;
- if ((retval = audit_alloc(p)))
- goto bad_fork_cleanup_security;
/* copy all the process information */
if ((retval = copy_semundo(clone_flags, p)))
- goto bad_fork_cleanup_audit;
+ goto bad_fork_cleanup_security;
if ((retval = copy_files(clone_flags, p)))
goto bad_fork_cleanup_semundo;
if ((retval = copy_fs(clone_flags, p)))
goto bad_fork_cleanup_files;
if ((retval = copy_sighand(clone_flags, p)))
@@ -1281,12 +1278,10 @@ bad_fork_cleanup_fs:
exit_fs(p); /* blocking */
bad_fork_cleanup_files:
exit_files(p); /* blocking */
bad_fork_cleanup_semundo:
exit_sem(p);
-bad_fork_cleanup_audit:
- audit_free(p);
bad_fork_cleanup_security:
security_task_free(p);
bad_fork_cleanup_policy:
#ifdef CONFIG_NUMA
mpol_free(p->mempolicy);
Index: linux-2.6.19/kernel/exit.c
===================================================================
--- linux-2.6.19.orig/kernel/exit.c
+++ linux-2.6.19/kernel/exit.c
@@ -37,11 +37,10 @@
#include <linux/cn_proc.h>
#include <linux/mutex.h>
#include <linux/futex.h>
#include <linux/compat.h>
#include <linux/pipe_fs_i.h>
-#include <linux/audit.h> /* for audit_free() */
#include <linux/resource.h>
#include <linux/blkdev.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -909,12 +908,10 @@ fastcall NORET_TYPE void do_exit(long co
exit_robust_list(tsk);
#if defined(CONFIG_FUTEX) && defined(CONFIG_COMPAT)
if (unlikely(tsk->compat_robust_list))
compat_exit_robust_list(tsk);
#endif
- if (unlikely(tsk->audit_context))
- audit_free(tsk);
taskstats_exit_send(tsk, tidstats, group_dead, mycpu);
taskstats_exit_free(tidstats);
exit_mm(tsk);
notify_task_watchers(WATCH_TASK_FREE, code, tsk);
--
* Register semundo task watcher
2006-12-15 0:07 [PATCH 00/10] Introduction Matt Helsley
2006-12-15 0:07 ` Task watchers v2 Matt Helsley
2006-12-15 0:07 ` Register audit task watcher Matt Helsley
@ 2006-12-15 0:07 ` Matt Helsley
2006-12-15 0:07 ` Register cpuset " Matt Helsley
` (6 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Matt Helsley @ 2006-12-15 0:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux-Kernel, Jes Sorensen, Christoph Hellwig, Al Viro,
Steve Grubb, linux-audit, Paul Jackson
[-- Attachment #1: task-watchers-register-semundo --]
[-- Type: text/plain, Size: 5399 bytes --]
Make the semaphore undo code use a task watcher instead of hooking into
copy_process() and do_exit() directly.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
include/linux/sem.h | 17 -----------------
ipc/sem.c | 12 ++++++++----
kernel/exit.c | 2 --
kernel/fork.c | 6 +-----
4 files changed, 9 insertions(+), 28 deletions(-)
Index: linux-2.6.19/ipc/sem.c
===================================================================
--- linux-2.6.19.orig/ipc/sem.c
+++ linux-2.6.19/ipc/sem.c
@@ -81,10 +81,11 @@
#include <linux/audit.h>
#include <linux/capability.h>
#include <linux/seq_file.h>
#include <linux/mutex.h>
#include <linux/nsproxy.h>
+#include <linux/init.h>
#include <asm/uaccess.h>
#include "util.h"
#define sem_ids(ns) (*((ns)->ids[IPC_SEM_IDS]))
@@ -1287,11 +1288,11 @@ asmlinkage long sys_semop (int semid, st
* See the notes above unlock_semundo() regarding the spin_lock_init()
* in this code. Initialize the undo_list->lock here instead of get_undo_list()
* because of the reasoning in the comment above unlock_semundo.
*/
-int copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
+static int __task_init copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
{
struct sem_undo_list *undo_list;
int error;
if (clone_flags & CLONE_SYSVSEM) {
@@ -1303,10 +1304,11 @@ int copy_semundo(unsigned long clone_fla
} else
tsk->sysvsem.undo_list = NULL;
return 0;
}
+DEFINE_TASK_INITCALL(copy_semundo);
/*
* add semadj values to semaphores, free undo structures.
* undo structures are not freed when semaphore arrays are destroyed
* so some of them may be out of date.
@@ -1316,22 +1318,22 @@ int copy_semundo(unsigned long clone_fla
* should we queue up and wait until we can do so legally?
* The original implementation attempted to do this (queue and wait).
* The current implementation does not do so. The POSIX standard
* and SVID should be consulted to determine what behavior is mandated.
*/
-void exit_sem(struct task_struct *tsk)
+static int __task_free exit_sem(unsigned long ignored, struct task_struct *tsk)
{
struct sem_undo_list *undo_list;
struct sem_undo *u, **up;
struct ipc_namespace *ns;
undo_list = tsk->sysvsem.undo_list;
if (!undo_list)
- return;
+ return 0;
if (!atomic_dec_and_test(&undo_list->refcnt))
- return;
+ return 0;
ns = tsk->nsproxy->ipc_ns;
/* There's no need to hold the semundo list lock, as current
* is the last task exiting for this undo list.
*/
@@ -1394,11 +1396,13 @@ found:
update_queue(sma);
next_entry:
sem_unlock(sma);
}
kfree(undo_list);
+ return 0;
}
+DEFINE_TASK_FREECALL(exit_sem);
#ifdef CONFIG_PROC_FS
static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
{
struct sem_array *sma = it;
Index: linux-2.6.19/kernel/exit.c
===================================================================
--- linux-2.6.19.orig/kernel/exit.c
+++ linux-2.6.19/kernel/exit.c
@@ -45,11 +45,10 @@
#include <asm/uaccess.h>
#include <asm/unistd.h>
#include <asm/pgtable.h>
#include <asm/mmu_context.h>
-extern void sem_exit (void);
extern struct task_struct *child_reaper;
static void exit_mm(struct task_struct * tsk);
static void __unhash_process(struct task_struct *p)
@@ -916,11 +915,10 @@ fastcall NORET_TYPE void do_exit(long co
exit_mm(tsk);
notify_task_watchers(WATCH_TASK_FREE, code, tsk);
if (group_dead)
acct_process();
- exit_sem(tsk);
__exit_files(tsk);
__exit_fs(tsk);
exit_thread();
cpuset_exit(tsk);
exit_keys(tsk);
Index: linux-2.6.19/kernel/fork.c
===================================================================
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -1102,14 +1102,12 @@ static struct task_struct *copy_process(
#endif
if ((retval = security_task_alloc(p)))
goto bad_fork_cleanup_policy;
/* copy all the process information */
- if ((retval = copy_semundo(clone_flags, p)))
- goto bad_fork_cleanup_security;
if ((retval = copy_files(clone_flags, p)))
- goto bad_fork_cleanup_semundo;
+ goto bad_fork_cleanup_security;
if ((retval = copy_fs(clone_flags, p)))
goto bad_fork_cleanup_files;
if ((retval = copy_sighand(clone_flags, p)))
goto bad_fork_cleanup_fs;
if ((retval = copy_signal(clone_flags, p)))
@@ -1276,12 +1274,10 @@ bad_fork_cleanup_sighand:
__cleanup_sighand(p->sighand);
bad_fork_cleanup_fs:
exit_fs(p); /* blocking */
bad_fork_cleanup_files:
exit_files(p); /* blocking */
-bad_fork_cleanup_semundo:
- exit_sem(p);
bad_fork_cleanup_security:
security_task_free(p);
bad_fork_cleanup_policy:
#ifdef CONFIG_NUMA
mpol_free(p->mempolicy);
Index: linux-2.6.19/include/linux/sem.h
===================================================================
--- linux-2.6.19.orig/include/linux/sem.h
+++ linux-2.6.19/include/linux/sem.h
@@ -136,25 +136,8 @@ struct sem_undo_list {
struct sysv_sem {
struct sem_undo_list *undo_list;
};
-#ifdef CONFIG_SYSVIPC
-
-extern int copy_semundo(unsigned long clone_flags, struct task_struct *tsk);
-extern void exit_sem(struct task_struct *tsk);
-
-#else
-static inline int copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
-{
- return 0;
-}
-
-static inline void exit_sem(struct task_struct *tsk)
-{
- return;
-}
-#endif
-
#endif /* __KERNEL__ */
#endif /* _LINUX_SEM_H */
--
* Register cpuset task watcher
2006-12-15 0:07 [PATCH 00/10] Introduction Matt Helsley
` (2 preceding siblings ...)
2006-12-15 0:07 ` Register semundo " Matt Helsley
@ 2006-12-15 0:07 ` Matt Helsley
2006-12-15 0:07 ` Register NUMA mempolicy " Matt Helsley
` (5 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Matt Helsley @ 2006-12-15 0:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux-Kernel, Jes Sorensen, Christoph Hellwig, Al Viro,
Steve Grubb, linux-audit, Paul Jackson
[-- Attachment #1: task-watchers-register-cpusets --]
[-- Type: text/plain, Size: 5799 bytes --]
Register a task watcher for cpusets instead of hooking into
copy_process() and do_exit() directly.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
---
include/linux/cpuset.h | 4 ----
kernel/cpuset.c | 11 +++++++++--
kernel/exit.c | 2 --
kernel/fork.c | 6 +-----
4 files changed, 10 insertions(+), 13 deletions(-)
Index: linux-2.6.19/kernel/fork.c
===================================================================
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -28,11 +28,10 @@
#include <linux/mman.h>
#include <linux/fs.h>
#include <linux/nsproxy.h>
#include <linux/capability.h>
#include <linux/cpu.h>
-#include <linux/cpuset.h>
#include <linux/security.h>
#include <linux/swap.h>
#include <linux/syscalls.h>
#include <linux/jiffies.h>
#include <linux/futex.h>
@@ -1060,17 +1059,16 @@ static struct task_struct *copy_process(
p->tgid = current->tgid;
retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
if (retval < 0)
goto bad_fork_cleanup_delays_binfmt;
- cpuset_fork(p);
#ifdef CONFIG_NUMA
p->mempolicy = mpol_copy(p->mempolicy);
if (IS_ERR(p->mempolicy)) {
retval = PTR_ERR(p->mempolicy);
p->mempolicy = NULL;
- goto bad_fork_cleanup_cpuset;
+ goto bad_fork_cleanup_delays_binfmt;
}
mpol_fix_fork_child_flag(p);
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
p->irq_events = 0;
@@ -1279,13 +1277,11 @@ bad_fork_cleanup_files:
bad_fork_cleanup_security:
security_task_free(p);
bad_fork_cleanup_policy:
#ifdef CONFIG_NUMA
mpol_free(p->mempolicy);
-bad_fork_cleanup_cpuset:
#endif
- cpuset_exit(p);
bad_fork_cleanup_delays_binfmt:
delayacct_tsk_free(p);
notify_task_watchers(WATCH_TASK_FREE, 0, p);
if (p->binfmt)
module_put(p->binfmt->module);
Index: linux-2.6.19/kernel/cpuset.c
===================================================================
--- linux-2.6.19.orig/kernel/cpuset.c
+++ linux-2.6.19/kernel/cpuset.c
@@ -47,10 +47,11 @@
#include <linux/stat.h>
#include <linux/string.h>
#include <linux/time.h>
#include <linux/backing-dev.h>
#include <linux/sort.h>
+#include <linux/init.h>
#include <asm/uaccess.h>
#include <asm/atomic.h>
#include <linux/mutex.h>
@@ -2173,17 +2174,20 @@ void __init cpuset_init_smp(void)
*
* At the point that cpuset_fork() is called, 'current' is the parent
* task, and the passed argument 'child' points to the child task.
**/
-void cpuset_fork(struct task_struct *child)
+static int __task_init cpuset_fork(unsigned long clone_flags,
+ struct task_struct *child)
{
task_lock(current);
child->cpuset = current->cpuset;
atomic_inc(&child->cpuset->count);
task_unlock(current);
+ return 0;
}
+DEFINE_TASK_INITCALL(cpuset_fork);
/**
* cpuset_exit - detach cpuset from exiting task
* @tsk: pointer to task_struct of exiting process
*
@@ -2240,11 +2244,12 @@ void cpuset_fork(struct task_struct *chi
* to NULL here, and check in cpuset_update_task_memory_state()
* for a NULL pointer. This hack avoids that NULL check, for no
* cost (other than this way too long comment ;).
**/
-void cpuset_exit(struct task_struct *tsk)
+static int __task_free cpuset_exit(unsigned long exit_code,
+ struct task_struct *tsk)
{
struct cpuset *cs;
cs = tsk->cpuset;
tsk->cpuset = &top_cpuset; /* the_top_cpuset_hack - see above */
@@ -2258,11 +2263,13 @@ void cpuset_exit(struct task_struct *tsk
mutex_unlock(&manage_mutex);
cpuset_release_agent(pathbuf);
} else {
atomic_dec(&cs->count);
}
+ return 0;
}
+DEFINE_TASK_FREECALL(cpuset_exit);
/**
* cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
* @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
*
Index: linux-2.6.19/kernel/exit.c
===================================================================
--- linux-2.6.19.orig/kernel/exit.c
+++ linux-2.6.19/kernel/exit.c
@@ -28,11 +28,10 @@
#include <linux/mount.h>
#include <linux/proc_fs.h>
#include <linux/mempolicy.h>
#include <linux/taskstats_kern.h>
#include <linux/delayacct.h>
-#include <linux/cpuset.h>
#include <linux/syscalls.h>
#include <linux/signal.h>
#include <linux/posix-timers.h>
#include <linux/cn_proc.h>
#include <linux/mutex.h>
@@ -918,11 +917,10 @@ fastcall NORET_TYPE void do_exit(long co
if (group_dead)
acct_process();
__exit_files(tsk);
__exit_fs(tsk);
exit_thread();
- cpuset_exit(tsk);
exit_keys(tsk);
if (group_dead && tsk->signal->leader)
disassociate_ctty(1);
Index: linux-2.6.19/include/linux/cpuset.h
===================================================================
--- linux-2.6.19.orig/include/linux/cpuset.h
+++ linux-2.6.19/include/linux/cpuset.h
@@ -17,12 +17,10 @@
extern int number_of_cpusets; /* How many cpusets are defined in system? */
extern int cpuset_init_early(void);
extern int cpuset_init(void);
extern void cpuset_init_smp(void);
-extern void cpuset_fork(struct task_struct *p);
-extern void cpuset_exit(struct task_struct *p);
extern cpumask_t cpuset_cpus_allowed(struct task_struct *p);
extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
void cpuset_init_current_mems_allowed(void);
void cpuset_update_task_memory_state(void);
#define cpuset_nodes_subset_current_mems_allowed(nodes) \
@@ -68,12 +66,10 @@ extern void cpuset_track_online_nodes(vo
#else /* !CONFIG_CPUSETS */
static inline int cpuset_init_early(void) { return 0; }
static inline int cpuset_init(void) { return 0; }
static inline void cpuset_init_smp(void) {}
-static inline void cpuset_fork(struct task_struct *p) {}
-static inline void cpuset_exit(struct task_struct *p) {}
static inline cpumask_t cpuset_cpus_allowed(struct task_struct *p)
{
return cpu_possible_map;
}
--
* Register NUMA mempolicy task watcher
From: Matt Helsley @ 2006-12-15 0:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux-Kernel, Jes Sorensen, Christoph Hellwig, Al Viro,
Steve Grubb, linux-audit, Paul Jackson
[-- Attachment #1: task-watchers-register-numa-mempolicy --]
[-- Type: text/plain, Size: 5418 bytes --]
Register a NUMA mempolicy task watcher instead of hooking into
copy_process() and do_exit() directly.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
kernel/exit.c | 4 ----
kernel/fork.c | 15 +--------------
mm/mempolicy.c | 25 +++++++++++++++++++++++++
3 files changed, 26 insertions(+), 18 deletions(-)
Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel
Clone                     Number of Children Cloned
            5000     7500    10000    12500    15000    17500
-------------------------------------------------------------
Mean     17836.3  18085.2  18220.4    18225    18319    18339
Dev      302.801  314.617  303.079   293.46  287.267  294.819
Err (%)  1.69767  1.73963   1.6634   1.6102  1.56814  1.60761

Fork                      Number of Children Forked
            5000     7500    10000    12500    15000    17500
-------------------------------------------------------------
Mean     17896.2    17990  18100.6  18242.3    18244  18346.9
Dev       301.64  285.698  295.646  304.361  299.472  287.153
Err (%)   1.6855  1.58809  1.63335  1.66844  1.64148  1.56513
Kernbench:
Elapsed: 124.532s User: 439.732s System: 46.497s CPU: 389.9%
439.71user 46.48system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.42system 2:05.10elapsed 388%CPU (0avgtext+0avgdata 0maxresident)k
439.74user 46.44system 2:04.60elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.75user 46.64system 2:04.74elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.61user 46.45system 2:05.36elapsed 387%CPU (0avgtext+0avgdata 0maxresident)k
439.60user 46.43system 2:04.33elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.47system 2:04.34elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.87user 46.45system 2:04.10elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.71system 2:04.58elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.48system 2:03.93elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
Index: linux-2.6.19/mm/mempolicy.c
===================================================================
--- linux-2.6.19.orig/mm/mempolicy.c
+++ linux-2.6.19/mm/mempolicy.c
@@ -87,10 +87,11 @@
#include <linux/seq_file.h>
#include <linux/proc_fs.h>
#include <linux/migrate.h>
#include <linux/rmap.h>
#include <linux/security.h>
+#include <linux/init.h>
#include <asm/tlbflush.h>
#include <asm/uaccess.h>
/* Internal flags */
@@ -1331,10 +1332,35 @@ struct mempolicy *__mpol_copy(struct mem
}
}
return new;
}
+static int __task_init init_task_mempolicy(unsigned long clone_flags,
+ struct task_struct *tsk)
+{
+ tsk->mempolicy = mpol_copy(tsk->mempolicy);
+ if (IS_ERR(tsk->mempolicy)) {
+ int retval;
+
+ retval = PTR_ERR(tsk->mempolicy);
+ tsk->mempolicy = NULL;
+ return retval;
+ }
+ mpol_fix_fork_child_flag(tsk);
+ return 0;
+}
+DEFINE_TASK_INITCALL(init_task_mempolicy);
+
+static int __task_free free_task_mempolicy(unsigned long ignored,
+ struct task_struct *tsk)
+{
+ mpol_free(tsk->mempolicy);
+ tsk->mempolicy = NULL;
+ return 0;
+}
+DEFINE_TASK_FREECALL(free_task_mempolicy);
+
/* Slow path of a mempolicy comparison */
int __mpol_equal(struct mempolicy *a, struct mempolicy *b)
{
if (!a || !b)
return 0;
Index: linux-2.6.19/kernel/fork.c
===================================================================
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -1059,19 +1059,10 @@ static struct task_struct *copy_process(
p->tgid = current->tgid;
retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
if (retval < 0)
goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_NUMA
- p->mempolicy = mpol_copy(p->mempolicy);
- if (IS_ERR(p->mempolicy)) {
- retval = PTR_ERR(p->mempolicy);
- p->mempolicy = NULL;
- goto bad_fork_cleanup_delays_binfmt;
- }
- mpol_fix_fork_child_flag(p);
-#endif
#ifdef CONFIG_TRACE_IRQFLAGS
p->irq_events = 0;
#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
p->hardirqs_enabled = 1;
#else
@@ -1098,11 +1089,11 @@ static struct task_struct *copy_process(
#ifdef CONFIG_DEBUG_MUTEXES
p->blocked_on = NULL; /* not blocked yet */
#endif
if ((retval = security_task_alloc(p)))
- goto bad_fork_cleanup_policy;
+ goto bad_fork_cleanup_delays_binfmt;
/* copy all the process information */
if ((retval = copy_files(clone_flags, p)))
goto bad_fork_cleanup_security;
if ((retval = copy_fs(clone_flags, p)))
goto bad_fork_cleanup_files;
@@ -1274,14 +1265,10 @@ bad_fork_cleanup_fs:
exit_fs(p); /* blocking */
bad_fork_cleanup_files:
exit_files(p); /* blocking */
bad_fork_cleanup_security:
security_task_free(p);
-bad_fork_cleanup_policy:
-#ifdef CONFIG_NUMA
- mpol_free(p->mempolicy);
-#endif
bad_fork_cleanup_delays_binfmt:
delayacct_tsk_free(p);
notify_task_watchers(WATCH_TASK_FREE, 0, p);
if (p->binfmt)
module_put(p->binfmt->module);
Index: linux-2.6.19/kernel/exit.c
===================================================================
--- linux-2.6.19.orig/kernel/exit.c
+++ linux-2.6.19/kernel/exit.c
@@ -930,14 +930,10 @@ fastcall NORET_TYPE void do_exit(long co
tsk->exit_code = code;
proc_exit_connector(tsk);
exit_notify(tsk);
exit_task_namespaces(tsk);
-#ifdef CONFIG_NUMA
- mpol_free(tsk->mempolicy);
- tsk->mempolicy = NULL;
-#endif
/*
* This must happen late, after the PID is not
* hashed anymore:
*/
if (unlikely(!list_empty(&tsk->pi_state_list)))
--
* Register IRQ flag tracing task watcher
From: Matt Helsley @ 2006-12-15 0:08 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux-Kernel, Jes Sorensen, Christoph Hellwig, Al Viro,
Steve Grubb, linux-audit, Paul Jackson
[-- Attachment #1: task-watchers-register-trace-irqflags --]
[-- Type: text/plain, Size: 2589 bytes --]
Register an irq-flag-tracing task watcher instead of hooking into
copy_process().
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
kernel/fork.c | 19 -------------------
kernel/irq/handle.c | 24 ++++++++++++++++++++++++
2 files changed, 24 insertions(+), 19 deletions(-)
Index: linux-2.6.19/kernel/fork.c
===================================================================
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -1059,29 +1059,10 @@ static struct task_struct *copy_process(
p->tgid = current->tgid;
retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
if (retval < 0)
goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_TRACE_IRQFLAGS
- p->irq_events = 0;
-#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
- p->hardirqs_enabled = 1;
-#else
- p->hardirqs_enabled = 0;
-#endif
- p->hardirq_enable_ip = 0;
- p->hardirq_enable_event = 0;
- p->hardirq_disable_ip = _THIS_IP_;
- p->hardirq_disable_event = 0;
- p->softirqs_enabled = 1;
- p->softirq_enable_ip = _THIS_IP_;
- p->softirq_enable_event = 0;
- p->softirq_disable_ip = 0;
- p->softirq_disable_event = 0;
- p->hardirq_context = 0;
- p->softirq_context = 0;
-#endif
#ifdef CONFIG_LOCKDEP
p->lockdep_depth = 0; /* no locks held yet */
p->curr_chain_key = 0;
p->lockdep_recursion = 0;
#endif
Index: linux-2.6.19/kernel/irq/handle.c
===================================================================
--- linux-2.6.19.orig/kernel/irq/handle.c
+++ linux-2.6.19/kernel/irq/handle.c
@@ -13,10 +13,11 @@
#include <linux/irq.h>
#include <linux/module.h>
#include <linux/random.h>
#include <linux/interrupt.h>
#include <linux/kernel_stat.h>
+#include <linux/init.h>
#include "internals.h"
/**
* handle_bad_irq - handle spurious and unhandled irqs
@@ -266,6 +267,29 @@ void early_init_irq_lock_class(void)
for (i = 0; i < NR_IRQS; i++)
lockdep_set_class(&irq_desc[i].lock, &irq_desc_lock_class);
}
+static int __task_init init_task_trace_irqflags(unsigned long clone_flags,
+ struct task_struct *p)
+{
+ p->irq_events = 0;
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+ p->hardirqs_enabled = 1;
+#else
+ p->hardirqs_enabled = 0;
+#endif
+ p->hardirq_enable_ip = 0;
+ p->hardirq_enable_event = 0;
+ p->hardirq_disable_ip = _THIS_IP_;
+ p->hardirq_disable_event = 0;
+ p->softirqs_enabled = 1;
+ p->softirq_enable_ip = _THIS_IP_;
+ p->softirq_enable_event = 0;
+ p->softirq_disable_ip = 0;
+ p->softirq_disable_event = 0;
+ p->hardirq_context = 0;
+ p->softirq_context = 0;
+ return 0;
+}
+DEFINE_TASK_INITCALL(init_task_trace_irqflags);
#endif
--
* Register lockdep task watcher
From: Matt Helsley @ 2006-12-15 0:08 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux-Kernel, Jes Sorensen, Christoph Hellwig, Al Viro,
Steve Grubb, linux-audit, Paul Jackson
[-- Attachment #1: task-watchers-register-lockdep --]
[-- Type: text/plain, Size: 1939 bytes --]
Register a task watcher for lockdep instead of hooking into copy_process().
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
kernel/fork.c | 5 -----
kernel/lockdep.c | 11 +++++++++++
2 files changed, 11 insertions(+), 5 deletions(-)
Index: linux-2.6.19/kernel/fork.c
===================================================================
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -1059,15 +1059,10 @@ static struct task_struct *copy_process(
p->tgid = current->tgid;
retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
if (retval < 0)
goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_LOCKDEP
- p->lockdep_depth = 0; /* no locks held yet */
- p->curr_chain_key = 0;
- p->lockdep_recursion = 0;
-#endif
#ifdef CONFIG_DEBUG_MUTEXES
p->blocked_on = NULL; /* not blocked yet */
#endif
Index: linux-2.6.19/kernel/lockdep.c
===================================================================
--- linux-2.6.19.orig/kernel/lockdep.c
+++ linux-2.6.19/kernel/lockdep.c
@@ -25,10 +25,11 @@
* mapping lock dependencies runtime.
*/
#include <linux/mutex.h>
#include <linux/sched.h>
#include <linux/delay.h>
+#include <linux/init.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/spinlock.h>
#include <linux/kallsyms.h>
@@ -2557,10 +2558,20 @@ void __init lockdep_init(void)
INIT_LIST_HEAD(chainhash_table + i);
lockdep_initialized = 1;
}
+static int __task_init init_task_lockdep(unsigned long clone_flags,
+ struct task_struct *p)
+{
+ p->lockdep_depth = 0; /* no locks held yet */
+ p->curr_chain_key = 0;
+ p->lockdep_recursion = 0;
+ return 0;
+}
+DEFINE_TASK_INITCALL(init_task_lockdep);
+
void __init lockdep_info(void)
{
printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
printk("... MAX_LOCKDEP_SUBCLASSES: %lu\n", MAX_LOCKDEP_SUBCLASSES);
--
* Register process keyrings task watcher
From: Matt Helsley @ 2006-12-15 0:08 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux-Kernel, Jes Sorensen, Christoph Hellwig, Al Viro,
Steve Grubb, linux-audit, Paul Jackson, David Howells
[-- Attachment #1: task-watchers-register-keys --]
[-- Type: text/plain, Size: 10554 bytes --]
Make the keyring code use a task watcher to initialize and free per-task data.
NOTE:
copy_thread_group_keys(), called from copy_signal(), can't become a task watcher because it needs the task's signal field (struct signal_struct), which is not yet set up when the WATCH_TASK_INIT watchers run.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: David Howells <dhowells@redhat.com>
---
include/linux/key.h | 8 --------
kernel/exit.c | 2 --
kernel/fork.c | 6 +-----
kernel/sys.c | 8 --------
security/keys/process_keys.c | 21 ++++++++++++++-------
5 files changed, 15 insertions(+), 30 deletions(-)
Index: linux-2.6.19/include/linux/key.h
===================================================================
--- linux-2.6.19.orig/include/linux/key.h
+++ linux-2.6.19/include/linux/key.h
@@ -335,18 +335,14 @@ extern void keyring_replace_payload(stru
*/
extern struct key root_user_keyring, root_session_keyring;
extern int alloc_uid_keyring(struct user_struct *user,
struct task_struct *ctx);
extern void switch_uid_keyring(struct user_struct *new_user);
-extern int copy_keys(unsigned long clone_flags, struct task_struct *tsk);
extern int copy_thread_group_keys(struct task_struct *tsk);
-extern void exit_keys(struct task_struct *tsk);
extern void exit_thread_group_keys(struct signal_struct *tg);
extern int suid_keys(struct task_struct *tsk);
extern int exec_keys(struct task_struct *tsk);
-extern void key_fsuid_changed(struct task_struct *tsk);
-extern void key_fsgid_changed(struct task_struct *tsk);
extern void key_init(void);
#define __install_session_keyring(tsk, keyring) \
({ \
struct key *old_session = tsk->signal->session_keyring; \
@@ -365,18 +361,14 @@ extern void key_init(void);
#define key_ref_to_ptr(k) ({ NULL; })
#define is_key_possessed(k) 0
#define alloc_uid_keyring(u,c) 0
#define switch_uid_keyring(u) do { } while(0)
#define __install_session_keyring(t, k) ({ NULL; })
-#define copy_keys(f,t) 0
#define copy_thread_group_keys(t) 0
-#define exit_keys(t) do { } while(0)
#define exit_thread_group_keys(tg) do { } while(0)
#define suid_keys(t) do { } while(0)
#define exec_keys(t) do { } while(0)
-#define key_fsuid_changed(t) do { } while(0)
-#define key_fsgid_changed(t) do { } while(0)
#define key_init() do { } while(0)
/* Initial keyrings */
extern struct key root_user_keyring;
extern struct key root_session_keyring;
Index: linux-2.6.19/kernel/fork.c
===================================================================
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -1077,14 +1077,12 @@ static struct task_struct *copy_process(
goto bad_fork_cleanup_fs;
if ((retval = copy_signal(clone_flags, p)))
goto bad_fork_cleanup_sighand;
if ((retval = copy_mm(clone_flags, p)))
goto bad_fork_cleanup_signal;
- if ((retval = copy_keys(clone_flags, p)))
- goto bad_fork_cleanup_mm;
if ((retval = copy_namespaces(clone_flags, p)))
- goto bad_fork_cleanup_keys;
+ goto bad_fork_cleanup_mm;
retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
if (retval)
goto bad_fork_cleanup_namespaces;
p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
@@ -1226,12 +1224,10 @@ static struct task_struct *copy_process(
proc_fork_connector(p);
return p;
bad_fork_cleanup_namespaces:
exit_task_namespaces(p);
-bad_fork_cleanup_keys:
- exit_keys(p);
bad_fork_cleanup_mm:
if (p->mm)
mmput(p->mm);
bad_fork_cleanup_signal:
cleanup_signal(p);
Index: linux-2.6.19/security/keys/process_keys.c
===================================================================
--- linux-2.6.19.orig/security/keys/process_keys.c
+++ linux-2.6.19/security/keys/process_keys.c
@@ -15,10 +15,11 @@
#include <linux/slab.h>
#include <linux/keyctl.h>
#include <linux/fs.h>
#include <linux/err.h>
#include <linux/mutex.h>
+#include <linux/init.h>
#include <asm/uaccess.h>
#include "internal.h"
/* session keyring create vs join semaphore */
static DEFINE_MUTEX(key_session_mutex);
@@ -276,11 +277,12 @@ int copy_thread_group_keys(struct task_s
/*****************************************************************************/
/*
* copy the keys for fork
*/
-int copy_keys(unsigned long clone_flags, struct task_struct *tsk)
+static int __task_init copy_keys(unsigned long clone_flags,
+ struct task_struct *tsk)
{
key_check(tsk->thread_keyring);
key_check(tsk->request_key_auth);
/* no thread keyring yet */
@@ -290,10 +292,11 @@ int copy_keys(unsigned long clone_flags,
key_get(tsk->request_key_auth);
return 0;
} /* end copy_keys() */
+DEFINE_TASK_INITCALL(copy_keys);
/*****************************************************************************/
/*
* dispose of thread group keys upon thread group destruction
*/
@@ -306,16 +309,18 @@ void exit_thread_group_keys(struct signa
/*****************************************************************************/
/*
* dispose of per-thread keys upon thread exit
*/
-void exit_keys(struct task_struct *tsk)
+static int __task_free exit_keys(unsigned long exit_code,
+ struct task_struct *tsk)
{
key_put(tsk->thread_keyring);
key_put(tsk->request_key_auth);
-
+ return 0;
} /* end exit_keys() */
+DEFINE_TASK_FREECALL(exit_keys);
/*****************************************************************************/
/*
* deal with execve()
*/
@@ -356,35 +361,37 @@ int suid_keys(struct task_struct *tsk)
/*****************************************************************************/
/*
* the filesystem user ID changed
*/
-void key_fsuid_changed(struct task_struct *tsk)
+static int key_fsuid_changed(unsigned long ignored, struct task_struct *tsk)
{
/* update the ownership of the thread keyring */
if (tsk->thread_keyring) {
down_write(&tsk->thread_keyring->sem);
tsk->thread_keyring->uid = tsk->fsuid;
up_write(&tsk->thread_keyring->sem);
}
-
+ return 0;
} /* end key_fsuid_changed() */
+DEFINE_TASK_UIDCALL(key_fsuid_changed);
/*****************************************************************************/
/*
* the filesystem group ID changed
*/
-void key_fsgid_changed(struct task_struct *tsk)
+static int key_fsgid_changed(unsigned long ignored, struct task_struct *tsk)
{
/* update the ownership of the thread keyring */
if (tsk->thread_keyring) {
down_write(&tsk->thread_keyring->sem);
tsk->thread_keyring->gid = tsk->fsgid;
up_write(&tsk->thread_keyring->sem);
}
-
+ return 0;
} /* end key_fsgid_changed() */
+DEFINE_TASK_GIDCALL(key_fsgid_changed);
/*****************************************************************************/
/*
* search the process keyrings for the first matching key
* - we use the supplied match function to see if the description (or other
Index: linux-2.6.19/kernel/exit.c
===================================================================
--- linux-2.6.19.orig/kernel/exit.c
+++ linux-2.6.19/kernel/exit.c
@@ -13,11 +13,10 @@
#include <linux/capability.h>
#include <linux/completion.h>
#include <linux/personality.h>
#include <linux/tty.h>
#include <linux/namespace.h>
-#include <linux/key.h>
#include <linux/security.h>
#include <linux/cpu.h>
#include <linux/acct.h>
#include <linux/tsacct_kern.h>
#include <linux/file.h>
@@ -917,11 +916,10 @@ fastcall NORET_TYPE void do_exit(long co
if (group_dead)
acct_process();
__exit_files(tsk);
__exit_fs(tsk);
exit_thread();
- exit_keys(tsk);
if (group_dead && tsk->signal->leader)
disassociate_ctty(1);
module_put(task_thread_info(tsk)->exec_domain->module);
Index: linux-2.6.19/kernel/sys.c
===================================================================
--- linux-2.6.19.orig/kernel/sys.c
+++ linux-2.6.19/kernel/sys.c
@@ -956,11 +956,10 @@ asmlinkage long sys_setregid(gid_t rgid,
(egid != (gid_t) -1 && egid != old_rgid))
current->sgid = new_egid;
current->fsgid = new_egid;
current->egid = new_egid;
current->gid = new_rgid;
- key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
@@ -992,11 +991,10 @@ asmlinkage long sys_setgid(gid_t gid)
current->egid = current->fsgid = gid;
}
else
return -EPERM;
- key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
@@ -1081,11 +1079,10 @@ asmlinkage long sys_setreuid(uid_t ruid,
if (ruid != (uid_t) -1 ||
(euid != (uid_t) -1 && euid != old_ruid))
current->suid = current->euid;
current->fsuid = current->euid;
- key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE);
}
@@ -1129,11 +1126,10 @@ asmlinkage long sys_setuid(uid_t uid)
smp_wmb();
}
current->fsuid = current->euid = uid;
current->suid = new_suid;
- key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID);
}
@@ -1178,11 +1174,10 @@ asmlinkage long sys_setresuid(uid_t ruid
}
current->fsuid = current->euid;
if (suid != (uid_t) -1)
current->suid = suid;
- key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES);
}
@@ -1231,11 +1226,10 @@ asmlinkage long sys_setresgid(gid_t rgid
if (rgid != (gid_t) -1)
current->gid = rgid;
if (sgid != (gid_t) -1)
current->sgid = sgid;
- key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
@@ -1273,11 +1267,10 @@ asmlinkage long sys_setfsuid(uid_t uid)
smp_wmb();
}
current->fsuid = uid;
}
- key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS);
@@ -1301,11 +1294,10 @@ asmlinkage long sys_setfsgid(gid_t gid)
if (gid != old_fsgid) {
current->mm->dumpable = suid_dumpable;
smp_wmb();
}
current->fsgid = gid;
- key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
}
return old_fsgid;
}
--
* Register process events connector
From: Matt Helsley @ 2006-12-15 0:08 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux-Kernel, Jes Sorensen, Christoph Hellwig, Al Viro,
Steve Grubb, linux-audit, Paul Jackson
[-- Attachment #1: task-watchers-register-procevents --]
[-- Type: text/plain, Size: 11560 bytes --]
Make the process events connector use task watchers instead of hooking
directly into the paths it's interested in.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
drivers/connector/cn_proc.c | 51 +++++++++++++++++++++++++++++++-------------
fs/exec.c | 1
include/linux/cn_proc.h | 21 ------------------
kernel/exit.c | 2 -
kernel/fork.c | 2 -
kernel/sys.c | 8 ------
6 files changed, 36 insertions(+), 49 deletions(-)
Index: linux-2.6.19/drivers/connector/cn_proc.c
===================================================================
--- linux-2.6.19.orig/drivers/connector/cn_proc.c
+++ linux-2.6.19/drivers/connector/cn_proc.c
@@ -44,19 +44,20 @@ static inline void get_seq(__u32 *ts, in
*ts = get_cpu_var(proc_event_counts)++;
*cpu = smp_processor_id();
put_cpu_var(proc_event_counts);
}
-void proc_fork_connector(struct task_struct *task)
+static int proc_fork_connector(unsigned long clone_flags,
+ struct task_struct *task)
{
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE];
struct timespec ts;
if (atomic_read(&proc_event_num_listeners) < 1)
- return;
+ return 0;
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
@@ -70,21 +71,24 @@ void proc_fork_connector(struct task_str
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
/* If cn_netlink_send() failed, the data is not sent */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+ return 0;
}
+DEFINE_TASK_CLONECALL(proc_fork_connector);
-void proc_exec_connector(struct task_struct *task)
+static int proc_exec_connector(unsigned long ignore,
+ struct task_struct *task)
{
struct cn_msg *msg;
struct proc_event *ev;
struct timespec ts;
__u8 buffer[CN_PROC_MSG_SIZE];
if (atomic_read(&proc_event_num_listeners) < 1)
- return;
+ return 0;
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
@@ -95,21 +99,23 @@ void proc_exec_connector(struct task_str
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+ return 0;
}
+DEFINE_TASK_EXECCALL(proc_exec_connector);
-void proc_id_connector(struct task_struct *task, int which_id)
+static int process_change_id(unsigned long which_id, struct task_struct *task)
{
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE];
struct timespec ts;
if (atomic_read(&proc_event_num_listeners) < 1)
- return;
+ return 0;
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
ev->what = which_id;
ev->event_data.id.process_pid = task->pid;
@@ -119,47 +125,64 @@ void proc_id_connector(struct task_struc
ev->event_data.id.e.euid = task->euid;
} else if (which_id == PROC_EVENT_GID) {
ev->event_data.id.r.rgid = task->gid;
ev->event_data.id.e.egid = task->egid;
} else
- return;
+ return 0;
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
ev->timestamp_ns = timespec_to_ns(&ts);
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+ return 0;
+}
+
+static int proc_change_uid_connector(unsigned long ignore,
+ struct task_struct *task)
+{
+ return process_change_id(PROC_EVENT_UID, task);
+}
+DEFINE_TASK_UIDCALL(proc_change_uid_connector);
+
+static int proc_change_gid_connector(unsigned long ignore,
+ struct task_struct *task)
+{
+ return process_change_id(PROC_EVENT_GID, task);
}
+DEFINE_TASK_GIDCALL(proc_change_gid_connector);
-void proc_exit_connector(struct task_struct *task)
+static int proc_exit_connector(unsigned long code, struct task_struct *task)
{
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE];
struct timespec ts;
if (atomic_read(&proc_event_num_listeners) < 1)
- return;
+ return 0;
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
ev->timestamp_ns = timespec_to_ns(&ts);
ev->what = PROC_EVENT_EXIT;
ev->event_data.exit.process_pid = task->pid;
ev->event_data.exit.process_tgid = task->tgid;
- ev->event_data.exit.exit_code = task->exit_code;
+ ev->event_data.exit.exit_code = code;
ev->event_data.exit.exit_signal = task->exit_signal;
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+ return 0;
}
+DEFINE_TASK_EXITCALL(proc_exit_connector);
/*
* Send an acknowledgement message to userspace
*
* Use 0 for success, EFOO otherwise.
@@ -226,14 +249,12 @@ static void cn_proc_mcast_ctl(void *data
*/
static int __init cn_proc_init(void)
{
int err;
- if ((err = cn_add_callback(&cn_proc_event_id, "cn_proc",
- &cn_proc_mcast_ctl))) {
+ err = cn_add_callback(&cn_proc_event_id, "cn_proc", &cn_proc_mcast_ctl);
+ if (err)
printk(KERN_WARNING "cn_proc failed to register\n");
- return err;
- }
- return 0;
+ return err;
}
module_init(cn_proc_init);
Index: linux-2.6.19/kernel/fork.c
===================================================================
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -40,11 +40,10 @@
#include <linux/mount.h>
#include <linux/profile.h>
#include <linux/rmap.h>
#include <linux/acct.h>
#include <linux/tsacct_kern.h>
-#include <linux/cn_proc.h>
#include <linux/delayacct.h>
#include <linux/taskstats_kern.h>
#include <linux/random.h>
#include <linux/init.h>
@@ -1219,11 +1218,10 @@ static struct task_struct *copy_process(
total_forks++;
spin_unlock(&current->sighand->siglock);
write_unlock_irq(&tasklist_lock);
notify_task_watchers(WATCH_TASK_CLONE, clone_flags, p);
- proc_fork_connector(p);
return p;
bad_fork_cleanup_namespaces:
exit_task_namespaces(p);
bad_fork_cleanup_mm:
Index: linux-2.6.19/kernel/exit.c
===================================================================
--- linux-2.6.19.orig/kernel/exit.c
+++ linux-2.6.19/kernel/exit.c
@@ -30,11 +30,10 @@
#include <linux/taskstats_kern.h>
#include <linux/delayacct.h>
#include <linux/syscalls.h>
#include <linux/signal.h>
#include <linux/posix-timers.h>
-#include <linux/cn_proc.h>
#include <linux/mutex.h>
#include <linux/futex.h>
#include <linux/compat.h>
#include <linux/pipe_fs_i.h>
#include <linux/resource.h>
@@ -925,11 +924,10 @@ fastcall NORET_TYPE void do_exit(long co
module_put(task_thread_info(tsk)->exec_domain->module);
if (tsk->binfmt)
module_put(tsk->binfmt->module);
tsk->exit_code = code;
- proc_exit_connector(tsk);
exit_notify(tsk);
exit_task_namespaces(tsk);
/*
* This must happen late, after the PID is not
* hashed anymore:
Index: linux-2.6.19/kernel/sys.c
===================================================================
--- linux-2.6.19.orig/kernel/sys.c
+++ linux-2.6.19/kernel/sys.c
@@ -956,11 +956,10 @@ asmlinkage long sys_setregid(gid_t rgid,
(egid != (gid_t) -1 && egid != old_rgid))
current->sgid = new_egid;
current->fsgid = new_egid;
current->egid = new_egid;
current->gid = new_rgid;
- proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
/*
@@ -991,11 +990,10 @@ asmlinkage long sys_setgid(gid_t gid)
current->egid = current->fsgid = gid;
}
else
return -EPERM;
- proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
static int set_user(uid_t new_ruid, int dumpclear)
@@ -1079,11 +1077,10 @@ asmlinkage long sys_setreuid(uid_t ruid,
if (ruid != (uid_t) -1 ||
(euid != (uid_t) -1 && euid != old_ruid))
current->suid = current->euid;
current->fsuid = current->euid;
- proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE);
}
@@ -1126,11 +1123,10 @@ asmlinkage long sys_setuid(uid_t uid)
smp_wmb();
}
current->fsuid = current->euid = uid;
current->suid = new_suid;
- proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID);
}
@@ -1174,11 +1170,10 @@ asmlinkage long sys_setresuid(uid_t ruid
}
current->fsuid = current->euid;
if (suid != (uid_t) -1)
current->suid = suid;
- proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES);
}
@@ -1226,11 +1221,10 @@ asmlinkage long sys_setresgid(gid_t rgid
if (rgid != (gid_t) -1)
current->gid = rgid;
if (sgid != (gid_t) -1)
current->sgid = sgid;
- proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
}
asmlinkage long sys_getresgid(gid_t __user *rgid, gid_t __user *egid, gid_t __user *sgid)
@@ -1267,11 +1261,10 @@ asmlinkage long sys_setfsuid(uid_t uid)
smp_wmb();
}
current->fsuid = uid;
}
- proc_id_connector(current, PROC_EVENT_UID);
notify_task_watchers(WATCH_TASK_UID, 0, current);
security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS);
return old_fsuid;
@@ -1294,11 +1287,10 @@ asmlinkage long sys_setfsgid(gid_t gid)
if (gid != old_fsgid) {
current->mm->dumpable = suid_dumpable;
smp_wmb();
}
current->fsgid = gid;
- proc_id_connector(current, PROC_EVENT_GID);
notify_task_watchers(WATCH_TASK_GID, 0, current);
}
return old_fsgid;
}
Index: linux-2.6.19/fs/exec.c
===================================================================
--- linux-2.6.19.orig/fs/exec.c
+++ linux-2.6.19/fs/exec.c
@@ -1085,11 +1085,10 @@ int search_binary_handler(struct linux_b
fput(bprm->file);
bprm->file = NULL;
current->did_exec = 1;
notify_task_watchers(WATCH_TASK_EXEC, 0,
current);
- proc_exec_connector(current);
return retval;
}
read_lock(&binfmt_lock);
put_binfmt(fmt);
if (retval != -ENOEXEC || bprm->mm == NULL)
Index: linux-2.6.19/include/linux/cn_proc.h
===================================================================
--- linux-2.6.19.orig/include/linux/cn_proc.h
+++ linux-2.6.19/include/linux/cn_proc.h
@@ -95,27 +95,6 @@ struct proc_event {
__u32 exit_code, exit_signal;
} exit;
} event_data;
};
-#ifdef __KERNEL__
-#ifdef CONFIG_PROC_EVENTS
-void proc_fork_connector(struct task_struct *task);
-void proc_exec_connector(struct task_struct *task);
-void proc_id_connector(struct task_struct *task, int which_id);
-void proc_exit_connector(struct task_struct *task);
-#else
-static inline void proc_fork_connector(struct task_struct *task)
-{}
-
-static inline void proc_exec_connector(struct task_struct *task)
-{}
-
-static inline void proc_id_connector(struct task_struct *task,
- int which_id)
-{}
-
-static inline void proc_exit_connector(struct task_struct *task)
-{}
-#endif /* CONFIG_PROC_EVENTS */
-#endif /* __KERNEL__ */
#endif /* CN_PROC_H */
--
* Prefetch hint
2006-12-15 0:07 [PATCH 00/10] Introduction Matt Helsley
` (8 preceding siblings ...)
2006-12-15 0:08 ` Register process events connector Matt Helsley
@ 2006-12-15 0:08 ` Matt Helsley
9 siblings, 0 replies; 19+ messages in thread
From: Matt Helsley @ 2006-12-15 0:08 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux-Kernel, Jes Sorensen, Christoph Hellwig, Al Viro,
Steve Grubb, linux-audit, Paul Jackson
[-- Attachment #1: task-watchers-prefetch --]
[-- Type: text/plain, Size: 996 bytes --]
Prefetch the entire array of function pointers.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
kernel/task_watchers.c | 2 ++
1 file changed, 2 insertions(+)
Index: linux-2.6.19/kernel/task_watchers.c
===================================================================
--- linux-2.6.19.orig/kernel/task_watchers.c
+++ linux-2.6.19/kernel/task_watchers.c
@@ -1,6 +1,7 @@
#include <linux/init.h>
+#include <linux/prefetch.h>
/* Defined in include/asm-generic/vmlinux.lds.h */
extern const task_watcher_fn __start_task_init[],
__start_task_clone[], __start_task_exec[],
__start_task_uid[], __start_task_gid[],
@@ -30,10 +31,11 @@ int notify_task_watchers(unsigned int ev
tw_call = twtable[ev];
tw_end = twtable[ev + 1];
/* Call all of the watchers, report the first error */
+ prefetch_range(tw_call, tw_end - tw_call);
for (; tw_call < tw_end; tw_call++) {
err = (*tw_call)(val, tsk);
if (unlikely((err < 0) && (ret_err == 0)))
ret_err = err;
}
--
* Re: Task watchers v2
2006-12-15 0:07 ` Task watchers v2 Matt Helsley
@ 2006-12-15 8:34 ` Christoph Hellwig
2006-12-15 22:17 ` Matt Helsley
2006-12-15 23:13 ` Matt Helsley
2006-12-18 5:44 ` Zhang, Yanmin
1 sibling, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2006-12-15 8:34 UTC (permalink / raw)
To: Matt Helsley
Cc: Andrew Morton, Linux-Kernel, Jes Sorensen, Christoph Hellwig,
Al Viro, Steve Grubb, linux-audit, Paul Jackson
On Thu, Dec 14, 2006 at 04:07:55PM -0800, Matt Helsley wrote:
> Associate function calls with significant events in a task's lifetime much like
> we handle kernel and module init/exit functions. This creates a table for each
> of the following events in the task_watchers_table ELF section:
>
> WATCH_TASK_INIT at the beginning of a fork/clone system call when the
> new task struct first becomes available.
>
> WATCH_TASK_CLONE just before returning successfully from a fork/clone.
>
> WATCH_TASK_EXEC just before successfully returning from the exec
> system call.
>
> WATCH_TASK_UID every time a task's real or effective user id changes.
>
> WATCH_TASK_GID every time a task's real or effective group id changes.
>
> WATCH_TASK_EXIT at the beginning of do_exit when a task is exiting
> for any reason.
>
> WATCH_TASK_FREE is called before critical task structures like
> the mm_struct become inaccessible and the task is subsequently freed.
>
> The next patch will add a debugfs interface for measuring fork and exit rates
> which can be used to calculate the overhead of the task watcher infrastructure.
What's the point of the ELF hackery? This code would be a lot simpler
and more understandable if you simply had task_watcher_ops and a
register / unregister function for it.
* Re: Task watchers v2
2006-12-15 8:34 ` Christoph Hellwig
@ 2006-12-15 22:17 ` Matt Helsley
2006-12-15 23:13 ` Matt Helsley
1 sibling, 0 replies; 19+ messages in thread
From: Matt Helsley @ 2006-12-15 22:17 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrew Morton, Linux-Kernel, Jes Sorensen, Al Viro, Steve Grubb,
linux-audit, Paul Jackson
On Fri, 2006-12-15 at 09:34 +0100, Christoph Hellwig wrote:
> On Thu, Dec 14, 2006 at 04:07:55PM -0800, Matt Helsley wrote:
> > Associate function calls with significant events in a task's lifetime much like
> > we handle kernel and module init/exit functions. This creates a table for each
> > of the following events in the task_watchers_table ELF section:
> >
> > WATCH_TASK_INIT at the beginning of a fork/clone system call when the
> > new task struct first becomes available.
> >
> > WATCH_TASK_CLONE just before returning successfully from a fork/clone.
> >
> > WATCH_TASK_EXEC just before successfully returning from the exec
> > system call.
> >
> > WATCH_TASK_UID every time a task's real or effective user id changes.
> >
> > WATCH_TASK_GID every time a task's real or effective group id changes.
> >
> > WATCH_TASK_EXIT at the beginning of do_exit when a task is exiting
> > for any reason.
> >
> > WATCH_TASK_FREE is called before critical task structures like
> > the mm_struct become inaccessible and the task is subsequently freed.
> >
> > The next patch will add a debugfs interface for measuring fork and exit rates
> > which can be used to calculate the overhead of the task watcher infrastructure.
>
> What's the point of the ELF hackery? This code would be a lot simpler
> and more understandable if you simply had task_watcher_ops and a
> register / unregister function for it.
Andrew asked me to avoid locking and added complexity in the code that
uses one or more task watchers. The ELF hackery helps me avoid locking
in the fork/exit/etc paths that call the "registered" function.
Cheers,
-Matt Helsley
* Re: Task watchers v2
2006-12-15 8:34 ` Christoph Hellwig
2006-12-15 22:17 ` Matt Helsley
@ 2006-12-15 23:13 ` Matt Helsley
1 sibling, 0 replies; 19+ messages in thread
From: Matt Helsley @ 2006-12-15 23:13 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrew Morton, Linux-Kernel, Jes Sorensen, Al Viro, Steve Grubb,
linux-audit, Paul Jackson
On Fri, 2006-12-15 at 09:34 +0100, Christoph Hellwig wrote:
> On Thu, Dec 14, 2006 at 04:07:55PM -0800, Matt Helsley wrote:
> > Associate function calls with significant events in a task's lifetime much like
> > we handle kernel and module init/exit functions. This creates a table for each
> > of the following events in the task_watchers_table ELF section:
> >
> > WATCH_TASK_INIT at the beginning of a fork/clone system call when the
> > new task struct first becomes available.
> >
> > WATCH_TASK_CLONE just before returning successfully from a fork/clone.
> >
> > WATCH_TASK_EXEC just before successfully returning from the exec
> > system call.
> >
> > WATCH_TASK_UID every time a task's real or effective user id changes.
> >
> > WATCH_TASK_GID every time a task's real or effective group id changes.
> >
> > WATCH_TASK_EXIT at the beginning of do_exit when a task is exiting
> > for any reason.
> >
> > WATCH_TASK_FREE is called before critical task structures like
> > the mm_struct become inaccessible and the task is subsequently freed.
> >
> > The next patch will add a debugfs interface for measuring fork and exit rates
> > which can be used to calculate the overhead of the task watcher infrastructure.
>
> What's the point of the ELF hackery? This code would be a lot simpler
> and more understandable if you simply had task_watcher_ops and a
> register / unregister function for it.
A bit more verbose response:
I posted a notifier chain implementation back in June that bears some
resemblance to your suggestion -- a structure needed to be registered at
runtime. There was a single global list of them to iterate over for each
event.
This patch and the following patches are significantly shorter than
their counterparts in that series. They avoid iterating over elements
with empty ops. The way function pointers and function bodies are
grouped together by this series should improve locality. The fact that
there's no locking required also makes it simpler to analyze and use.
The patches to allow modules to register task watchers do make things
more complex though -- that does require a list and a lock. However, the
lock does not need to be taken in the fork/exec/etc paths if we pin the
module. In contrast your suggested approach is simpler because it
doesn't treat modules any differently. However, overall I think the
balance still favors these patches.
Cheers,
-Matt Helsley
* Re: Task watchers v2
2006-12-15 0:07 ` Task watchers v2 Matt Helsley
2006-12-15 8:34 ` Christoph Hellwig
@ 2006-12-18 5:44 ` Zhang, Yanmin
2006-12-18 13:18 ` Matt Helsley
1 sibling, 1 reply; 19+ messages in thread
From: Zhang, Yanmin @ 2006-12-18 5:44 UTC (permalink / raw)
To: Matt Helsley
Cc: Andrew Morton, Linux-Kernel, Jes Sorensen, Christoph Hellwig,
Al Viro, Steve Grubb, linux-audit, Paul Jackson, systemtap
On Thu, 2006-12-14 at 16:07 -0800, Matt Helsley wrote:
> plain text document attachment (task-watchers-v2)
> Associate function calls with significant events in a task's lifetime much like
> we handle kernel and module init/exit functions. This creates a table for each
> of the following events in the task_watchers_table ELF section:
>
> WATCH_TASK_INIT at the beginning of a fork/clone system call when the
> new task struct first becomes available.
>
> WATCH_TASK_CLONE just before returning successfully from a fork/clone.
>
> WATCH_TASK_EXEC just before successfully returning from the exec
> system call.
>
> WATCH_TASK_UID every time a task's real or effective user id changes.
>
> WATCH_TASK_GID every time a task's real or effective group id changes.
>
> WATCH_TASK_EXIT at the beginning of do_exit when a task is exiting
> for any reason.
>
> WATCH_TASK_FREE is called before critical task structures like
> the mm_struct become inaccessible and the task is subsequently freed.
>
> The next patch will add a debugfs interface for measuring fork and exit rates
> which can be used to calculate the overhead of the task watcher infrastructure.
>
> Subsequent patches will make use of task watchers to simplify fork, exit,
> and many of the system calls that set [er][ug]ids.
It's easier to get such watch capabilities with kprobes/systemtap. Why
add new code to the kernel?
* Re: Task watchers v2
2006-12-18 5:44 ` Zhang, Yanmin
@ 2006-12-18 13:18 ` Matt Helsley
2006-12-19 5:41 ` Paul Jackson
0 siblings, 1 reply; 19+ messages in thread
From: Matt Helsley @ 2006-12-18 13:18 UTC (permalink / raw)
To: Zhang, Yanmin
Cc: Andrew Morton, Linux-Kernel, Jes Sorensen, Christoph Hellwig,
Al Viro, Steve Grubb, linux-audit, Paul Jackson, systemtap
On Mon, 2006-12-18 at 13:44 +0800, Zhang, Yanmin wrote:
> On Thu, 2006-12-14 at 16:07 -0800, Matt Helsley wrote:
> > plain text document attachment (task-watchers-v2)
> > Associate function calls with significant events in a task's lifetime much like
> > we handle kernel and module init/exit functions. This creates a table for each
> > of the following events in the task_watchers_table ELF section:
> >
> > WATCH_TASK_INIT at the beginning of a fork/clone system call when the
> > new task struct first becomes available.
> >
> > WATCH_TASK_CLONE just before returning successfully from a fork/clone.
> >
> > WATCH_TASK_EXEC just before successfully returning from the exec
> > system call.
> >
> > WATCH_TASK_UID every time a task's real or effective user id changes.
> >
> > WATCH_TASK_GID every time a task's real or effective group id changes.
> >
> > WATCH_TASK_EXIT at the beginning of do_exit when a task is exiting
> > for any reason.
> >
> > WATCH_TASK_FREE is called before critical task structures like
> > the mm_struct become inaccessible and the task is subsequently freed.
> >
> > The next patch will add a debugfs interface for measuring fork and exit rates
> > which can be used to calculate the overhead of the task watcher infrastructure.
> >
> > Subsequent patches will make use of task watchers to simplify fork, exit,
> > and many of the system calls that set [er][ug]ids.
> It's easier to get such watch capabilities with kprobes/systemtap. Why
> add new code to the kernel?
Good question! Disclaimer: Everything I know about kprobes I learned
from Documentation/kprobes.txt
The task watchers patches have a few distinguishing capabilities yet
lack capabilities important for kprobes -- so neither is a replacement
for the other. Specifically:
- Task watchers are for use by the kernel for more than profiling and
debugging. They need to work even when kernel debugging and
instrumentation are disabled.
- Task watchers do not need to be dynamically enabled, disabled, or
removed (though dynamic insertion would be nice -- I'm working on that).
In fact I've been told that dynamically enabling, disabling, or removing
them would incur unacceptable complexity and/or cost for an
uninstrumented kernel.
- Task watchers don't require arch support. They use completely generic
code.
- Since they are written into the code, task watchers don't need
to modify instructions.
- Task watchers don't need to single-step an instruction
- Task watchers don't need to know about arch registers, calling
conventions, etc. to work
- Task watchers don't need to have the same (possibly extensive)
argument list as the function being "probed". This makes maintenance
easier -- no need to keep the signature of the watchers in sync with
the signature of the "probed" function.
- Task watchers don't require MODULES (2.6.20-rc1-mm1's
arch/i386/Kconfig suggests this is true of kprobes).
- Task watchers don't need kernel symbols.
- Task watchers can affect flow control (see the patch hunks that change
copy_process()) with their return value.
- Task watchers do not need to know the instruction address to be
"probed".
- Task watchers can actually improve kernel performance slightly (up to
2% in extremely fork-heavy workloads for instance).
- Task watchers can be passed local variables -- not necessarily
arguments to the "probed" function.
- Task watchers don't care if preemption is enabled or disabled.
- Task watchers could sleep if they want to.
So to the best of my knowledge kprobes isn't a replacement for task
watchers, nor are task watchers capable of replacing kprobes.
Cheers,
-Matt Helsley
* Re: Task watchers v2
2006-12-18 13:18 ` Matt Helsley
@ 2006-12-19 5:41 ` Paul Jackson
2006-12-19 12:05 ` Matt Helsley
0 siblings, 1 reply; 19+ messages in thread
From: Paul Jackson @ 2006-12-19 5:41 UTC (permalink / raw)
To: Matt Helsley
Cc: yanmin_zhang, akpm, linux-kernel, jes, hch, viro, sgrubb,
linux-audit, pj, systemtap
Matt wrote:
> - Task watchers can actually improve kernel performance slightly (up to
> 2% in extremely fork-heavy workloads for instance).
Nice.
Could you explain why?
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: Task watchers v2
2006-12-19 5:41 ` Paul Jackson
@ 2006-12-19 12:05 ` Matt Helsley
2006-12-19 12:26 ` Paul Jackson
0 siblings, 1 reply; 19+ messages in thread
From: Matt Helsley @ 2006-12-19 12:05 UTC (permalink / raw)
To: Paul Jackson
Cc: yanmin_zhang, akpm, linux-kernel, jes, hch, viro, sgrubb,
linux-audit, systemtap
On Mon, 2006-12-18 at 21:41 -0800, Paul Jackson wrote:
> Matt wrote:
> > - Task watchers can actually improve kernel performance slightly (up to
> > 2% in extremely fork-heavy workloads for instance).
>
> Nice.
>
> Could you explain why?
After the last round of patches I set out to improve instruction and
data cache hits.
Previous iterations of task watchers would prevent the code in these
paths from being inlined. Furthermore, the code certainly wouldn't be
placed near the table of function pointers (which was in an entirely
different ELF section). By placing them adjacent to each other in the
same ELF section we can improve the likelihood of cache hits in
fork-heavy workloads (which were the ones that showed a performance
decrease in the previous iteration of these patches).
Suppose we have two functions to invoke during fork -- A and B. Here's
what the memory layout looked like in the previous iteration of task
watchers:
+--------------+<----+
|  insns of B  |     |
.                    |
.                    |
.                    |
|              |     |
+--------------+     |
.                    |
.                    |
.                    |
+--------------+<-+  |
|  insns of A  |  |  |
.                 |  |
.                 |  |
.                 |  |
|              |  |  |
+--------------+  |  |
.                 |  |
.                 |  |
.                 |  |      .text
==================|==|=============== ELF Section Boundary
+--------------+  |  |      .task
| pointer to A----+  |
+--------------+     |
| pointer to B-------+
+--------------+
The notify_task_watchers() function would first load the pointer to A from the .task
section. Then it would immediately jump into the .text section and force the
instructions from A to be loaded. When A was finished, it would return to
notify_task_watchers() only to jump into B by the same steps.
As you can see things can be rather spread out. Unless the compiler inlined the
functions called from copy_process() things are very similar in a mainline
kernel -- copy_process() could be jumping to rather distant portions of the kernel
text and the pointer table would be rather distant from the instructions to be loaded.
Here's what the new patches look like:
===============================
+--------------+        .task
| pointer to A----+
+--------------+  |
| pointer to B-------+
+--------------+  |  |
.                 |  |
.                 |  |
+--------------+<-+  |
|  insns of A  |     |
.                    |
.                    |
.                    |
|              |     |
+--------------+<----+
|  insns of B  |
.
.
.
|              |
+--------------+
===============================
Which is clearly more compact and also follows the order of calls (A
then B). The instructions are all in the same section. When A finishes
executing we soon jump into B which could be in the same instruction
cache line as the function we just left. Furthermore, since the sequence
always goes from A to B I expect some anticipatory loads could be done.
For fork-heavy workloads I'd expect this to explain the performance
difference. For workloads that aren't fork-heavy I suspect we're just as
likely to experience instruction cache misses -- whether the functions
are inlined, adjacent, or not -- since the fork happens relatively
infrequently and other instructions are likely to push fork-related
instructions out.
Cheers,
-Matt Helsley
* Re: Task watchers v2
2006-12-19 12:05 ` Matt Helsley
@ 2006-12-19 12:26 ` Paul Jackson
0 siblings, 0 replies; 19+ messages in thread
From: Paul Jackson @ 2006-12-19 12:26 UTC (permalink / raw)
To: Matt Helsley
Cc: yanmin_zhang, akpm, linux-kernel, jes, hch, viro, sgrubb,
linux-audit, systemtap
Matt wrote:
> Previous iterations of task watchers would prevent the code in these
> paths from being inlined. Furthermore, the code certainly wouldn't be
> placed near the table of function pointers (which was in an entirely
> different ELF section). By placing them adjacent to each other in the
> same ELF section we can improve the likelihood of cache hits in
> fork-heavy workloads (which were the ones that showed a performance
> decrease in the previous iteration of these patches).
Ah so - by marking some of the fork (and exit, exec, ...) routines
with the WATCH_TASK_* mechanism, you can compact them together in the
kernel's text pages, instead of having them scattered about based on
whatever source files they are in.
Nice.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
Thread overview: 19+ messages
2006-12-15 0:07 [PATCH 00/10] Introduction Matt Helsley
2006-12-15 0:07 ` Task watchers v2 Matt Helsley
2006-12-15 8:34 ` Christoph Hellwig
2006-12-15 22:17 ` Matt Helsley
2006-12-15 23:13 ` Matt Helsley
2006-12-18 5:44 ` Zhang, Yanmin
2006-12-18 13:18 ` Matt Helsley
2006-12-19 5:41 ` Paul Jackson
2006-12-19 12:05 ` Matt Helsley
2006-12-19 12:26 ` Paul Jackson
2006-12-15 0:07 ` Register audit task watcher Matt Helsley
2006-12-15 0:07 ` Register semundo " Matt Helsley
2006-12-15 0:07 ` Register cpuset " Matt Helsley
2006-12-15 0:07 ` Register NUMA mempolicy " Matt Helsley
2006-12-15 0:08 ` Register IRQ flag tracing " Matt Helsley
2006-12-15 0:08 ` Register lockdep " Matt Helsley
2006-12-15 0:08 ` Register process keyrings " Matt Helsley
2006-12-15 0:08 ` Register process events connector Matt Helsley
2006-12-15 0:08 ` Prefetch hint Matt Helsley