* [PATCH] Process Aggregates (PAGG) for 2.6.7
@ 2004-06-24 18:08 Erik Jacobson
2004-06-24 18:32 ` Limin Gu
2004-06-24 23:22 ` Peter Williams
0 siblings, 2 replies; 8+ messages in thread
From: Erik Jacobson @ 2004-06-24 18:08 UTC (permalink / raw)
To: linux-kernel; +Cc: jlan, limin, pwil3058
[-- Attachment #1: Type: TEXT/PLAIN, Size: 683 bytes --]
Attached is a PAGG patch to kernel 2.6.7.
The maintainers of two patches that make use of PAGG will post their patches
into this discussion thread shortly.
The biggest change in this patch from the last one I posted is that
Peter Williams supplied an implementation for the init function pointer
in the pagg hook. We kicked this around a few times to flush out
locking issues. Thanks to Robin Holt for helping me with that.
A bug was found by Dean Roe and fixed - we had to move our pagg_attach
call in fork.c.
There are a couple of other minor changes too.
Signed-off-by: Erik Jacobson <erikj@sgi.com>
--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota
[-- Attachment #2: Type: TEXT/PLAIN, Size: 28738 bytes --]
Index: linux/Documentation/pagg.txt
===================================================================
--- /dev/null
+++ linux/Documentation/pagg.txt
@@ -0,0 +1,32 @@
+Linux Process Aggregates (PAGG)
+-------------------------------
+
+The process aggregates infrastructure, or PAGG, provides a generalized
+mechanism for providing arbitrary process groups in Linux. PAGG consists
+of a series of functions for registering and unregistering support
+for new types of process aggregation containers with the kernel.
+This is similar to the support currently provided within Linux that
+allows for dynamic support of filesystems, block and character devices,
+symbol tables, network devices, serial devices, and execution domains.
+This implementation of PAGG provides developers the basic hooks necessary
+to implement kernel modules for specific process containers, such as
+the job container.
+
+The do_fork function in the kernel was altered to support PAGG. If a
+process is attached to any PAGG containers and subsequently forks a
+child process, the child process will also be attached to the same PAGG
+containers. The PAGG containers involved during the fork are notified
+that a new process has been attached. The notification is accomplished
+via a callback function provided by the PAGG module.
+
+The do_exit function in the kernel has also been altered. If a process
+is attached to any PAGG containers and that process is exiting, the PAGG
+containers are notified that a process has detached from the container.
+The notification is accomplished via a callback function provided by
+the PAGG module.
+
+The sys_execve function has been modified to support an optional callout
+that can be run when a process in a pagg list does an exec. It can be
+used, for example, by other kernel modules that wish to do advanced CPU
+placement on multi-processor systems.
+
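The fork, exit, and exec notifications described in pagg.txt can be modelled in user space. What follows is a minimal sketch under stated assumptions, not kernel code and not part of the patch: every `model_*` name and `count_hook` is hypothetical, a plain singly linked list stands in for the kernel list, and all locking is left out. It only illustrates the order in which a container's callbacks would be invoked.

```c
/* User-space model of the PAGG notifications; hypothetical names only. */
#include <assert.h>
#include <stdlib.h>

struct model_pagg;

/* Callbacks a container module would supply (loosely mirrors pagg_hook). */
struct model_hook {
	const char *name;
	int  (*attach)(struct model_pagg *child, void *parent_data);
	void (*detach)(struct model_pagg *pagg);
	void (*exec)(struct model_pagg *pagg);	/* optional, may be NULL */
};

/* One attachment of a task to a container. */
struct model_pagg {
	struct model_hook *hook;
	void *data;
	struct model_pagg *next;
};

struct model_task {
	struct model_pagg *paggs;	/* list of attachments for this task */
};

/* Link a new, empty attachment into the task. Returns NULL on failure. */
struct model_pagg *model_alloc(struct model_task *t, struct model_hook *h)
{
	struct model_pagg *p;

	assert(t != NULL && h != NULL);
	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;
	p->hook = h;
	p->next = t->paggs;
	t->paggs = p;
	return p;
}

/* Exit path: notify every container, then drop the attachments. */
void model_exit(struct model_task *t)
{
	while (t->paggs) {
		struct model_pagg *p = t->paggs;

		t->paggs = p->next;
		p->hook->detach(p);
		free(p);
	}
}

/* Fork path: the child inherits every attachment of the parent.
 * Returns 0 on success, -1 after undoing a partial attach. */
int model_fork(struct model_task *child, const struct model_task *parent)
{
	const struct model_pagg *pp;

	child->paggs = NULL;
	for (pp = parent->paggs; pp; pp = pp->next) {
		struct model_pagg *cp = model_alloc(child, pp->hook);

		if (!cp || cp->hook->attach(cp, pp->data) != 0) {
			model_exit(child);
			return -1;
		}
	}
	return 0;
}

/* Exec path: call the optional exec callback of each attachment. */
void model_exec(struct model_task *t)
{
	struct model_pagg *p;

	for (p = t->paggs; p; p = p->next)
		if (p->hook->exec)
			p->hook->exec(p);
}

/* Example container type that only counts its members. */
int model_members;

static int count_attach(struct model_pagg *child, void *parent_data)
{
	child->data = parent_data;
	model_members++;
	return 0;
}

static void count_detach(struct model_pagg *pagg)
{
	(void)pagg;
	model_members--;
}

struct model_hook count_hook = { "count", count_attach, count_detach, NULL };
```

In this model the first attachment is made by hand with `model_alloc()`, much as a container module would do for its point-of-entry process; every later member arrives through `model_fork()` and leaves through `model_exit()`.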
Index: linux/fs/exec.c
===================================================================
--- linux.orig/fs/exec.c
+++ linux/fs/exec.c
@@ -46,7 +46,7 @@
#include <linux/security.h>
#include <linux/syscalls.h>
#include <linux/rmap.h>
-
+#include <linux/pagg.h>
#include <asm/uaccess.h>
#include <asm/pgalloc.h>
#include <asm/mmu_context.h>
@@ -1133,6 +1133,7 @@
retval = search_binary_handler(&bprm,regs);
if (retval >= 0) {
free_arg_pages(&bprm);
+ pagg_exec(current);
/* execve success */
security_bprm_free(&bprm);
Index: linux/include/linux/init_task.h
===================================================================
--- linux.orig/include/linux/init_task.h
+++ linux/include/linux/init_task.h
@@ -2,6 +2,7 @@
#define _LINUX__INIT_TASK_H
#include <linux/file.h>
+#include <linux/pagg.h>
#define INIT_FILES \
{ \
@@ -112,6 +113,7 @@
.proc_lock = SPIN_LOCK_UNLOCKED, \
.switch_lock = SPIN_LOCK_UNLOCKED, \
.journal_info = NULL, \
+ INIT_TASK_PAGG(tsk) \
}
Index: linux/include/linux/pagg.h
===================================================================
--- /dev/null
+++ linux/include/linux/pagg.h
@@ -0,0 +1,202 @@
+/*
+ * PAGG (Process Aggregates) interface
+ *
+ *
+ * Copyright (c) 2000-2002, 2004 Silicon Graphics, Inc. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ *
+ * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA 94043, or:
+ *
+ * http://www.sgi.com
+ *
+ * For further information regarding this notice, see:
+ *
+ * http://oss.sgi.com/projects/GenInfo/NoticeExplan
+ */
+
+/*
+ * Data structure definitions and function prototypes used to implement
+ * process aggregates (paggs).
+ *
+ * Paggs provides a generalized way to implement process groupings or
+ * containers. Modules use these functions to register with the kernel as
+ * providers of process aggregation containers. The pagg data structures
+ * define the callback functions and data access pointers back into the
+ * pagg modules.
+ */
+
+#ifndef _LINUX_PAGG_H
+#define _LINUX_PAGG_H
+
+#include <linux/sched.h>
+
+#ifdef CONFIG_PAGG
+
+#define PAGG_NAMELN 32 /* Max chars in PAGG module name */
+
+
+/**
+ * INIT_PAGG_LIST - used to initialize a pagg_list structure after declaration
+ * @_l: Task struct to init the pagg_list and semaphore in
+ *
+ */
+#define INIT_PAGG_LIST(_l) \
+do { \
+ INIT_LIST_HEAD(&(_l)->pagg_list); \
+ init_rwsem(&(_l)->pagg_sem); \
+} while(0)
+
+
+/*
+ * Used by task_struct to manage list of pagg attachments for the process.
+ * Each pagg provides the link between the process and the
+ * correct pagg container.
+ *
+ * STRUCT MEMBERS:
+ * hook: Reference to pagg module structure. That struct
+ * holds the name key and function pointers.
+ * data: Opaque data pointer - defined by pagg modules.
+ * entry: List pointers
+ */
+struct pagg {
+ struct pagg_hook *hook;
+ void *data;
+ struct list_head entry;
+};
+
+/*
+ * Used by pagg modules to define the callback functions into the
+ * module.
+ *
+ * STRUCT MEMBERS:
+ * name: The name of the pagg container type provided by
+ * the module. This will be set by the pagg module.
+ * attach: Function pointer to function used when attaching
+ * a process to the pagg container referenced by
+ * this struct.
+ * detach: Function pointer to function used when detaching
+ * a process from the pagg container referenced by
+ * this struct.
+ * init: Function pointer to initialization function. This
+ * function is used when the module is loaded to attach
+ * existing processes to a default container as defined by
+ * the pagg module. This is optional and may be set to
+ * NULL if it is not needed by the pagg module.
+ * data: Opaque data pointer - defined by pagg modules.
+ * module: Pointer to kernel module struct. Used to increment &
+ * decrement the use count for the module.
+ * entry: List pointers
+ * exec: Function pointer to function used when a process
+ * in the pagg container exec's a new process. This
+ * is optional and may be set to NULL if it is not
+ * needed by the pagg module.
+ * refcnt: Keep track of user count of the pagg hook
+ */
+struct pagg_hook {
+ struct module *module;
+ char *name; /* Name Key - restricted to 32 characters */
+ void *data; /* Opaque module specific data */
+ struct list_head entry; /* List pointers */
+ atomic_t refcnt; /* usage counter */
+ int (*init)(struct task_struct *, struct pagg *);
+ int (*attach)(struct task_struct *, struct pagg *, void*);
+ void (*detach)(struct task_struct *, struct pagg *);
+ void (*exec)(struct task_struct *, struct pagg *);
+};
+
+
+/* Kernel service functions for providing PAGG support */
+extern struct pagg *pagg_get(struct task_struct *task, char *key);
+extern struct pagg *pagg_alloc(struct task_struct *task,
+ struct pagg_hook *pt);
+extern void pagg_free(struct pagg *pagg);
+extern int pagg_hook_register(struct pagg_hook *pt_new);
+extern int pagg_hook_unregister(struct pagg_hook *pt_old);
+extern void __pagg_attach(struct task_struct *to_task,
+ struct task_struct *from_task);
+extern void __pagg_detach(struct task_struct *task);
+extern int __pagg_exec(struct task_struct *task);
+
+/**
+ * pagg_attach - child inherits attachment to pagg containers of its parent
+ * @child: child task - to inherit
+ * @parent: parent task - child inherits pagg containers from this parent
+ *
+ * function used when a child process must inherit attachment to pagg
+ * containers from the parent.
+ *
+ */
+static inline void pagg_attach(struct task_struct *child,
+ struct task_struct *parent)
+{
+ INIT_PAGG_LIST(child);
+ if (!list_empty(&parent->pagg_list))
+ __pagg_attach(child, parent);
+ return;
+}
+
+
+/**
+ * pagg_detach - Detach a process from a pagg container it is a member of
+ * @task: The task the pagg will be detached from
+ *
+ */
+static inline void pagg_detach(struct task_struct *task)
+{
+ if (!list_empty(&task->pagg_list))
+ __pagg_detach(task);
+}
+
+/**
+ * pagg_exec - Used when a process exec's
+ * @task: The process doing the exec
+ *
+ */
+static inline void pagg_exec(struct task_struct *task)
+{
+ if (!list_empty(&task->pagg_list))
+ __pagg_exec(task);
+}
+
+/**
+ * INIT_TASK_PAGG - Used in INIT_TASK to set the head and sem of pagg_list
+ * @tsk: The task to work with
+ *
+ * Macro used in INIT_TASK to set the head and sem of pagg_list.
+ * If CONFIG_PAGG is off, it is defined as an empty macro below.
+ *
+ */
+#define INIT_TASK_PAGG(tsk) \
+ .pagg_list = LIST_HEAD_INIT(tsk.pagg_list), \
+ .pagg_sem = __RWSEM_INITIALIZER(tsk.pagg_sem)
+
+#else /* CONFIG_PAGG */
+
+/*
+ * Replacement macros used when PAGG (Process Aggregates) support is not
+ * compiled into the kernel.
+ */
+#define INIT_TASK_PAGG(tsk)
+#define INIT_PAGG_LIST(l) do { } while(0)
+#define pagg_attach(ct, pt) do { } while(0)
+#define pagg_detach(t) do { } while(0)
+#define pagg_exec(t) do { } while(0)
+
+#endif /* CONFIG_PAGG */
+
+#endif /* _LINUX_PAGG_H */
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -513,6 +513,13 @@
struct mempolicy *mempolicy;
short il_next; /* could be shared with used_math */
#endif
+
+#ifdef CONFIG_PAGG
+/* List of pagg (process aggregate) attachments */
+ struct list_head pagg_list;
+ struct rw_semaphore pagg_sem;
+#endif
+
};
static inline pid_t process_group(struct task_struct *tsk)
Index: linux/init/Kconfig
===================================================================
--- linux.orig/init/Kconfig
+++ linux/init/Kconfig
@@ -121,6 +121,14 @@
up to the user level program to do useful things with this
information. This is generally a good idea, so say Y.
+config PAGG
+ bool "Support for process aggregates (PAGGs)"
+ help
+ Say Y here if you will be loading modules which provide support
+ for process aggregate containers. Examples of such modules include the
+ Linux Jobs module and the Linux Array Sessions module. If you will not
+ be using such modules, say N.
+
config SYSCTL
bool "Sysctl support"
---help---
Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -18,6 +18,7 @@
obj-$(CONFIG_PM) += power/
obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
obj-$(CONFIG_COMPAT) += compat.o
+obj-$(CONFIG_PAGG) += pagg.o
obj-$(CONFIG_IKCONFIG) += configs.o
obj-$(CONFIG_IKCONFIG_PROC) += configs.o
obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -23,6 +23,7 @@
#include <linux/mount.h>
#include <linux/proc_fs.h>
#include <linux/mempolicy.h>
+#include <linux/pagg.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -812,6 +813,9 @@
module_put(tsk->binfmt->module);
tsk->exit_code = code;
+
+ pagg_detach(tsk);
+
exit_notify(tsk);
schedule();
BUG();
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -36,6 +36,7 @@
#include <linux/mount.h>
#include <linux/audit.h>
#include <linux/rmap.h>
+#include <linux/pagg.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -236,6 +237,9 @@
init_task.rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
init_task.rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
+
+ /* Initialize the pagg list in pid 0 before it can clone itself. */
+ INIT_PAGG_LIST(current);
}
static struct task_struct *dup_task_struct(struct task_struct *orig)
@@ -1023,6 +1027,12 @@
sched_fork(p);
/*
+ * call pagg modules to properly attach new process to the same
+ * process aggregate containers as the parent process.
+ */
+ pagg_attach(p, current);
+
+ /*
* Ok, make it visible to the rest of the system.
* We dont wake it up yet.
*/
Index: linux/kernel/pagg.c
===================================================================
--- /dev/null
+++ linux/kernel/pagg.c
@@ -0,0 +1,474 @@
+/*
+ * PAGG (Process Aggregates) interface
+ *
+ *
+ * Copyright (c) 2000-2004 Silicon Graphics, Inc. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA 94043, or:
+ *
+ * http://www.sgi.com
+ */
+
+#include <linux/config.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/module.h>
+#include <linux/pagg.h>
+#include <asm/semaphore.h>
+
+/* list of pagg hook entries that reference the "module" implementations */
+static LIST_HEAD(pagg_hook_list);
+static DECLARE_RWSEM(pagg_hook_list_sem);
+
+
+/**
+ * pagg_get - get a pagg given a search key
+ * @task: We examine the pagg_list from the given task
+ * @key: Key name of pagg we wish to retrieve
+ *
+ * Given a pagg_list list structure, this function will return
+ * a pointer to the pagg struct that matches the search
+ * key. If the key is not found, the function will return NULL.
+ *
+ * The caller should hold at least a read lock on the pagg_list
+ * for task using down_read(&task->pagg_sem).
+ *
+ */
+struct pagg *
+pagg_get(struct task_struct *task, char *key)
+{
+ struct pagg *pagg;
+
+ list_for_each_entry(pagg, &task->pagg_list, entry) {
+ if (!strcmp(pagg->hook->name,key))
+ return pagg;
+ }
+ return NULL;
+}
+
+
+/**
+ * pagg_alloc - Insert a new pagg in to the pagg_list for a task
+ * @task: Task we want to insert the pagg in to
+ * @pagg_hook: Pagg hook to associate with the new pagg
+ *
+ * Given a task and a pagg hook, this function will allocate
+ * a new pagg structure, initialize the settings, and insert the pagg into
+ * the pagg_list for the task.
+ *
+ * The caller for this function should hold at least a read lock on the
+ * pagg_hook_list_sem - or ensure that the pagg hook entry cannot be
+ * removed. If this function was called from the pagg module (usually the
+ * case), then the caller need not hold this lock. The caller should hold
+ * a write lock on the task's pagg_sem. This can be locked using
+ * down_write(&task->pagg_sem)
+ *
+ */
+struct pagg *
+pagg_alloc(struct task_struct *task, struct pagg_hook *pagg_hook)
+{
+ struct pagg *pagg;
+
+ pagg = kmalloc(sizeof(struct pagg), GFP_KERNEL);
+ if (!pagg)
+ return NULL;
+
+ pagg->hook = pagg_hook;
+ pagg->data = NULL;
+ atomic_inc(&pagg_hook->refcnt); /* Increase hook's reference count */
+ list_add_tail(&pagg->entry, &task->pagg_list);
+ return pagg;
+}
+
+
+/**
+ * pagg_free - Delete pagg from the list and free its memory
+ * @pagg: The pagg to free
+ *
+ * This function will ensure the pagg is deleted from
+ * the list of pagg entries for the task. Finally, the memory for the
+ * pagg is discarded.
+ *
+ * The caller of this function should hold a write lock on the pagg_sem
+ * for the task. This can be locked using down_write(&task->pagg_sem).
+ *
+ * Prior to calling pagg_free, the pagg should have been detached from the
+ * pagg container represented by this pagg. That is usually done using
+ * p->hook->detach(task, pagg);
+ *
+ */
+void
+pagg_free(struct pagg *pagg)
+{
+ atomic_dec(&pagg->hook->refcnt); /* decr the reference count on the hook */
+ list_del(&pagg->entry);
+ kfree(pagg);
+}
+
+
+/**
+ * get_pagg_hook - Get the pagg hook matching the requested name
+ * @key: The name of the pagg hook to get
+ *
+ * Given a pagg hook name key, this function will return a pointer
+ * to the pagg_hook struct that matches the name.
+ *
+ * You should hold either the write or read lock for pagg_hook_list_sem
+ * before using this function. This will ensure that the pagg_hook_list
+ * does not change while iterating through the list entries.
+ *
+ */
+static struct pagg_hook *
+get_pagg_hook(char *key)
+{
+ struct pagg_hook *pagg_hook;
+
+ list_for_each_entry(pagg_hook, &pagg_hook_list, entry) {
+ if (!strcmp(pagg_hook->name, key)) {
+ return pagg_hook;
+ }
+ }
+ return NULL;
+}
+
+/**
+ * remove_client_paggs_from_all_tasks - Remove all paggs associated with hook
+ * @php: Pagg hook associated with paggs to purge
+ *
+ * Given a pagg hook, this function will remove all paggs associated with that
+ * pagg hook from all tasks calling the provided function on each pagg.
+ *
+ * If there is a detach function associated with the pagg, it is called
+ * before the pagg is freed.
+ *
+ * This is meant to be used by pagg_hook_register and pagg_hook_unregister
+ *
+ */
+static void
+remove_client_paggs_from_all_tasks(struct pagg_hook *php)
+{
+ if (php == NULL)
+ return;
+
+ /* Because of internal race conditions we can't guarantee
+ * getting every task in just one pass so we just keep going
+ * until there are no tasks with paggs from this hook attached.
+ * The inefficiency of this should be tempered by the fact that this
+ * happens at most once for each registered client.
+ *
+ * Because we hold the tasklist lock, we can't use down_write on a
+ * semaphore. So we use down_write_trylock and go around again if
+ * we fail to get a lock...
+ */
+ while (atomic_read(&php->refcnt) != 0) {
+ struct task_struct *p = NULL;
+
+ read_lock(&tasklist_lock);
+ for_each_process(p) {
+ struct pagg *paggp;
+
+ /* If we fail to get the lock, we'll just try again. We rely on
+ * the pagg hook reference count to know when we're done */
+ if (down_write_trylock(&p->pagg_sem)) {
+ paggp = pagg_get(p, php->name);
+ if (paggp != NULL) {
+ (void)php->detach(p, paggp);
+ pagg_free(paggp);
+ }
+ up_write(&p->pagg_sem);
+ }
+ }
+ read_unlock(&tasklist_lock);
+ }
+}
+
+/**
+ * pagg_hook_register - Register a new pagg hook and enter it into the list
+ * @pagg_hook_new: The new pagg hook to register
+ *
+ * Used to register a new pagg hook and enter it into the pagg_hook_list.
+ * The service name for a pagg hook is restricted to 32 characters.
+ *
+ * If an "init()" function is supplied in the hook being registered then a
+ * pagg will be attached to all existing tasks and the supplied "init()"
+ * function will be applied to it. If any call to the supplied "init()"
+ * function returns a non zero result the registration will be aborted. As
+ * part of the abort process, all paggs belonging to the new client will be
+ * removed from all tasks and the supplied "detach()" function will be
+ * called on them. Note: The init function must not sleep.
+ *
+ * If a memory error is encountered, the pagg hook is unregistered and any
+ * tasks that have been attached to the initial pagg container are detached
+ * from that container.
+ *
+ */
+int
+pagg_hook_register(struct pagg_hook *pagg_hook_new)
+{
+ struct pagg_hook *pagg_hook = NULL;
+
+ /* Add new pagg module to access list */
+ if (!pagg_hook_new)
+ return -EINVAL; /* error */
+ if (!list_empty(&pagg_hook_new->entry))
+ return -EINVAL; /* error */
+ if (pagg_hook_new->name == NULL || strlen(pagg_hook_new->name) > PAGG_NAMELN)
+ return -EINVAL; /* error */
+
+ /* Try to insert new hook entry into the pagg hook list */
+ down_write(&pagg_hook_list_sem);
+
+ pagg_hook = get_pagg_hook(pagg_hook_new->name);
+
+ if (pagg_hook) {
+ up_write(&pagg_hook_list_sem);
+ printk(KERN_WARNING "Attempt to register duplicate"
+ " PAGG support (name=%s)\n", pagg_hook_new->name);
+ return -EBUSY;
+ }
+
+ /* Okay, we can insert into the pagg hook list */
+ list_add_tail(&pagg_hook_new->entry, &pagg_hook_list);
+ /* set the ref count to zero */
+ atomic_set(&pagg_hook_new->refcnt, 0);
+
+ /* Now we can call the initializer function (if present) for each task */
+ if (pagg_hook_new->init != NULL) {
+ int init_result = 0;
+ int done = 0;
+
+ /* Because of internal race conditions we can't guarantee
+ * getting every task in just one pass so we just keep going
+ * until we don't find any uninitialised tasks. The inefficiency
+ * of this should be tempered by the fact that this happens
+ * at most once for each registered client.
+ *
+ * Because we hold the tasklist lock, we can't sleep waiting on
+ * a semaphore, so we use down_write_trylock and if we don't
+ * get a lock, we loop around again.
+ */
+ while (!done) {
+ struct task_struct *p = NULL;
+
+
+ done = 1; /* flag that we're done unless we can't get the lock */
+
+ read_lock(&tasklist_lock);
+ for_each_process(p) {
+ struct pagg *paggp;
+
+ /* Try the lock */
+ if (down_write_trylock(&p->pagg_sem)) {
+ paggp = pagg_get(p, pagg_hook_new->name);
+ if (paggp == NULL) {
+ paggp = pagg_alloc(p, pagg_hook_new);
+ if (paggp != NULL)
+ init_result = pagg_hook_new->init(p, paggp);
+ else
+ init_result = -ENOMEM;
+ if (init_result != 0) {
+ up_write(&p->pagg_sem); /* init failed or out of memory - game over */
+ done = 1;
+ break;
+ }
+ }
+ up_write(&p->pagg_sem);
+ } else { /* We failed to get the lock, keep trying... */
+ done = 0;
+ }
+ }
+ read_unlock(&tasklist_lock);
+ }
+
+ /*
+ * if anything went wrong during initialisation abandon the
+ * registration process
+ */
+ if (init_result != 0) {
+ remove_client_paggs_from_all_tasks(pagg_hook_new);
+ list_del_init(&pagg_hook_new->entry);
+ up_write(&pagg_hook_list_sem);
+
+ printk(KERN_WARNING "Registering PAGG support for"
+ " (name=%s) failed. errcode=%d\n", pagg_hook_new->name, init_result);
+
+ return init_result;
+ }
+ }
+
+ up_write(&pagg_hook_list_sem);
+
+ printk(KERN_INFO "Registering PAGG support for (name=%s)\n",
+ pagg_hook_new->name);
+
+ return 0; /* success */
+
+}
+
+/**
+ * pagg_hook_unregister - Unregister pagg hook and remove it from the list
+ * @pagg_hook_old: The hook to unregister and remove
+ *
+ * Used to unregister pagg hooks and remove them from the pagg_hook_list.
+ * Once the pagg hook entry in the pagg_hook_list is found, paggs associated
+ * with the hook (if any) will have their detach function called and will
+ * be detached.
+ *
+ */
+int
+pagg_hook_unregister(struct pagg_hook *pagg_hook_old)
+{
+ struct pagg_hook *pagg_hook;
+
+ /* Check the validity of the arguments */
+ if (!pagg_hook_old)
+ return -EINVAL; /* error */
+ if (list_empty(&pagg_hook_old->entry))
+ return -EINVAL; /* error */
+ if (pagg_hook_old->name == NULL)
+ return -EINVAL; /* error */
+
+ down_write(&pagg_hook_list_sem);
+
+ pagg_hook = get_pagg_hook(pagg_hook_old->name);
+
+ if (pagg_hook && pagg_hook == pagg_hook_old) {
+ remove_client_paggs_from_all_tasks(pagg_hook);
+ list_del_init(&pagg_hook->entry);
+ up_write(&pagg_hook_list_sem);
+
+ printk(KERN_INFO "Unregistering PAGG support for"
+ " (name=%s)\n", pagg_hook_old->name);
+
+ return 0; /* success */
+ }
+
+ up_write(&pagg_hook_list_sem);
+
+ printk(KERN_WARNING "Attempt to unregister PAGG support (name=%s)"
+ " failed - not found\n", pagg_hook_old->name);
+
+ return -EINVAL; /* error */
+}
+
+
+/**
+ * __pagg_attach - Attach a new task to the same containers of its parent
+ * @to_task: The child task that will inherit the parent's containers
+ * @from_task: The parent task
+ *
+ * Used to attach a new task to the same pagg containers to which its parent
+ * is attached.
+ *
+ * The "from" argument is the parent task. The "to" argument is the child
+ * task.
+ *
+ */
+void
+__pagg_attach(struct task_struct *to_task, struct task_struct *from_task)
+{
+ struct pagg *from_pagg;
+
+ /* lock the parent's pagg_list we are copying from */
+ down_read(&from_task->pagg_sem); /* read lock the pagg list */
+
+ list_for_each_entry(from_pagg, &from_task->pagg_list, entry) {
+ struct pagg *to_pagg = NULL;
+
+ to_pagg = pagg_alloc(to_task, from_pagg->hook);
+ if (!to_pagg) {
+ goto error_return;
+ }
+ if (to_pagg->hook->attach(to_task, to_pagg, from_pagg->data) != 0 )
+ goto error_return;
+ }
+
+ up_read(&from_task->pagg_sem); /* unlock the pagg list */
+
+ return; /* success */
+
+ error_return:
+ /*
+ * Clean up all the pagg attachments made on behalf of the new
+ * task. Set new task pagg ptr to NULL for return.
+ */
+ up_read(&from_task->pagg_sem); /* unlock the pagg list */
+ __pagg_detach(to_task);
+ return; /* failure */
+}
+
+/**
+ * __pagg_detach - Detach a task from all pagg containers it is attached to
+ * @task: Task to detach from pagg containers
+ *
+ * Used to detach a task from all pagg containers to which it is attached.
+ *
+ */
+void
+__pagg_detach(struct task_struct *task)
+{
+ struct pagg *pagg;
+ struct pagg *paggtmp;
+
+ /* Remove ref. to paggs from task immediately */
+ down_write(&task->pagg_sem); /* write lock pagg list */
+
+ list_for_each_entry_safe(pagg, paggtmp, &task->pagg_list, entry) {
+ pagg->hook->detach(task, pagg);
+ pagg_free(pagg);
+ }
+
+ up_write(&task->pagg_sem); /* write unlock the pagg list */
+
+ return;
+}
+
+
+/**
+ * __pagg_exec - Execute callback when a process in a container execs
+ * @task: We go through the pagg list in the given task
+ *
+ * Used when a process that is in a pagg container does an exec.
+ *
+ * The "task" argument is the process doing the exec. Each attached pagg
+ * whose hook supplies an exec callback has that callback invoked.
+ *
+ */
+int
+__pagg_exec(struct task_struct *task)
+{
+ struct pagg *pagg;
+
+ /* lock the task's pagg_list while we walk it */
+ down_read(&task->pagg_sem); /* lock the pagg list */
+
+ list_for_each_entry(pagg, &task->pagg_list, entry) {
+ if (pagg->hook->exec) /* conditional because it's optional */
+ pagg->hook->exec(task, pagg);
+ }
+
+ up_read(&task->pagg_sem); /* unlock the pagg list */
+ return 0;
+}
+
+
+EXPORT_SYMBOL(pagg_get);
+EXPORT_SYMBOL(pagg_alloc);
+EXPORT_SYMBOL(pagg_free);
+EXPORT_SYMBOL(pagg_hook_register);
+EXPORT_SYMBOL(pagg_hook_unregister);
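The register and unregister paths in pagg.c both sweep the task list with down_write_trylock and rely on the hook reference count to know when to stop. The sketch below models that retry loop in single-threaded user-space C. It is an illustration under assumptions, not the patch's code: the `model_*` names are hypothetical, a boolean stands in for a contended pagg_sem, and a failed trylock is assumed to succeed on the next pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_task {
	bool locked;	/* stands in for a contended pagg_sem */
	bool attached;	/* task holds a pagg belonging to the hook */
};

/* Stand-in for down_write_trylock(): never blocks, reports success. */
static bool model_trylock(struct model_task *t)
{
	if (t->locked) {
		t->locked = false;	/* assume the other holder lets go soon */
		return false;
	}
	t->locked = true;
	return true;
}

static void model_unlock(struct model_task *t)
{
	t->locked = false;
}

/* Sweep all tasks until the hook reference count drops to zero.
 * Returns the number of passes over the task array that were needed. */
int model_remove_all(struct model_task *tasks, size_t n, int *refcnt)
{
	int passes = 0;
	size_t i;

	assert(tasks != NULL && refcnt != NULL);
	while (*refcnt != 0) {
		passes++;
		for (i = 0; i < n; i++) {
			if (!model_trylock(&tasks[i]))
				continue;	/* skip now, retry on the next pass */
			if (tasks[i].attached) {
				tasks[i].attached = false;
				(*refcnt)--;
			}
			model_unlock(&tasks[i]);
		}
	}
	return passes;
}
```

The loop only terminates if every lock taken along the way is eventually released, which is why each successful trylock in the model is paired with an unlock before moving to the next task.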
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] Process Aggregates (PAGG) for 2.6.7 2004-06-24 18:08 [PATCH] Process Aggregates (PAGG) for 2.6.7 Erik Jacobson @ 2004-06-24 18:32 ` Limin Gu 2004-06-24 18:57 ` Chris Wright 2004-06-24 19:31 ` Jay Lan 2004-06-24 23:22 ` Peter Williams 1 sibling, 2 replies; 8+ messages in thread From: Limin Gu @ 2004-06-24 18:32 UTC (permalink / raw) To: Erik Jacobson; +Cc: linux-kernel, jlan, limin, pwil3058 [-- Attachment #1: Type: text/plain, Size: 1009 bytes --] > Attached is a PAGG patch to kernel 2.6.7. > > The maintainers of two patches that make use of PAGG will post their patches > in to this discussion thread shortly. One user of PAGG is job, a loadable kernel module. You can find the documentation of job in the attached patch. Job has not received much feedback from the community yet, we welcome any comments/suggestions/criticism for you. Thanks! Limin Gu - Linux System Software - Silicon Graphics > > The biggest change in this patch from the last one I posted is that > Peter Williams supplied an implementation for the init function pointer > in the pagg hook. We kicked this around a few times to flush out > locking issues. Thanks to Robin Holt for helping me with that. > > A bug was found by Dean Roe and fixed - we had to move our pagg_attach > call in fork.c. > > There a couple other minor changes too. > > Signed-off-by: Erik Jacobson <erikj@sgi.com> > > -- > Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota [-- Attachment #2: English text --] [-- Type: text/plain, Size: 66434 bytes --] Index: linux/Documentation/job.txt =================================================================== --- /dev/null +++ linux/Documentation/job.txt @@ -0,0 +1,104 @@ +Linux Jobs - A Process Aggregate (PAGG) Module +---------------------------------------------- + +1. Overview + +This document provides two additional sections. 
Section 2 provides a +listing of the manual page that describes the particulars of the Linux +job implementation. Section 3 provides some information about using +the user job library to interface to jobs. + +2. Job Man Page + + +JOB(7) Linux User's Manual JOB(7) + + +NAME + job - Linux Jobs kernel module overview + +DESCRIPTION + A job is a group of related processes all descended from a + point of entry process and identified by a unique job + identifier (jid). A job can contain multiple process + groups or sessions, and all processes in one of these sub- + groups can only be contained within a single job. + + The primary purpose for having jobs is to provide job + based resource limits. The current implementation only + provides the job container and resource limits will be + provided in a later implementation. When an implementa- + tion that provides job limits is available, this descrip- + tion will be expanded to provide further explanation of + job based limits. + + Not every process on the system is part of a job. That + is, only processes which are started by a login initiator + like login, rlogin, rsh and so on, get assigned a job ID. + In the Linux environment, jobs are created via a PAM mod- + ule. + + Jobs on Linux are provided using a loadable kernel module. + Linux jobs have the following characteristics: + + o A job is an inescapable container. A process cannot + leave the job nor can a new process be created outside + the job without explicit action, that is, a system + call with root privilege. + + o Each new process inherits the jid and limits [when + implemented] from its parent process. + + o All point of entry processes (job initiators) create a + new job and set the job limits [when implemented] + appropriately. + + o Job initiation on Linux is performed via a PAM session + module. + + o The job initiator performs authentication and security + checks. 
+ + o Users can raise and lower their own job limits within + maximum values specified by the system administrator + [when implemented]. + + o Not all processes on a system need be members of a job. + + o The process control initialization process (init(1M)) + and startup scripts called by init are not part of a + job. + + + Job initiators can be categorized as either interactive or + batch processes. Limit domain names are defined by the + system administrator when the user limits database (ULDB) + is created. [The ULDB will be implemented in conjunction + with future job limits work.] + + Note: The existing command jobs(1) applies to shell "jobs" + and it is not related to the Linux Kernel Module jobs. + The at(1), atd(8), atq(1), batch(1), atrun(8), atrm(1)) + man pages refer to shell scripts as a job. a shell + script. + +SEE ALSO + job(1), jwait(1), jstat(1), jkill(1) + + + + + + + + + +3. User Job Library + +For developers who wish to make software using Linux Jobs, there exists +a user job library. This library contains functions for obtaining information +about running jobs, creating jobs, detaching, etc. + +The library is part of the job package and can be obtained from oss.sgi.com +using anonymous ftp. Look in the /projects/pagg/download directory. See the +README in the job source package for more information. Index: linux/include/linux/job.h =================================================================== --- /dev/null +++ linux/include/linux/job.h @@ -0,0 +1,123 @@ +/* + * PAGG Job kernel definitions & interfaces + * + * + * Copyright (c) 2000-2004 Silicon Graphics, Inc. All Rights Reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * + * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane, + * Mountain View, CA 94043, or: + * + * http://www.sgi.com + * + * For further information regarding this notice, see: + * + * http://oss.sgi.com/projects/GenInfo/NoticeExplan + */ + +/* + * Description: This file, include/linux/job.h, contains the data + * structure definitions and function prototypes used + * by other kernel bits that communicate with the job + * module. One such example is Comprehensive System + * Accounting (CSA). + */ + +#ifndef _LINUX_JOB_H +#define _LINUX_JOB_H + +/* + * ================ + * GENERAL USE INFO + * ================ + */ + +/* + * The job start/stop events: These will identify the + * reason the jobstart and jobend callbacks are being + * called. + */ +enum { + JOB_EVENT_IGNORE = 0, + JOB_EVENT_START = 1, + JOB_EVENT_RESTART = 2, + JOB_EVENT_END = 3, +}; + + +/* + * ========================================= + * INTERFACE INFO FOR ACCOUNTING SUBSCRIBERS + * ========================================= + */ + +/* To register as a job dependent accounting module */ +struct job_acctmod { + int type; /* CSA or something else */ + int (*jobstart)(int event, void *data); + int (*jobend)(int event, void *data); + struct module *module; +}; + + +/* + * Subscriber type: Each module that registers as an accounting data + * "subscriber" has to have a type. This type will identify the + * appropriate structs and macros to use when exchanging data. 
+ */ +#define JOB_ACCT_CSA 0 +#define JOB_ACCT_COUNT 1 /* Number of entries available */ + + +/* + * -------------- + * CSA ACCOUNTING + * -------------- + */ + +/* + * For data exchange between job and csa. The embedded defines + * identify the sub-fields + */ +struct job_csa { +#define JOB_CSA_JID 001 + u64 job_id; +#define JOB_CSA_UID 002 + uid_t job_uid; +#define JOB_CSA_START 004 + time_t job_start; +#define JOB_CSA_COREHIMEM 010 + u64 job_corehimem; +#define JOB_CSA_VIRTHIMEM 020 + u64 job_virthimem; +#define JOB_CSA_ACCTFILE 040 + struct file *job_acctfile; +}; + + +/* + * =================== + * FUNCTION PROTOTYPES + * =================== + */ +int job_register_acct(struct job_acctmod *); +int job_unregister_acct(struct job_acctmod *); +u64 job_getjid(struct task_struct *); +int job_getacct(u64, int, void *); +int job_setacct(u64, int, int, void *); + +#endif /* _LINUX_JOB_H */ Index: linux/include/linux/paggctl.h =================================================================== --- /dev/null +++ linux/include/linux/paggctl.h @@ -0,0 +1,179 @@ +/* + * + * Copyright (c) 2000-2004 Silicon Graphics, Inc. All Rights Reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * + * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane, + * Mountain View, CA 94043, or: + * + * http://www.sgi.com + * + * For further information regarding this notice, see: + * + * http://oss.sgi.com/projects/GenInfo/NoticeExplan + * + * + * Description: This file, include/linux/paggctl.h, contains the data + * definitions used by job to communicate with pagg via the /proc/job + * ioctl interface. + * + */ + +#ifndef _LINUX_PAGGCTL_H +#define _LINUX_PAGGCTL_H +#ifndef __KERNEL__ +#include <stdint.h> +#include <sys/types.h> +#include <asm/unistd.h> +#endif + +#define PAGG_NAMELN 32 /* Max chars in PAGG module name */ +#define PAGG_NAMESTR PAGG_NAMELN+1 /* PAGG mod name string including + * room for end-of-string = '\0' */ + +/* + * ==================== + * JOB PAGG definitions + * ==================== + */ +#define PAGG_JOB "job" /* PAGG module identifier string */ + + + +/* + * ================ + * KERNEL INTERFACE + * ================ + */ +#define JOB_PROC_ENTRY "job" /* /proc entry name */ +#define JOB_IOCTL_NUM 'A' + + +/* + * + * Define ioctl options available in the job module + * + */ + +#define JOB_NOOP _IOWR(JOB_IOCTL_NUM, 0, void *) /* No-op options */ + +#define JOB_CREATE _IOWR(JOB_IOCTL_NUM, 1, void *) /* Create a job - uid = 0 only */ +#define JOB_ATTACH _IOWR(JOB_IOCTL_NUM, 2, void *) /* RESERVED */ +#define JOB_DETACH _IOWR(JOB_IOCTL_NUM, 3, void *) /* RESERVED */ +#define JOB_GETJID _IOWR(JOB_IOCTL_NUM, 4, void *) /* Get Job ID for specified pid */ +#define JOB_WAITJID _IOWR(JOB_IOCTL_NUM, 5, void *) /* Wait for job to complete */ +#define JOB_KILLJID _IOWR(JOB_IOCTL_NUM, 6, void *) /* Send signal to job */ +#define JOB_GETJIDCNT _IOWR(JOB_IOCTL_NUM, 9, void *) /* Get number of JIDs on system */ +#define 
JOB_GETJIDLST _IOWR(JOB_IOCTL_NUM, 10, void *) /* Get list of JIDs on system */ +#define JOB_GETPIDCNT _IOWR(JOB_IOCTL_NUM, 11, void *) /* Get number of PIDs in JID */ +#define JOB_GETPIDLST _IOWR(JOB_IOCTL_NUM, 12, void *) /* Get list of PIDs in JID */ +#define JOB_SETJLIMIT _IOWR(JOB_IOCTL_NUM, 13, void *) /* Future: set job limits info */ +#define JOB_GETJLIMIT _IOWR(JOB_IOCTL_NUM, 14, void *) /* Future: get job limits info */ +#define JOB_GETJUSAGE _IOWR(JOB_IOCTL_NUM, 15, void *) /* Future: get job res. usage */ +#define JOB_FREE _IOWR(JOB_IOCTL_NUM, 16, void *) /* Future: Free job entry */ +#define JOB_GETUSER _IOWR(JOB_IOCTL_NUM, 17, void *) /* Get owner for job */ +#define JOB_GETPRIMEPID _IOWR(JOB_IOCTL_NUM, 18, void *) /* Get prime pid for job */ +#define JOB_SETHID _IOWR(JOB_IOCTL_NUM, 19, void *) /* Set HID for jid values */ +#define JOB_DETACHJID _IOWR(JOB_IOCTL_NUM, 20, void *) /* Detach all tasks from job */ +#define JOB_DETACHPID _IOWR(JOB_IOCTL_NUM, 21, void *) /* Detach a task from job */ +#define JOB_OPT_MAX _IOWR(JOB_IOCTL_NUM, 22 , void *) /* Should always be highest number */ + + +/* + * Define ioctl request structures for job module + */ + +struct job_create { + u64 r_jid; /* Return value of JID */ + u64 jid; /* Jid value requested */ + int user; /* UID of user associated with job */ + int options;/* creation options - unused */ +}; + + +struct job_getjid { + u64 r_jid; /* Returned value of JID */ + pid_t pid; /* Info requested for PID */ +}; + + +struct job_waitjid { + u64 r_jid; /* Returned value of JID */ + u64 jid; /* Waiting on specified JID */ + int stat; /* Status information on JID */ + int options;/* Waiting options */ +}; + + +struct job_killjid { + int r_val; /* Return value of kill request */ + u64 jid; /* Sending signal to all PIDs in JID */ + int sig; /* Signal to send */ +}; + + +struct job_jidcnt { + int r_val; /* Number of JIDs on system */ +}; + + +struct job_jidlst { + int r_val; /* Number of JIDs in list */ + u64 *jid; /* 
List of JIDs */ +}; + + +struct job_pidcnt { + int r_val; /* Number of PIDs in JID */ + u64 jid; /* Getting count of JID */ +}; + + +struct job_pidlst { + int r_val; /* Number of PIDs in list */ + pid_t *pid; /* List of PIDs */ + u64 jid; +}; + + +struct job_user { + int r_user; /* The UID of the owning user */ + u64 jid; /* Get the UID for this job */ +}; + +struct job_primepid { + pid_t r_pid; /* The prime pid */ + u64 jid; /* Get the prime pid for this job */ +}; + +struct job_sethid { + unsigned long r_hid; /* Value that was set */ + unsigned long hid; /* Value to set to */ +}; + + +struct job_detachjid { + int r_val; /* Number of tasks detached from job */ + u64 jid; /* Job to detach processes from */ +}; + +struct job_detachpid { + u64 r_jid; /* Job ID task was attached to */ + pid_t pid; /* Task to detach from job */ +}; + +#endif /* _LINUX_PAGGCTL_H */ Index: linux/init/Kconfig =================================================================== --- linux.orig/init/Kconfig +++ linux/init/Kconfig @@ -129,6 +129,31 @@ Linux Jobs module and the Linux Array Sessions module. If you will not be using such modules, say N. +config PAGG_JOB + tristate " Process aggregate based jobs" + depends on PAGG + help + The Job feature implements a type of process aggregate, + or grouping. A job is the collection of all processes that + are descended from a point-of-entry process. Examples of such + points-of-entry include telnet, rlogin, and console logins. + A job differs from a session and process group since the job + container (or group) is inescapable. Only root level processes, + or those with the CAP_SYS_RESOURCE capability, can create new jobs + or escape from a job. + + A job is identified by a unique job identifier (jid). Currently, + that jid can be used to obtain status information about the job + and the processes it contains. The jid can also be used to send + signals to all processes contained in the job. 
In addition, + other processes can wait for the completion of a job - the event + where the last process contained in the job has exited. + + If you want to compile support for jobs into the kernel, select + this entry using Y. If you want the support for jobs provided as + a module, select this entry using M. If you do not want support + for jobs, select N. + config SYSCTL bool "Sysctl support" ---help--- Index: linux/kernel/job.c =================================================================== --- /dev/null +++ linux/kernel/job.c @@ -0,0 +1,2052 @@ +/* + * Linux Job kernel module + * + * + * Copyright (c) 2000-2004 Silicon Graphics, Inc. All Rights Reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * + * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane, + * Mountain View, CA 94043, or: + * + * http://www.sgi.com + * + * For further information regarding this notice, see: + * + * http://oss.sgi.com/projects/GenInfo/NoticeExplan + */ + +/* + * Description: This file implements a type of process grouping called jobs. + * For further information about jobs, consult the file + * Documentation/job.txt. Jobs are implemented as a type of PAGG + * (process aggregate). For further information about PAGGs, + * consult the file Documentation/pagg.txt. 
+ */ + +/* + * LOCKING INFO + * + * There are currently two levels of locking in this module. So, we + * have two classes of locks: + * + * (1) job table lock (always, job_table_sem) + * (2) job entry lock (usually, job->sem) + * + * Most of the locking used is read/write semaphores. In rare cases, a + * spinlock is also used. Those cases requiring a spinlock concern when the + * tasklist_lock must be locked (such as when looping over all tasks on the + * system). + * + * There is only one job_table_sem. There is a job->sem for each job + * entry in the job_table. This job module is a PAGG module (Process + * Aggregation). Each task has a special lock that protects its PAGG + * information - this is called the pagg list lock. There are special macros + * used to lock/unlock a task's pagg list lock. The pagg list lock is really + * a semaphore. + * + * Purpose: + * + * (1) The job_table_sem protects all entries in the table. + * (2) The job->sem protects all data and task attachments for the job. + * + * Truths we hold to be self-evident: + * + * Only the holder of a write lock for the job_table_lock may add or + * delete a job entry from the job_table. The job_table includes all job + * entries in the hash table and chains off the hash table locations. + * + * Only the holder of a write lock for a job->lock may attach or detach + * processes/tasks from the attached list for the job. + * + * If you hold a read lock of job_table_lock, you can assume that the + * job entries in the table will not change. The link pointers for + * the chains of job entries will not change, the job ID (jid) value + * will not change, and data changes will be (mostly) atomic. + * + * If you hold a read lock of a job->lock, you can assume that the + * attachments to the job will not change. The link pointers for the + * attachment list will not change and the attachments will not change. 
+ * + * If you are going to grab nested locks, the nesting order is: + * + * down_write/up_write/down_read/up_read(&task->pagg_sem) + * job_table_sem + * job->sem + * + * However, it is not strictly necessary to down the job_table_sem + * before downing job->sem. + * + * Also, the nesting order allows you to lock in this order: + * + * down_write/up_write/down_read/up_read(&task->pagg_sem) + * job->sem + * + * without locking job_table_sem between the two. + * + */ + +/* standard for kernel modules */ +#include <linux/config.h> +#include <linux/module.h> +#include <linux/kernel.h> +#include <linux/kmod.h> +#include <linux/init.h> +#include <linux/list.h> + +#include <asm/uaccess.h> /* for get_user & put_user */ + +#include <linux/sched.h> /* for current */ +#include <linux/tty.h> /* for the tty declarations */ +#include <linux/slab.h> +#include <linux/types.h> + +#include <linux/proc_fs.h> + +#include <linux/string.h> +#include <asm/semaphore.h> + +#include <linux/pagg.h> /* to use pagg hooks */ +#include <linux/job.h> +#include <linux/paggctl.h> + +MODULE_AUTHOR("Silicon Graphics, Inc."); +MODULE_DESCRIPTION("PAGG-based inescapable jobs"); +MODULE_LICENSE("GPL"); + +#define HASH_SIZE 1024 + +/* The states for a job */ +#define FETAL 1 /* being born, not ready for attachments yet */ +#define RUNNING 2 /* Running job */ +#define STOPPED 3 /* Stopped job */ +#define ZOMBIE 4 /* Dead job */ + +/* Job creation tags for the job HID (host ID) */ +#define DISABLED 0xffffffff /* New job creation disabled */ +#define LOCAL 0x0 /* Only creating local sys jobs */ + + +#ifdef __BIG_ENDIAN +#define iptr_hid(ll) ((u32 *)&(ll)) +#define iptr_sid(ll) (((u32 *)(&(ll) + 1)) - 1) +#else /* __LITTLE_ENDIAN */ +#define iptr_hid(ll) (((u32 *)(&(ll) + 1)) - 1) +#define iptr_sid(ll) ((u32 *)&(ll)) +#endif /* __BIG_ENDIAN */ + +#define jid_hash(ll) (*(iptr_sid(ll)) % HASH_SIZE) + + +/* Job info entry for member tasks */ +struct job_attach { + struct task_struct *task; /* task we are 
attaching to job */ + struct pagg *pagg; /* our pagg entry in the task */ + struct job_entry *job; /* the job we are attaching task to */ + struct list_head entry; /* list stuff */ +}; + +struct job_waitinfo { + int status; /* For tasks waiting on job exit */ +}; + +struct job_csainfo { + u64 corehimem; /* Accounting - highpoint, phys mem */ + u64 virthimem; /* Accounting - highpoint, virt mem */ + struct file *acctfile; /* The accounting file for job */ +}; + +/* Job table entry type */ +struct job_entry { + u64 jid; /* Our job ID */ + int refcnt; /* Number of tasks attached to job */ + int state; /* State of job - RUNNING,... */ + struct rw_semaphore sem; /* lock for the job */ + uid_t user; /* user that owns the job */ + time_t start; /* When the job began */ + struct job_csainfo csa; /* CSA accounting info */ + wait_queue_head_t zombie; /* queue last task - during wait */ + wait_queue_head_t wait; /* queue of tasks waiting on job */ + int waitcnt; /* Number of tasks waiting on job */ + struct job_waitinfo waitinfo; /* Status info for waiting tasks */ + struct list_head attached; /* List of attached tasks */ + struct list_head entry; /* List of other jobs - same hash */ +}; + + +/* Job container tables */ +static struct list_head job_table[HASH_SIZE]; +static int job_table_refcnt = 0; +static DECLARE_RWSEM(job_table_sem); + + +/* Accounting subscriber list */ +static struct job_acctmod *acct_list[JOB_ACCT_COUNT]; +static DECLARE_RWSEM(acct_list_sem); + + +/* Host ID for the localhost */ +static u32 jid_hid = DISABLED; + +static char *hid = NULL; +MODULE_PARM(hid, "s"); + +/* Function prototypes */ +static int job_sys_create(struct job_create *); +static int job_sys_getjid(struct job_getjid *); +static int job_sys_waitjid(struct job_waitjid *); +static int job_sys_killjid(struct job_killjid *); +static int job_sys_getjidcnt(struct job_jidcnt *); +static int job_sys_getjidlst(struct job_jidlst *); +static int job_sys_getpidcnt(struct job_pidcnt *); +static int 
job_sys_getpidlst(struct job_pidlst *); +static int job_sys_getuser(struct job_user *); +static int job_sys_getprimepid(struct job_primepid *); +static int job_sys_sethid(struct job_sethid *); +static int job_sys_detachjid(struct job_detachjid *); +static int job_sys_detachpid(struct job_detachpid *); +static int job_attach(struct task_struct *, struct pagg *, void *); +static void job_detach(struct task_struct *, struct pagg *); +static struct job_entry *job_getjob(u64 jid); +static int job_syscall(unsigned int, unsigned long); + +u64 job_getjid(struct task_struct *); + +int job_ioctl(struct inode *, struct file *, unsigned int, unsigned long); + +/* Job container kernel pagg entry */ +static struct pagg_hook pagg_hook = { + .module = THIS_MODULE, + .name = PAGG_JOB, + .data = &job_table, + .init = NULL, + .entry = LIST_HEAD_INIT(pagg_hook.entry), + .attach = job_attach, + .detach = job_detach, + .exec = NULL, +}; + +/* proc dir entry */ +struct proc_dir_entry *job_proc_entry; + +/* file operations for proc file */ +static struct file_operations job_file_ops = { + .owner = THIS_MODULE, + .ioctl = job_ioctl +}; + +#ifdef DEBUG + +#define DBG_PRINTINIT(s) \ + char *dbg_fname = s + +#define DBG_PRINTENTRY() \ +do { \ + printk(KERN_DEBUG "job: %s: entry\n", dbg_fname); \ +} while(0) + +#define DBG_PRINTEXIT(c) \ +do { \ + printk(KERN_DEBUG "job: %s: exit, code = %d\n", dbg_fname, c); \ +} while(0) + +/* write lock semaphore */ +#define JOB_WLOCK(l) \ +do { \ + printk(KERN_DEBUG "job: wlock = %p\n", l); \ + down_write(l); \ +} while(0); + +/* write unlock semaphore */ +#define JOB_WUNLOCK(l) \ +do { \ + printk(KERN_DEBUG "job: wunlock = %p\n", l); \ + up_write(l); \ +} while(0); + +/* read lock semaphore */ +#define JOB_RLOCK(l) \ +do { \ + printk(KERN_DEBUG "job: rlock = %p\n", l); \ + down_read(l); \ +} while(0); + +/* read unlock semaphore */ +#define JOB_RUNLOCK(l) \ +do { \ + printk(KERN_DEBUG "job: runlock = %p\n", l); \ + up_read(l); \ +} while(0); + + +#else /* 
#ifdef DEBUG */ + +#define DBG_PRINTINIT(s) + +#define DBG_PRINTENTRY() \ +do { \ +} while(0) + +#define DBG_PRINTEXIT(c) \ +do { \ +} while(0) + +/* write lock semaphore */ +#define JOB_WLOCK(l) \ +do { \ + down_write(l); \ +} while(0); + +/* write unlock semaphore */ +#define JOB_WUNLOCK(l) \ +do { \ + up_write(l); \ +} while(0); + +/* read lock semaphore */ +#define JOB_RLOCK(l) \ +do { \ + down_read(l); \ +} while(0); + +/* read unlock semaphore */ +#define JOB_RUNLOCK(l) \ +do { \ + up_read(l); \ +} while(0); + + +#endif /* #ifdef DEBUG */ + + + +/* + * job_getjob + * + * Given a jid value, find the entry in the job_table and return a pointer + * to the job entry or NULL if not found. + * + * You should normally JOB_RLOCK the job_table_sem before calling this + * function. + */ +struct job_entry * +job_getjob(u64 jid) +{ + struct list_head *entry = NULL; + struct job_entry *tjob = NULL; + struct job_entry *job = NULL; + + list_for_each(entry, &job_table[ jid_hash(jid) ]) { + tjob = list_entry(entry, struct job_entry, entry); + if (tjob->jid == jid) { + job = tjob; + break; + } + } + return job; +} + + +/* + * job_attach + * + * Attach the task to the job specified in the target data (old_data). + * This function will add the task to the list of attached tasks for the job. + * In addition, a link from the task to the job is created and added to the + * task via the data pointer reference. + * + * The process that owns the target data should be at least read locked (using + * down_read(&task->pagg_sem)) during this call. This helps ensure + * that the job cannot be removed since at least one process will + * still be referencing the job (the one owning the target_data). + * + * It is expected that this function will be called from within the + * pagg_attach() function in the kernel, when forking (do_fork) a child + * process represented by task. 
+ * + * If this function is called from some other point, then it is possible that + * task and data could be altered while going through this function. In such + * a case, the caller should also lock the pagg list for the task + * task_struct. + * + * The function returns 0 upon success, and a negative errno value upon failure. + */ +static int +job_attach(struct task_struct *task, struct pagg *new_pagg, + void *old_data) +{ + struct job_entry *job = ((struct job_attach *)old_data)->job; + struct job_attach *attached = NULL; + int errcode = 0; + DBG_PRINTINIT("job_attach"); + + DBG_PRINTENTRY(); + + /* + * Lock the job for writing. The task owning target_data has its + * pagg_sem locked, so we know there is at least one active reference + * to the job - therefore, it cannot have been removed before we + * have gotten this write lock established. + */ + JOB_WLOCK(&job->sem); + + if (job->state == ZOMBIE) { + /* If the job is a zombie (dying), bail out of the attach */ + printk(KERN_WARNING "Attach task(pid=%d) to job" + " failed - job is ZOMBIE\n", + task->pid); + errcode = -EINPROGRESS; + JOB_WUNLOCK(&job->sem); + goto error_return; + } + + + /* Allocate memory that we will need */ + + attached = (struct job_attach *)kmalloc(sizeof(struct job_attach), + GFP_KERNEL); + if (!attached) { + /* error */ + printk(KERN_ERR "Attach task(pid=%d) to job" + " failed on memory error in kernel\n", + task->pid); + errcode = -ENOMEM; + goto error_return; + } + + + attached->task = task; + attached->pagg = new_pagg; + attached->job = job; + new_pagg->data = (void *)attached; + list_add_tail(&attached->entry, &job->attached); + ++job->refcnt; + + JOB_WUNLOCK(&job->sem); + + DBG_PRINTEXIT(0); + return 0; + +error_return: + DBG_PRINTEXIT(errcode); + if (attached) kfree(attached); + return errcode; +} + + +/* + * job_detach + * + * Detach the task from the job attached to via the pagg reference. 
+ * This function will remove the task from the list of attached tasks for the + * job specified via the pagg pointer. In addition, the link to the job + * provided via the data pointer will also be removed. + * + * The pagg_list should be write locked for task before entering + * this function (using down_write(&task->pagg_sem)). + * + * This function does not return a value. + */ +static void +job_detach(struct task_struct *task, struct pagg *pagg) +{ + struct job_attach *attached = ((struct job_attach *)(pagg->data)); + struct job_entry *job = attached->job; + DBG_PRINTINIT("job_detach"); + + DBG_PRINTENTRY(); + + /* + * Obtain the lock on the job_table_sem and the job->sem for + * this job. + */ + JOB_WLOCK(&job_table_sem); + JOB_WLOCK(&job->sem); + + job->refcnt--; + list_del(&attached->entry); + pagg->data = NULL; + kfree(attached); + + if (job->refcnt == 0) { + int waitcnt; + + list_del(&job->entry); + --job_table_refcnt; + + /* + * The job is removed from the job_table. + * We can remove the job_table_sem now since + * nobody can access the job via the table. + */ + JOB_WUNLOCK(&job_table_sem); + + job->state = ZOMBIE; + job->waitinfo.status = task->exit_code; + + waitcnt = job->waitcnt; + + /* + * Release the job semaphore. You cannot hold + * this lock if you want the wakeup to work + * properly. + */ + JOB_WUNLOCK(&job->sem); + + if (waitcnt > 0) { + wake_up_interruptible(&job->wait); + wait_event(job->zombie, job->waitcnt == 0); + } + + /* + * Job is exiting, all processes waiting for job to exit + * have been notified. Now we call the accounting + * subscribers. 
+ */ + +#if defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE) + /* - CSA accounting */ + if (acct_list[JOB_ACCT_CSA]) { + struct job_acctmod *acct = acct_list[JOB_ACCT_CSA]; + if (acct->module) { + if (try_module_get(acct->module) == 0) { + printk(KERN_WARNING + "job_detach: Tried to get non-living acct module\n"); + } + } + if (acct->jobend) { + int res = 0; + struct job_csa csa; + + csa.job_id = job->jid; + csa.job_uid = job->user; + csa.job_start = job->start; + csa.job_corehimem = job->csa.corehimem; + csa.job_virthimem = job->csa.virthimem; + csa.job_acctfile = job->csa.acctfile; + + res = acct->jobend(JOB_EVENT_END, + &csa); + if (res) { + printk(KERN_WARNING + "job_detach: CSA -" + " jobend failed.\n"); + } + } + if (acct->module) + module_put(acct->module); + } else { + printk(KERN_WARNING "job_detach: CSA - attempt" + " to lock CSA module failed.\n"); + } +#endif /* CONFIG_CSA || defined(CONFIG_CSA_MODULE) */ + + + /* + * Every process attached or waiting on this job should be + * detached and finished waiting, so now we can free the + * memory for the job. + */ + kfree(job); + + } else { + /* This is the case where job->refcnt was greater than 1, so + * we are not going to delete the job after the detach. + * Both the job->sem and the job_table_sem are still held + * here, so release them. + */ + JOB_WUNLOCK(&job->sem); + JOB_WUNLOCK(&job_table_sem); + } + + DBG_PRINTEXIT(0); + + return; +} + +/* + * job_sys_create + * + * This function is used to create a new job and attach the calling process + * to that new job. + * + * Returns 0 on success, and negative on failure (negative errno value). 
+ */ +static int +job_sys_create(struct job_create *create_args) +{ + struct job_create create; + struct job_entry *job = NULL; + struct job_attach *attached = NULL; + struct pagg *pagg = NULL; + struct pagg *old_pagg = NULL; + int errcode = 0; + DBG_PRINTINIT("job_sys_create"); + + DBG_PRINTENTRY(); + + /* + * if the job ID - host ID segment is set to DISABLED, we will + * not be creating new jobs. We don't mark it as an error, but + * the jid value returned will be 0. + */ + if (jid_hid == DISABLED) { + errcode = 0; + goto error_return; + } + + +#if 0 /* XXX - Use if capable is not present */ + if (current->euid != 0) + return -EPERM; +#else + if (!capable(CAP_SYS_RESOURCE)) { + errcode = -EPERM; + goto error_return; + } +#endif + if (!create_args) { + errcode = -EINVAL; + goto error_return; + } + + if (copy_from_user(&create, create_args, sizeof(create))) { + errcode = -EFAULT; + goto error_return; + } + + /* + * Allocate some of the memory we might need, before we start + * locking + */ + + attached = (struct job_attach *)kmalloc(sizeof(struct job_attach), GFP_KERNEL); + if (!attached) { + /* error */ + errcode = -ENOMEM; + goto error_return; + } + + job = (struct job_entry *)kmalloc(sizeof(struct job_entry), GFP_KERNEL); + if (!job) { + /* error */ + errcode = -ENOMEM; + goto error_return; + } + + /* We keep the old pagg around in case we need it in an error condition. + * If, for example, a job_getjob call fails because the requested JID is + * already in use, we don't want to detach that job. Having this ability + * is complicated by the locking. + */ + down_write(&current->pagg_sem); /* write lock pagg list */ + old_pagg = pagg_get(current, pagg_hook.name); + + /* + * Lock the job_table and add the pointers for the new job. + * Since the job is new, we won't need to lock the job. + */ + JOB_WLOCK(&job_table_sem); + + /* + * Determine if create should use specified JID or one that is + * generated. 
+ */ + if (create.jid != 0) { + /* We use the specified JID value */ + + if (job_getjob(create.jid)) { + /* JID already in use, bail */ + /* error_return doesn't do JOB_WUNLOCK */ + JOB_WUNLOCK(&job_table_sem); + /* we haven't allocated a new pagg yet so error_return won't unlock + * this. We'll unlock here */ + up_write(&current->pagg_sem); + errcode = -EBUSY; + /* error_return doesn't touch old_pagg so we don't detach */ + goto error_return; + } else { + /* Using specified JID */ + job->jid = create.jid; + } + + } else { + + /* We generate a new JID value */ + *(iptr_hid(job->jid)) = jid_hid; + *(iptr_sid(job->jid)) = current->pid; + } + + pagg = pagg_alloc(current, &pagg_hook); + if (!pagg) { + /* error */ + up_write(&current->pagg_sem); /* write unlock pagg list */ + errcode = -ENOMEM; + goto error_return; + } + + /* Initialize job entry values & lists */ + job->refcnt = 1; + job->user = create.user; + job->start = jiffies; + job->csa.corehimem = 0; + job->csa.virthimem = 0; + job->csa.acctfile = NULL; + job->state = RUNNING; + init_rwsem(&job->sem); + INIT_LIST_HEAD(&job->attached); + list_add_tail(&attached->entry, &job->attached); + init_waitqueue_head(&job->wait); + init_waitqueue_head(&job->zombie); + job->waitcnt = 0; + job->waitinfo.status = 0; + + /* set link from entry in attached list to task and job entry */ + attached->task = current; + attached->job = job; + attached->pagg = pagg; + pagg->data = (void *)attached; + + /* Insert new job into front of chain list */ + list_add_tail(&job->entry, &job_table[ jid_hash(job->jid) ]); + ++job_table_refcnt; + + JOB_WUNLOCK(&job_table_sem); + /* At this point, the possible error conditions where we would need the + * old pagg are gone. So we can remove it. We remove after we unlock + * because the pagg hook detach function does job table lock of its own. + */ + if (old_pagg) { + /* + * Detaching paggs for jobs never has a failure case, + * so we don't need to worry about error codes. 
+ */ + old_pagg->hook->detach(current, old_pagg); + pagg_free(old_pagg); + } + up_write(&current->pagg_sem); /* write unlock pagg list */ + + /* Issue callbacks into accounting subscribers */ + +#if defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE) + /* - CSA subscriber */ + if (acct_list[JOB_ACCT_CSA]) { + struct job_acctmod *acct = acct_list[JOB_ACCT_CSA]; + if (acct->module) { + if (try_module_get(acct->module) == 0) { + printk(KERN_WARNING + "job_sys_create: Tried to get non-living acct module\n"); + } + } + if (acct->jobstart) { + int res; + struct job_csa csa; + + csa.job_id = job->jid; + csa.job_uid = job->user; + csa.job_start = job->start; + csa.job_corehimem = job->csa.corehimem; + csa.job_virthimem = job->csa.virthimem; + csa.job_acctfile = job->csa.acctfile; + + res = acct->jobstart(JOB_EVENT_START, &csa); + if (res < 0) { + printk(KERN_WARNING "job_sys_create: CSA -" + " jobstart failed.\n"); + } + } + if (acct->module) + module_put(acct->module); + } +#endif /* CONFIG_CSA || defined(CONFIG_CSA_MODULE) */ + + + create.r_jid = job->jid; + if (copy_to_user(create_args, &create, sizeof(create))) { + DBG_PRINTEXIT(-EFAULT); + return -EFAULT; + } + + DBG_PRINTEXIT(0); + return 0; + +error_return: + DBG_PRINTEXIT(errcode); + if (attached) kfree(attached); + if (job) kfree(job); + if (pagg) { + pagg->hook->detach(current, pagg); /* detach the pagg */ + pagg_free(pagg); + /* This was locked at pagg_alloc call */ + up_write(&current->pagg_sem); /* write unlock pagg list */ + } + create.r_jid = 0; + if (copy_to_user(create_args, &create, sizeof(create))) { + DBG_PRINTEXIT(-EFAULT); + return -EFAULT; + } + + return errcode; +} + + +/* + * job_sys_getjid + * + * Function retrieves the job ID (jid) for the specified process (pid). + * + * Returns 0 on success, negative errno value on failure. 
+ */
+static int
+job_sys_getjid(struct job_getjid *getjid_args)
+{
+	struct job_getjid getjid;
+	int errcode = 0;
+	struct task_struct *task;
+	DBG_PRINTINIT("job_sys_getjid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&getjid, getjid_args, sizeof(getjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	/* lock the tasklist until we grab the specific task */
+	read_lock(&tasklist_lock);
+
+	if (getjid.pid == current->pid) {
+		task = current;
+	} else {
+		task = find_task_by_pid(getjid.pid);
+	}
+	if (task) {
+		get_task_struct(task); /* Ensure the task doesn't vanish on us */
+		read_unlock(&tasklist_lock); /* unlock the task list */
+		getjid.r_jid = job_getjid(task);
+		put_task_struct(task); /* We're done accessing the task */
+		if (getjid.r_jid == 0) {
+			errcode = -ENODATA;
+		}
+	} else {
+		read_unlock(&tasklist_lock);
+		getjid.r_jid = 0;
+		errcode = -ESRCH;
+	}
+
+
+	DBG_PRINTEXIT(errcode);
+	if (copy_to_user(getjid_args, &getjid, sizeof(getjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return errcode;
+}
+
+
+/*
+ * job_sys_waitjid
+ *
+ * This job allows a process to wait until a job exits & it returns the
+ * status information for the last process to exit the job.
+ *
+ * On success returns 0, failure it returns the negative errno value.
+ */
+static int
+job_sys_waitjid(struct job_waitjid *waitjid_args)
+{
+	struct job_waitjid waitjid;
+	struct job_entry *job;
+	int retcode = 0;
+	DBG_PRINTINIT("job_sys_waitjid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&waitjid, waitjid_args, sizeof(waitjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+
+	waitjid.r_jid = waitjid.stat = 0;
+
+	if (waitjid.options != 0) {
+		retcode = -EINVAL;
+		goto general_return;
+	}
+
+	/* Lock the job table so that the current jobs don't change */
+	JOB_RLOCK(&job_table_sem);
+
+
+	if ((job = job_getjob(waitjid.jid)) == NULL ) {
+		JOB_RUNLOCK(&job_table_sem);
+		retcode = -ENODATA;
+		goto general_return;
+	}
+
+	/*
+	 * We got the job we need, we can release the job_table_sem
+	 */
+	JOB_WLOCK(&job->sem);
+	JOB_RUNLOCK(&job_table_sem);
+
+	++job->waitcnt;
+
+	JOB_WUNLOCK(&job->sem);
+
+	/* We shouldn't hold any locks at this point! The increment of the
+	 * jobs waitcnt will ensure that the job is not removed without
+	 * first notifying this current task */
+	retcode = wait_event_interruptible(job->wait,
+			job->refcnt == 0);
+
+	if (!retcode) {
+		/*
+		 * This data is static at this point, we will
+		 * not need a lock to read it.
+		 */
+		waitjid.stat = job->waitinfo.status;
+		waitjid.r_jid = job->jid;
+	}
+
+	JOB_WLOCK(&job->sem);
+	--job->waitcnt;
+
+	if (job->waitcnt == 0) {
+		JOB_WUNLOCK(&job->sem);
+
+		/*
+		 * We shouldn't hold any locks at this point! Else, the
+		 * last process in the job will not be able to remove the
+		 * job entry.
+		 *
+		 * That process is stuck waiting for this wake_up, so the
+		 * job shouldn't disappear until after this function call.
+		 * The job entry is not longer in the job table, so no
+		 * other process can get to the entry to foul things up.
+		 */
+		wake_up(&job->zombie);
+	} else {
+		JOB_WUNLOCK(&job->sem);
+	}
+
+general_return:
+
+	DBG_PRINTEXIT(retcode);
+	if (copy_to_user(waitjid_args, &waitjid, sizeof(waitjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return retcode;
+}
+
+
+/*
+ * job_sys_killjid
+ *
+ * This functions allows a signal to be sent to all processes in a job.
+ *
+ * returns 0 on success, negative of errno on failure.
+ */
+static int
+job_sys_killjid(struct job_killjid *killjid_args)
+{
+	struct job_killjid killjid;
+	struct job_entry *job;
+	struct list_head *attached_entry;
+	struct siginfo info;
+	int retcode = 0;
+	DBG_PRINTINIT("job_sys_killjid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&killjid, killjid_args, sizeof(killjid))) {
+		retcode = -EFAULT;
+		goto cleanup_0locks_return;
+	}
+
+	killjid.r_val = -1;
+
+	/* A signal of zero is really a status check and is handled as such
+	 * by send_sig_info. So we have < 0 instead of <= 0 here.
+	 */
+	if (killjid.sig < 0) {
+		retcode = -EINVAL;
+		goto cleanup_0locks_return;
+	}
+
+	JOB_RLOCK(&job_table_sem);
+	job = job_getjob(killjid.jid);
+	if (!job) {
+		/* Job not found, copy back data & bail with error */
+		retcode = -ENODATA;
+		goto cleanup_1locks_return;
+	}
+
+	JOB_RLOCK(&job->sem);
+
+	/*
+	 * Check capability to signal job. The signaling user must be
+	 * the owner of the job or have CAP_SYS_RESOURCE capability.
+	 */
+#if 0 /* Use this if not capability is available */
+	if (current->uid != 0) {
+#else
+	if (!capable(CAP_SYS_RESOURCE)) {
+#endif
+		if (current->uid != job->user) {
+			retcode = -EPERM;
+			goto cleanup_2locks_return;
+		}
+	}
+
+	info.si_signo = killjid.sig;
+	info.si_errno = 0;
+	info.si_code = SI_USER;
+	info.si_pid = current->pid;
+	info.si_uid = current->uid;
+
+	list_for_each(attached_entry, &job->attached) {
+		int err;
+		struct job_attach *attached;
+
+		attached = list_entry(attached_entry, struct job_attach, entry);
+		err = send_sig_info(killjid.sig, &info,
+				attached->task);
+		if (err != 0) {
+			/*
+			 * XXX - the "prime" process, or initiating process
+			 * for the job may not be owned by the user. So,
+			 * we would get an error in this case. However, we
+			 * ignore the error for that specific process - it
+			 * should exit when all the child processes exit. It
+			 * should ignore all signals from the user.
+			 *
+			 */
+			if (attached->entry.prev != &job->attached) {
+				retcode = err;
+			}
+		}
+
+	}
+
+cleanup_2locks_return:
+	JOB_RUNLOCK(&job->sem);
+cleanup_1locks_return:
+	JOB_RUNLOCK(&job_table_sem);
+cleanup_0locks_return:
+	killjid.r_val = retcode;
+
+	DBG_PRINTEXIT(retcode);
+	if (copy_to_user(killjid_args, &killjid, sizeof(killjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return retcode;
+}
+
+
+/*
+ * job_sys_getjidcnt
+ *
+ * Retun the number of jobs currently on the system.
+ *
+ * returns 0 on success & it always succeeds.
+ */
+static int
+job_sys_getjidcnt(struct job_jidcnt *jidcnt_args)
+{
+	struct job_jidcnt jidcnt;
+	DBG_PRINTINIT("job_sys_getjidcnt");
+
+	DBG_PRINTENTRY();
+
+	/* read lock might be overdoing it in this case */
+	JOB_RLOCK(&job_table_sem);
+	jidcnt.r_val = job_table_refcnt;
+	JOB_RUNLOCK(&job_table_sem);
+
+	DBG_PRINTEXIT(0);
+
+	if (copy_to_user(jidcnt_args, &jidcnt, sizeof(jidcnt))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+
+/*
+ * job_sys_getjidlst
+ *
+ * Get the list of all jids currently on the system (limited by the number of
+ * jobs there are and the number you say you can accept.
+ */
+static int
+job_sys_getjidlst(struct job_jidlst *jidlst_args)
+{
+	struct job_jidlst jidlst;
+	u64 *jid;
+	struct job_entry *job;
+	struct list_head *job_entry;
+	int i;
+	int count;
+	DBG_PRINTINIT("job_sys_getjidlst");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&jidlst, jidlst_args, sizeof(jidlst))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+
+	if (jidlst.r_val == 0) {
+		DBG_PRINTEXIT(0);
+		return 0;
+	}
+
+	jid = (u64 *)kmalloc(sizeof(u64)*jidlst.r_val, GFP_KERNEL);
+	if (!jid) {
+		jidlst.r_val = 0;
+		DBG_PRINTEXIT(-ENOMEM);
+		if (copy_to_user(jidlst_args, &jidlst, sizeof(jidlst))) {
+			DBG_PRINTEXIT(-EFAULT);
+			return -EFAULT;
+		}
+		return -ENOMEM;
+	}
+
+
+	count = 0;
+	JOB_RLOCK(&job_table_sem);
+	for (i = 0; i < HASH_SIZE && count < jidlst.r_val; i++) {
+		list_for_each(job_entry, &job_table[i]) {
+			job = list_entry(job_entry, struct job_entry, entry);
+			jid[count++] = job->jid;
+			if (count == jidlst.r_val) {
+				break;
+			}
+		}
+	}
+	JOB_RUNLOCK(&job_table_sem);
+
+	DBG_PRINTEXIT(0);
+	jidlst.r_val = count;
+
+	for (i = 0; i < count; i++) {
+		if (copy_to_user(jidlst.jid+i, &jid[i], sizeof(u64))) {
+			DBG_PRINTEXIT(-EFAULT);
+			return -EFAULT;
+		}
+	}
+
+	kfree(jid);
+
+	if (copy_to_user(jidlst_args, &jidlst, sizeof(jidlst))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return 0;
+}
+
+
+/*
+ * job_sys_getpidcnt
+ *
+ * Get the number of processes currently attached to a specific job.
+ *
+ * returns 0 on success, or negative errno value on failure.
+ */
+static int
+job_sys_getpidcnt(struct job_pidcnt *pidcnt_args)
+{
+	struct job_pidcnt pidcnt;
+	struct job_entry *job;
+	int retcode = 0;
+	DBG_PRINTINIT("job_sys_getpidcnt");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&pidcnt, pidcnt_args, sizeof(pidcnt))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	pidcnt.r_val = 0;
+
+	JOB_RLOCK(&job_table_sem);
+	job = job_getjob(pidcnt.jid);
+	if (!job) {
+		retcode = -ENODATA;
+	} else {
+		/* Read lock might be overdoing it for this case */
+		JOB_RLOCK(&job->sem);
+		pidcnt.r_val = job->refcnt;
+		JOB_RUNLOCK(&job->sem);
+	}
+	JOB_RUNLOCK(&job_table_sem);
+
+	DBG_PRINTEXIT(retcode);
+
+	if (copy_to_user(pidcnt_args, &pidcnt, sizeof(pidcnt))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return retcode;
+}
+
+/*
+ * job_getpidlst
+ *
+ * Get the list of processes (pids) currently attached to the specified
+ * job. The number of processes provided is limited by the number the user
+ * specivies that they can accept (have memory for) and the number currently
+ * attached.
+ *
+ * returns 0 on success, negative errno value on failure.
+ */
+static int
+job_sys_getpidlst(struct job_pidlst *pidlst_args)
+{
+	struct job_pidlst pidlst;
+	struct job_entry *job;
+	struct job_attach *attached;
+	struct list_head *attached_entry;
+	pid_t *pid;
+	int max;
+	int i;
+	DBG_PRINTINIT("job_sys_getpidlst");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&pidlst, pidlst_args, sizeof(pidlst))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+
+	if (pidlst.r_val == 0) {
+		DBG_PRINTEXIT(0);
+		return 0;
+	}
+
+	max = pidlst.r_val;
+	pidlst.r_val = 0;
+	pid = (pid_t *)kmalloc(sizeof(pid_t)*max, GFP_KERNEL);
+	if (!pid) {
+		DBG_PRINTEXIT(-ENOMEM);
+		if (copy_to_user(pidlst_args, &pidlst, sizeof(pidlst))) {
+			DBG_PRINTEXIT(-EFAULT);
+			return -EFAULT;
+		}
+		return -ENOMEM;
+	}
+
+	JOB_RLOCK(&job_table_sem);
+
+	job = job_getjob(pidlst.jid);
+	if (!job) {
+
+		JOB_RUNLOCK(&job_table_sem);
+
+		DBG_PRINTEXIT(-ENODATA);
+		if (copy_to_user(pidlst_args, &pidlst, sizeof(pidlst))) {
+			DBG_PRINTEXIT(-EFAULT);
+			return -EFAULT;
+		}
+		return -ENODATA;
+	} else {
+
+		JOB_RLOCK(&job->sem);
+		JOB_RUNLOCK(&job_table_sem);
+
+		i = 0;
+		list_for_each(attached_entry, &job->attached) {
+			if (i == max) {
+				break;
+			}
+			attached = list_entry(attached_entry, struct job_attach,
+					entry);
+			pid[i++] = attached->task->pid;
+		}
+		pidlst.r_val = i;
+
+		JOB_RUNLOCK(&job->sem);
+	}
+
+	for (i = 0; i < pidlst.r_val; i++) {
+		if (copy_to_user(pidlst.pid+i, &pid[i], sizeof(pid_t))) {
+			DBG_PRINTEXIT(-EFAULT);
+			return -EFAULT;
+		}
+	}
+	kfree(pid);
+
+	DBG_PRINTEXIT(0);
+	copy_to_user(pidlst_args, &pidlst, sizeof(pidlst));
+	return 0;
+}
+
+
+/*
+ * job_sys_getuser
+ *
+ * Get the uid of the user that owns the job.
+ *
+ * returns 0 on success, returns negative errno on failure.
+ */
+static int
+job_sys_getuser(struct job_user *user_args)
+{
+	struct job_entry *job;
+	struct job_user user;
+	int retcode = 0;
+	DBG_PRINTINIT("job_sys_getuser");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&user, user_args, sizeof(user))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return(-EFAULT);
+	}
+
+	user.r_user = 0;
+
+	JOB_RLOCK(&job_table_sem);
+
+	job = job_getjob(user.jid);
+	if (!job) {
+		retcode = -ENODATA;
+	} else {
+		JOB_RLOCK(&job->sem);
+		user.r_user = job->user;
+		JOB_RUNLOCK(&job->sem);
+	}
+
+	JOB_RUNLOCK(&job_table_sem);
+
+	if (copy_to_user(user_args, &user, sizeof(user))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	DBG_PRINTEXIT(retcode);
+	return retcode;
+}
+
+
+/*
+ * job_sys_getprimepid
+ *
+ * Get the primary process - the oldest process in the job.
+ *
+ * returns 0 on success, negative errno on failure.
+ */
+static int
+job_sys_getprimepid(struct job_primepid *primepid_args)
+{
+	struct job_primepid primepid;
+	struct job_entry *job = NULL;
+	struct job_attach *attached = NULL;
+	int retcode = 0;
+	DBG_PRINTINIT("getprimepid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&primepid, primepid_args, sizeof(primepid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	primepid.r_pid = 0;
+
+	JOB_RLOCK(&job_table_sem);
+
+	job = job_getjob(primepid.jid);
+	if (!job) {
+		JOB_RUNLOCK(&job_table_sem);
+		/* Job not found, return INVALID VALUE */
+		DBG_PRINTEXIT(-ENODATA);
+		return -ENODATA;
+	}
+
+	/*
+	 * Job found, now look at first pid entry in the
+	 * attached list.
+	 */
+	JOB_RLOCK(&job->sem);
+	JOB_RUNLOCK(&job_table_sem);
+	if (list_empty(&job->attached)) {
+		retcode = -ESRCH;
+		primepid.r_pid = 0;
+	} else {
+		attached = list_entry(job->attached.next, struct job_attach, entry);
+		if (!attached->task) {
+			retcode = -ESRCH;
+		} else {
+			primepid.r_pid = attached->task->pid;
+		}
+	}
+	JOB_RUNLOCK(&job->sem);
+
+	if (copy_to_user(primepid_args, &primepid, sizeof(primepid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	DBG_PRINTEXIT(retcode);
+	return retcode;
+}
+
+
+/*
+ * job_sys_sethid
+ *
+ * This function is used to set the host ID segment for the job IDs (jid).
+ * If this does not get set, then the jids upper 32 bits will be set to
+ * 0 and the jid cannot be used reliably in a cluster environment.
+ *
+ * returns -errno value on fail, 0 on success
+ */
+static int
+job_sys_sethid(struct job_sethid *sethid_args)
+{
+	struct job_sethid sethid;
+	int errcode = 0;
+	DBG_PRINTINIT("job_sys_sethid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&sethid, sethid_args, sizeof(sethid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	if (!capable(CAP_SYS_RESOURCE)) {
+		errcode = -EPERM;
+		sethid.r_hid = 0;
+		goto cleanup_return;
+	}
+
+	/*
+	 * Set job_table_sem, so no jobs can be deleted while doing
+	 * this operation.
+	 */
+	JOB_WLOCK(&job_table_sem);
+
+	sethid.r_hid = jid_hid = sethid.hid;
+
+	JOB_WUNLOCK(&job_table_sem);
+
+cleanup_return:
+	DBG_PRINTEXIT(errcode);
+	if (copy_to_user(sethid_args, &sethid, sizeof(sethid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return errcode;
+}
+
+
+/*
+ * job_sys_detachjid
+ *
+ * This function is detach all the processes from a job, but allows the
+ * processes to continue running. You need CAP_SYS_RESOURCE capability
+ * for this to succeed. Since all processes will be detached, the job will
+ * exit.
+ *
+ * returns -errno value on fail, 0 on success
+ */
+static int
+job_sys_detachjid(struct job_detachjid *detachjid_args)
+{
+	struct job_detachjid detachjid;
+	struct job_entry *job;
+	struct list_head *entry;
+	int count;
+	int errcode = 0;
+	struct task_struct *task;
+	struct pagg *pagg;
+
+	DBG_PRINTINIT("job_sys_detachjid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&detachjid, detachjid_args, sizeof(detachjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	detachjid.r_val = 0;
+
+	if (!capable(CAP_SYS_RESOURCE)) {
+		errcode = -EPERM;
+		goto cleanup_return;
+	}
+
+	/*
+	 * Set job_table_sem, so no jobs can be deleted while doing
+	 * this operation.
+	 */
+	JOB_WLOCK(&job_table_sem);
+
+	job = job_getjob(detachjid.jid);
+
+	if (job) {
+
+		JOB_WLOCK(&job->sem);
+
+		/* Mark job as ZOMBIE so no new processes can attach to it */
+		job->state = ZOMBIE;
+
+		count = job->refcnt;
+
+		/* Okay, no new processes can attach to the job. We can
+		 * release the locks on the job_table and job since the only
+		 * way for the job to change now is for tasks to detach and
+		 * the job to be removed. And this is what we want to happen
+		 */
+		JOB_WUNLOCK(&job_table_sem);
+		JOB_WUNLOCK(&job->sem);
+
+
+		/* Walk through list of attached tasks and unset the
+		 * pagg entries.
+		 *
+		 * We don't test with list_empty because that actually means NO tasks
+		 * left rather than one task. If we used !list_empty or list_for_each,
+		 * we could reference memory freed by the pagg hook detach function
+		 * (job_detach).
+		 *
+		 * We know there is only one task left when job->attached.next and
+		 * job->attached.prev both point to the same place.
+		 */
+		while (job->attached.next != job->attached.prev) {
+			entry = job->attached.next;
+
+			task = (list_entry(entry, struct job_attach, entry))->task;
+			pagg = (list_entry(entry, struct job_attach, entry))->pagg;
+
+			down_write(&task->pagg_sem); /* write lock pagg list */
+			pagg->hook->detach(task, pagg);
+			pagg_free(pagg);
+			up_write(&task->pagg_sem); /* write unlock pagg list */
+
+		}
+		/* At this point, there is only one task left */
+
+		entry = job->attached.next;
+
+		task = (list_entry(entry, struct job_attach, entry))->task;
+		pagg = (list_entry(entry, struct job_attach, entry))->pagg;
+
+		down_write(&task->pagg_sem); /* write lock pagg list */
+		pagg->hook->detach(task, pagg);
+		pagg_free(pagg);
+		up_write(&task->pagg_sem); /* write unlock pagg list */
+
+		detachjid.r_val = count;
+
+	} else {
+		errcode = -ENODATA;
+		JOB_WUNLOCK(&job_table_sem);
+	}
+
+cleanup_return:
+	DBG_PRINTEXIT(errcode);
+	if (copy_to_user(detachjid_args, &detachjid, sizeof(detachjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return errcode;
+}
+
+
+/*
+ * job_sys_detachpid
+ *
+ * This function is detach a process from the job it is attached too,
+ * but allows the processes to continue running. You need
+ * CAP_SYS_RESOURCE capability for this to succeed.
+ *
+ * returns -errno value on fail, 0 on success
+ */
+static int
+job_sys_detachpid(struct job_detachpid *detachpid_args)
+{
+	struct job_detachpid detachpid;
+	struct task_struct *task;
+	struct pagg *pagg;
+	int errcode = 0;
+	DBG_PRINTINIT("job_sys_detachpid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&detachpid, detachpid_args, sizeof(detachpid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	detachpid.r_jid = 0;
+
+	if (!capable(CAP_SYS_RESOURCE)) {
+		errcode = -EPERM;
+		goto cleanup_return;
+	}
+
+	/* Lock the task list while we find a specific task */
+	read_lock(&tasklist_lock);
+	task = find_task_by_pid(detachpid.pid);
+	if (!task) {
+		errcode = -ESRCH;
+		/* We need to unlock the tasklist here too or the lock is held forever */
+		read_unlock(&tasklist_lock);
+		goto cleanup_return;
+	}
+
+	/* We have a valid task now */
+	get_task_struct(task); /* Ensure the task doesn't vanish on us */
+	read_unlock(&tasklist_lock); /* Unlock the tasklist */
+	down_write(&task->pagg_sem); /* write lock pagg list */
+
+	pagg = pagg_get(task, pagg_hook.name);
+	if (pagg) {
+		detachpid.r_jid = ((struct job_attach *)pagg->data)->job->jid;
+		pagg->hook->detach(task, pagg);
+		pagg_free(pagg);
+	} else {
+		errcode = -ENODATA;
+	}
+	put_task_struct(task); /* Done accessing the task */
+	up_write(&task->pagg_sem); /* write unlock pagg list */
+
+cleanup_return:
+	DBG_PRINTEXIT(errcode);
+	if (copy_to_user(detachpid_args, &detachpid, sizeof(detachpid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return errcode;
+}
+
+
+/*
+ * job_register_acct
+ *
+ * This function is used by modules that are registering to provide job
+ * accounting services.
+ *
+ * returns -errno value on fail, 0 on success.
+ */
+int
+job_register_acct(struct job_acctmod *am)
+{
+	DBG_PRINTINIT("job_register_acct");
+
+	DBG_PRINTENTRY();
+
+	if (!am) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL; /* error, invalid value */
+	}
+	if (am->type < 0 || am->type > (JOB_ACCT_COUNT-1)) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL; /* error, invalid value */
+	}
+
+	JOB_WLOCK(&acct_list_sem);
+	if (acct_list[am->type] != NULL) {
+		JOB_WUNLOCK(&acct_list_sem);
+		DBG_PRINTEXIT(-EBUSY);
+		return -EBUSY; /* error, duplicate entry */
+	}
+
+	acct_list[am->type] = am;
+	JOB_WUNLOCK(&acct_list_sem);
+	DBG_PRINTEXIT(0);
+	return 0;
+}
+
+
+/*
+ * job_unregister_acct
+ *
+ * This is used by accounting modules to unregister with the job module as
+ * subscribers for job accounting information.
+ *
+ * Returns -errno on failure and 0 on success.
+ */
+int
+job_unregister_acct(struct job_acctmod *am)
+{
+	DBG_PRINTINIT("job_unregister_acct");
+
+	DBG_PRINTENTRY();
+
+	if (!am) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL; /* error, invalid value */
+	}
+	if (am->type < 0 || am->type > (JOB_ACCT_COUNT-1)) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL; /* error, invalid value */
+	}
+
+	JOB_WLOCK(&acct_list_sem);
+	if (acct_list[am->type] != am) {
+		JOB_WUNLOCK(&acct_list_sem);
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT; /* error, not matching entry */
+	}
+
+	acct_list[am->type] = NULL;
+	JOB_WUNLOCK(&acct_list_sem);
+	DBG_PRINTEXIT(0);
+	return 0;
+}
+
+/*
+ * job_getjid
+ *
+ * This function will return the Job ID for the given task. If
+ * the task is not attached to a job, then 0 is returned.
+ *
+ */
+u64 job_getjid(struct task_struct *task)
+{
+	struct pagg *pagg = NULL;
+	struct job_entry *job = NULL;
+	u64 jid = 0;
+	DBG_PRINTINIT("job_getjid");
+
+	DBG_PRINTENTRY();
+
+	down_read(&task->pagg_sem); /* lock pagg list */
+	pagg = pagg_get(task, pagg_hook.name);
+	if (pagg) {
+		job = ((struct job_attach *)pagg->data)->job;
+		JOB_RLOCK(&job->sem);
+		jid = job->jid;
+		JOB_RUNLOCK(&job->sem);
+	}
+	up_read(&task->pagg_sem);
+
+	DBG_PRINTEXIT((int)jid);
+	return jid;
+}
+
+
+/*
+ * job_getacct
+ *
+ * This function is used by accounting subscribers to get accounting
+ * information about a job.
+ *
+ * The caller must supply the Job ID (jid) that specifies the job. The
+ * "type" argument indicates the type of accounting data to be returned.
+ * The data will be returned in the memory accessed via the data pointer
+ * argument. The data pointer is void so that this function interface
+ * can handle different types of accounting data.
+ */
+int job_getacct(u64 jid, int type, void *data)
+{
+	struct job_entry *job;
+	DBG_PRINTINIT("job_getacct");
+
+	DBG_PRINTENTRY();
+
+	if (!data) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL;
+	}
+
+	if (!jid) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL;
+	}
+
+	JOB_RLOCK(&job_table_sem);
+	job = job_getjob(jid);
+	if (!job) {
+		JOB_RUNLOCK(&job_table_sem);
+		DBG_PRINTEXIT(-ENODATA);
+		return -ENODATA;
+	}
+
+	JOB_RLOCK(&job->sem);
+	JOB_RUNLOCK(&job_table_sem);
+
+	switch (type) {
+#if defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE)
+		case JOB_ACCT_CSA:
+		{
+			struct job_csa *csa = (struct job_csa *)data;
+
+			csa->job_id = job->jid;
+			csa->job_uid = job->user;
+			csa->job_start = job->start;
+			csa->job_corehimem = job->csa.corehimem;
+			csa->job_virthimem = job->csa.virthimem;
+			csa->job_acctfile = job->csa.acctfile;
+			break;
+		}
+#endif
+		default:
+			JOB_RUNLOCK(&job->sem);
+			DBG_PRINTEXIT(-EINVAL);
+			return -EINVAL;
+			break;
+	}
+	JOB_RUNLOCK(&job->sem);
+	DBG_PRINTEXIT(0);
+	return 0;
+}
+
+/*
+ * job_setacct
+ *
+ * This function is used by accounting subscribers to set specific
+ * accounting information in the job (so that the job remembers it
+ * in relation to a specific job).
+ *
+ * The job is identified by the jid argument. The type indicates the
+ * type of accounting the information is associated with. The subfield
+ * is a bitmask that indicates exactly what subfields are to be changed.
+ * The data that is used to set the values is supplied by the data pointer.
+ * The data pointer is a void type so that the interface can be used for
+ * different types of accounting information.
+ */
+int job_setacct(u64 jid, int type, int subfield, void *data)
+{
+	struct job_entry *job;
+	DBG_PRINTINIT("job_setacct");
+
+	DBG_PRINTENTRY();
+
+	if (!data) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL;
+	}
+
+	if (!jid) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL;
+	}
+
+	JOB_RLOCK(&job_table_sem);
+	job = job_getjob(jid);
+	if (!job) {
+		JOB_RUNLOCK(&job_table_sem);
+		DBG_PRINTEXIT(-ENODATA);
+		return -ENODATA;
+	}
+
+	JOB_RLOCK(&job->sem);
+	JOB_RUNLOCK(&job_table_sem);
+
+	switch (type) {
+#if defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE)
+		case JOB_ACCT_CSA:
+		{
+			struct job_csa *csa = (struct job_csa *)data;
+
+			if (subfield & JOB_CSA_ACCTFILE) {
+				job->csa.acctfile = csa->job_acctfile;
+			}
+			break;
+		}
+#endif
+		default:
+			JOB_RUNLOCK(&job->sem);
+			DBG_PRINTEXIT(-EINVAL);
+			return -EINVAL;
+			break;
+	}
+	JOB_RUNLOCK(&job->sem);
+	DBG_PRINTEXIT(0);
+	return 0;
+}
+
+
+
+/*
+ * job_syscall
+ *
+ * Function to handle job syscall requests.
+ *
+ * Returns 0 on success and -(ERRNO VALUE) upon failure.
+ */
+int
+job_syscall(unsigned int request, unsigned long data)
+{
+	int rc=0;
+
+	DBG_PRINTINIT("job_syscall");
+
+	DBG_PRINTENTRY();
+
+	switch (request) {
+		case JOB_CREATE:
+			rc = job_sys_create((struct job_create *)data);
+			break;
+		case JOB_ATTACH:
+		case JOB_DETACH:
+			/* RESERVED */
+			rc = -EBADRQC;
+			break;
+		case JOB_GETJID:
+			rc = job_sys_getjid((struct job_getjid *)data);
+			break;
+		case JOB_WAITJID:
+			rc = job_sys_waitjid((struct job_waitjid *)data);
+			break;
+		case JOB_KILLJID:
+			rc = job_sys_killjid((struct job_killjid *)data);
+			break;
+		case JOB_GETJIDCNT:
+			rc = job_sys_getjidcnt((struct job_jidcnt *)data);
+			break;
+		case JOB_GETJIDLST:
+			rc = job_sys_getjidlst((struct job_jidlst *)data);
+			break;
+		case JOB_GETPIDCNT:
+			rc = job_sys_getpidcnt((struct job_pidcnt *)data);
+			break;
+		case JOB_GETPIDLST:
+			rc = job_sys_getpidlst((struct job_pidlst *)data);
+			break;
+		case JOB_GETUSER:
+			rc = job_sys_getuser((struct job_user *)data);
+			break;
+		case JOB_GETPRIMEPID:
+			rc = job_sys_getprimepid((struct job_primepid *)data);
+			break;
+		case JOB_SETHID:
+			rc = job_sys_sethid((struct job_sethid *)data);
+			break;
+		case JOB_DETACHJID:
+			rc = job_sys_detachjid((struct job_detachjid *)data);
+			break;
+		case JOB_DETACHPID:
+			rc = job_sys_detachpid((struct job_detachpid *)data);
+			break;
+		case JOB_SETJLIMIT:
+		case JOB_GETJLIMIT:
+		case JOB_GETJUSAGE:
+		case JOB_FREE:
+		default:
+			rc = -EBADRQC;
+			break;
+	}
+
+	DBG_PRINTEXIT(rc);
+	return rc;
+}
+
+
+/*
+ * job_ioctl
+ *
+ * Function to handle job ioctl call requests.
+ *
+ * Returns 0 on success and -(ERRNO VALUE) upon failure.
+ */
+int
+job_ioctl(struct inode *inode, struct file *file, unsigned int request,
+		unsigned long data)
+{
+	return job_syscall(request, data);
+}
+
+
+/*
+ * init_module
+ *
+ * This function is called when a module is inserted into a kernel. This
+ * function allocates any necessary structures and sets initial values for
+ * module data.
+ *
+ * If the function succeeds, then 0 is returned. On failure, -1 is returned.
+ */
+static int __init
+init_job(void)
+{
+	int i,rc;
+
+
+	/* Initialize the job table chains */
+	for (i = 0; i < HASH_SIZE; i++) {
+		INIT_LIST_HEAD(&job_table[i]);
+	}
+
+	/* Initialize the list for accounting subscribers */
+	for (i = 0; i < JOB_ACCT_COUNT; i++) {
+		acct_list[i] = NULL;
+	}
+
+	/* Get hostID string and fill in jid_template hostID segment */
+	if (hid) {
+		jid_hid = (int)simple_strtoul(hid, &hid, 16);
+	} else {
+		jid_hid = 0;
+	}
+
+	rc = pagg_hook_register(&pagg_hook);
+	if (rc < 0) {
+		return -1;
+	}
+
+	/* Setup our /proc entry file */
+	job_proc_entry = create_proc_entry(JOB_PROC_ENTRY,
+			S_IFREG | S_IRUGO, &proc_root);
+
+	if (!job_proc_entry) {
+		pagg_hook_unregister(&pagg_hook);
+		return -1;
+	}
+
+	job_proc_entry->proc_fops = &job_file_ops;
+	job_proc_entry->proc_iops = NULL;
+
+
+	return 0;
+}
+module_init(init_job);
+
+/*
+ * cleanup_module
+ *
+ * This function is called to cleanup after a module when it is removed.
+ * All memory allocated for this module will be freed.
+ *
+ * This function does not take any inputs or produce and output.
+ */
+static void __exit
+cleanup_job(void)
+{
+	remove_proc_entry(JOB_PROC_ENTRY, &proc_root);
+	pagg_hook_unregister(&pagg_hook);
+	return;
+}
+module_exit(cleanup_job);
+
+EXPORT_SYMBOL(job_register_acct);
+EXPORT_SYMBOL(job_unregister_acct);
+EXPORT_SYMBOL(job_getjid);
+EXPORT_SYMBOL(job_getacct);
+EXPORT_SYMBOL(job_setacct);
Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_PAGG) += pagg.o
+obj-$(CONFIG_PAGG_JOB) += job.o
 obj-$(CONFIG_IKCONFIG) += configs.o
 obj-$(CONFIG_IKCONFIG_PROC) += configs.o
 obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
^ permalink raw reply	[flat|nested] 8+ messages in thread
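The JID handling in job_sys_create and job_sys_sethid treats the 64-bit job ID as two 32-bit segments: a host ID in the upper 32 bits (per the job_sys_sethid comment) and a second segment that job_sys_create fills with current->pid. A minimal userspace sketch of that layout follows. The helper names are hypothetical, and putting the pid segment in the lower 32 bits is an assumption, since the iptr_hid()/iptr_sid() macros are defined outside this part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not part of the patch: pack a 32-bit host ID and a
 * 32-bit per-host segment (the creating pid in job_sys_create) into one
 * 64-bit job ID. The lower-32-bit placement of the pid segment is assumed.
 */
static inline uint64_t example_jid_pack(uint32_t host_id, uint32_t pid_seg)
{
	return ((uint64_t)host_id << 32) | (uint64_t)pid_seg;
}

/* Upper 32 bits: the host ID segment. */
static inline uint32_t example_jid_host(uint64_t jid)
{
	return (uint32_t)(jid >> 32);
}

/* Lower 32 bits: the per-host segment (assumed layout). */
static inline uint32_t example_jid_pidseg(uint64_t jid)
{
	return (uint32_t)(jid & 0xffffffffu);
}
```

With a host ID of 0, which is what the module uses when no hid is supplied, the packed value reduces to the pid segment alone. That is why the job_sys_sethid comment warns that such JIDs cannot be used reliably across a cluster.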
* Re: [PATCH] Process Aggregates (PAGG) for 2.6.7
2004-06-24 18:32 ` Limin Gu
@ 2004-06-24 18:57 ` Chris Wright
2004-06-24 19:12 ` Limin Gu
2004-06-24 19:31 ` Jay Lan
1 sibling, 1 reply; 8+ messages in thread
From: Chris Wright @ 2004-06-24 18:57 UTC (permalink / raw)
To: Limin Gu; +Cc: Erik Jacobson, linux-kernel, jlan, limin, pwil3058

* Limin Gu (limin@dbear.engr.sgi.com) wrote:
> Job has not received much feedback from the community yet, we welcome
> any comments/suggestions/criticism for you.

I recall seeing a bunch of syscall looking pieces in job that seemed odd
to be stuck behind a module. Ah, yes...

> +/* Function prototypes */
> +static int job_sys_create(struct job_create *);
> +static int job_sys_getjid(struct job_getjid *);
> +static int job_sys_waitjid(struct job_waitjid *);
> +static int job_sys_killjid(struct job_killjid *);
> +static int job_sys_getjidcnt(struct job_jidcnt *);
> +static int job_sys_getjidlst(struct job_jidlst *);
> +static int job_sys_getpidcnt(struct job_pidcnt *);
> +static int job_sys_getpidlst(struct job_pidlst *);
> +static int job_sys_getuser(struct job_user *);
> +static int job_sys_getprimepid(struct job_primepid *);
> +static int job_sys_sethid(struct job_sethid *);
> +static int job_sys_detachjid(struct job_detachjid *);
> +static int job_sys_detachpid(struct job_detachpid *);
> +static int job_attach(struct task_struct *, struct pagg *, void *);
> +static void job_detach(struct task_struct *, struct pagg *);
> +static struct job_entry *job_getjob(u64 jid);
> +static int job_syscall(unsigned int, unsigned long);
> +
> +u64 job_getjid(struct task_struct *);
> +
> +int job_ioctl(struct inode *, struct file *, unsigned int, unsigned long);

[snip]

> +/*
> + * job_syscall
> + *
> + * Function to handle job syscall requests.
> + *
> + * Returns 0 on success and -(ERRNO VALUE) upon failure.
> + */
> +int
> +job_syscall(unsigned int request, unsigned long data)

trivial...declared static above.

> +{
> +	int rc=0;
> +
> +	DBG_PRINTINIT("job_syscall");
> +
> +	DBG_PRINTENTRY();
> +
> +	switch (request) {
> +		case JOB_CREATE:
> +			rc = job_sys_create((struct job_create *)data);
> +			break;
> +		case JOB_ATTACH:
> +		case JOB_DETACH:
> +			/* RESERVED */
> +			rc = -EBADRQC;
> +			break;
> +		case JOB_GETJID:
> +			rc = job_sys_getjid((struct job_getjid *)data);
> +			break;
> +		case JOB_WAITJID:
> +			rc = job_sys_waitjid((struct job_waitjid *)data);
> +			break;
> +		case JOB_KILLJID:
> +			rc = job_sys_killjid((struct job_killjid *)data);
> +			break;
> +		case JOB_GETJIDCNT:
> +			rc = job_sys_getjidcnt((struct job_jidcnt *)data);
> +			break;
> +		case JOB_GETJIDLST:
> +			rc = job_sys_getjidlst((struct job_jidlst *)data);
> +			break;
> +		case JOB_GETPIDCNT:
> +			rc = job_sys_getpidcnt((struct job_pidcnt *)data);
> +			break;
> +		case JOB_GETPIDLST:
> +			rc = job_sys_getpidlst((struct job_pidlst *)data);
> +			break;
> +		case JOB_GETUSER:
> +			rc = job_sys_getuser((struct job_user *)data);
> +			break;
> +		case JOB_GETPRIMEPID:
> +			rc = job_sys_getprimepid((struct job_primepid *)data);
> +			break;
> +		case JOB_SETHID:
> +			rc = job_sys_sethid((struct job_sethid *)data);
> +			break;
> +		case JOB_DETACHJID:
> +			rc = job_sys_detachjid((struct job_detachjid *)data);
> +			break;
> +		case JOB_DETACHPID:
> +			rc = job_sys_detachpid((struct job_detachpid *)data);
> +			break;
> +		case JOB_SETJLIMIT:
> +		case JOB_GETJLIMIT:
> +		case JOB_GETJUSAGE:
> +		case JOB_FREE:
> +		default:
> +			rc = -EBADRQC;
> +			break;
> +	}
> +
> +	DBG_PRINTEXIT(rc);
> +	return rc;
> +}
> +
> +
> +/*
> + * job_ioctl
> + *
> + * Function to handle job ioctl call requests.
> + *
> + * Returns 0 on success and -(ERRNO VALUE) upon failure.
> + */
> +int
> +job_ioctl(struct inode *inode, struct file *file, unsigned int request,
> +		unsigned long data)
> +{
> +	return job_syscall(request, data);
> +}

So, this is really ioctl. This should be exposed in fs interface, or
the primitives should be promoted to first class syscalls if others can
use this.

thanks,
-chris
--
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net
^ permalink raw reply	[flat|nested] 8+ messages in thread
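The pattern under discussion is a single entry point that takes a request number plus an untyped argument and fans out to per-request handlers, which is what an ioctl handler does. A minimal userspace model of that dispatch might look like the sketch below. Everything named demo_* or DEMO_* is hypothetical: the real JOB_* request values and argument struct layouts are not shown in this part of the thread, and the handlers here return canned data instead of touching any job table.

```c
#include <errno.h>
#include <stddef.h>

#ifndef EBADRQC
#define EBADRQC 56	/* Linux value, defined here only for portability */
#endif

/* Hypothetical request codes standing in for the JOB_* constants. */
enum demo_req { DEMO_GETJID, DEMO_GETJIDCNT, DEMO_RESERVED, DEMO_REQ_MAX };

/* Hypothetical argument blocks: inputs plus r_* fields for results. */
struct demo_getjid { int pid; unsigned long long r_jid; };
struct demo_jidcnt { int r_val; };

static int demo_getjid(void *arg)
{
	struct demo_getjid *a = arg;

	a->r_jid = (unsigned long long)a->pid;	/* canned answer, no lookup */
	return 0;
}

static int demo_getjidcnt(void *arg)
{
	struct demo_jidcnt *a = arg;

	a->r_val = 3;				/* canned answer */
	return 0;
}

typedef int (*demo_handler)(void *);

/* One slot per request; reserved requests are left NULL. */
static const demo_handler demo_table[DEMO_REQ_MAX] = {
	[DEMO_GETJID]    = demo_getjid,
	[DEMO_GETJIDCNT] = demo_getjidcnt,
};

/* Single multiplexed entry point, in the style of an ioctl handler. */
int demo_dispatch(unsigned int request, void *data)
{
	if (request >= DEMO_REQ_MAX || demo_table[request] == NULL)
		return -EBADRQC;
	return demo_table[request](data);
}
```

The untyped data argument is the weak point of this style: nothing checks that the caller passed the struct matching the request number. Promoting each operation to its own syscall, or exposing it through a filesystem interface, gives each operation a typed, separately reviewable interface.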
* Re: [PATCH] Process Aggregates (PAGG) for 2.6.7
2004-06-24 18:57 ` Chris Wright
@ 2004-06-24 19:12 ` Limin Gu
2004-06-24 19:15 ` Chris Wright
0 siblings, 1 reply; 8+ messages in thread
From: Limin Gu @ 2004-06-24 19:12 UTC (permalink / raw)
To: Chris Wright; +Cc: Erik Jacobson, linux-kernel, jlan, limin, pwil3058

> So, this is really ioctl. This should be exposed in fs interface, or
> the primitives should be promoted to first class syscalls if others can
> use this.

Yes, that would be better.

But right now, we only have CSA ( Comprehensive System Accounting) use
job, :)

--Limin

>
> thanks,
> -chris
> --
> Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net
>
* Re: [PATCH] Process Aggregates (PAGG) for 2.6.7
2004-06-24 19:12 ` Limin Gu
@ 2004-06-24 19:15 ` Chris Wright
0 siblings, 0 replies; 8+ messages in thread
From: Chris Wright @ 2004-06-24 19:15 UTC (permalink / raw)
To: Limin Gu; +Cc: Chris Wright, Erik Jacobson, linux-kernel, jlan, limin, pwil3058

* Limin Gu (limin@dbear.engr.sgi.com) wrote:
> > So, this is really ioctl. This should be exposed in fs interface, or
> > the primitives should be promoted to first class syscalls if others can
> > use this.
>
> Yes, that would be better.
>
> But right now, we only have CSA ( Comprehensive System Accounting) use
> job, :)

Sure, even as it stands it should probably move to an fs interface, and
what about reuse/consolidation with CKRM needs?

thanks,
-chris
--
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net
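For readers following the ioctl-versus-syscall point in this sub-thread,
here is a minimal user-space sketch of the two interface styles being
compared. It is not code from the job patch: every demo_* name, the
request codes, and the argument structs are hypothetical stand-ins, and
-EINVAL is used where the patch returns -EBADRQC. Style 1 gives each
primitive its own typed entry point (roughly what "first class syscalls"
would look like); style 2 is a single multiplexed entry point that casts
an opaque pointer per request, as job_syscall() does.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request codes standing in for JOB_CREATE, JOB_GETJID, ... */
enum demo_request {
	DEMO_CREATE,
	DEMO_GETJID,
	DEMO_RESERVED,		/* plays the role of JOB_ATTACH/JOB_DETACH */
	DEMO_REQUEST_MAX
};

/* Hypothetical argument blocks, one per request (cf. the struct job_* casts). */
struct demo_create { uint64_t jid; };	/* out: ID of the new job */
struct demo_getjid { uint64_t jid; };	/* out: ID of the current job */

static uint64_t demo_current_jid;	/* 0 means "not in a job" */
static uint64_t demo_next_jid = 1;

/* Style 1: first-class entry points; the compiler checks argument types. */
int demo_job_create(uint64_t *jid)
{
	if (jid == NULL)
		return -EFAULT;
	demo_current_jid = demo_next_jid++;
	*jid = demo_current_jid;
	return 0;
}

int demo_job_getjid(uint64_t *jid)
{
	if (jid == NULL)
		return -EFAULT;
	if (demo_current_jid == 0)
		return -ENOENT;
	*jid = demo_current_jid;
	return 0;
}

/*
 * Style 2: one multiplexed entry point. The caller passes a request code
 * and an untyped pointer; a mismatch between the two is only caught at
 * run time, if at all.
 */
int demo_job_mux(unsigned int request, void *data)
{
	switch (request) {
	case DEMO_CREATE:
		if (data == NULL)
			return -EFAULT;
		return demo_job_create(&((struct demo_create *)data)->jid);
	case DEMO_GETJID:
		if (data == NULL)
			return -EFAULT;
		return demo_job_getjid(&((struct demo_getjid *)data)->jid);
	case DEMO_RESERVED:
	default:
		return -EINVAL;	/* the patch returns -EBADRQC here */
	}
}
```

In this sketch the multiplexer is only a thin wrapper over the typed
functions, so either style could be layered on the same primitives; which
one is exported to user space is the design question raised above.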
* Re: [PATCH] Process Aggregates (PAGG) for 2.6.7
2004-06-24 18:32 ` Limin Gu
2004-06-24 18:57 ` Chris Wright
@ 2004-06-24 19:31 ` Jay Lan
1 sibling, 0 replies; 8+ messages in thread
From: Jay Lan @ 2004-06-24 19:31 UTC (permalink / raw)
To: linux-kernel; +Cc: Erik Jacobson, limin, pwil3058
[-- Attachment #1: Type: text/plain, Size: 916 bytes --]
Attached is the Comprehensive System Accounting (CSA) patch to
kernel 2.6.7.

The project page of CSA is located at http://oss.sgi.com/projects/csa/
The current CSA rpm release is 2.0.0. Kernel patches to 2.6 and 2.4
are provided.

Any comment, suggestion, bug post/fix are very much welcome!

Signed-off-by: Jay Lan <jlan@sgi.com>

---
Jay Lan - Linux System Software
Silicon Graphics Inc., Mountain View, CA

Limin Gu wrote:
>>Attached is a PAGG patch to kernel 2.6.7.
>>
>>The maintainers of two patches that make use of PAGG will post their patches
>>in to this discussion thread shortly.
>
>
> One user of PAGG is job, a loadable kernel module.
>
> You can find the documentation of job in the attached patch.
>
> Job has not received much feedback from the community yet, we welcome
> any comments/suggestions/criticism for you.
>
> Thanks!
> > Limin Gu - Linux System Software - Silicon Graphics > > [-- Attachment #2: linux-2.6.7.csa.patch --] [-- Type: text/plain, Size: 77424 bytes --] Index: linux/drivers/block/ll_rw_blk.c =================================================================== --- linux.orig/drivers/block/ll_rw_blk.c +++ linux/drivers/block/ll_rw_blk.c @@ -1617,6 +1617,7 @@ { DEFINE_WAIT(wait); struct request *rq; + unsigned long start_wait = jiffies; generic_unplug_device(q); do { @@ -1645,6 +1646,7 @@ finish_wait(&rl->wait[rw], &wait); } while (!rq); + current->bwtime += (unsigned long) jiffies - start_wait; return rq; } @@ -1895,10 +1897,12 @@ if (rw == READ) { disk_stat_add(rq->rq_disk, read_sectors, nr_sectors); + current->rblk += nr_sectors; if (!new_io) disk_stat_inc(rq->rq_disk, read_merges); } else if (rw == WRITE) { disk_stat_add(rq->rq_disk, write_sectors, nr_sectors); + current->wblk += nr_sectors; if (!new_io) disk_stat_inc(rq->rq_disk, write_merges); } Index: linux/fs/exec.c =================================================================== --- linux.orig/fs/exec.c +++ linux/fs/exec.c @@ -47,6 +47,7 @@ #include <linux/syscalls.h> #include <linux/rmap.h> #include <linux/pagg.h> +#include <linux/csa_internal.h> #include <asm/uaccess.h> #include <asm/pgalloc.h> #include <asm/mmu_context.h> @@ -1137,6 +1138,9 @@ /* execve success */ security_bprm_free(&bprm); + /* no-op if CONFIG_CSA not set */ + csa_update_integrals(); + update_mem_hiwater(); return retval; } Index: linux/fs/read_write.c =================================================================== --- linux.orig/fs/read_write.c +++ linux/fs/read_write.c @@ -215,8 +215,11 @@ ret = file->f_op->read(file, buf, count, pos); else ret = do_sync_read(file, buf, count, pos); - if (ret > 0) + if (ret > 0) { dnotify_parent(file->f_dentry, DN_ACCESS); + current->rchar += ret; + } + current->syscr++; } } @@ -259,8 +262,11 @@ ret = file->f_op->write(file, buf, count, pos); else ret = do_sync_write(file, buf, count, pos); - if (ret 
> 0) + if (ret > 0) { dnotify_parent(file->f_dentry, DN_MODIFY); + current->wchar += ret; + } + current->syscw++; } } @@ -519,6 +525,10 @@ fput_light(file, fput_needed); } + if (ret > 0) { + current->rchar += ret; + } + current->syscr++; return ret; } @@ -535,6 +545,10 @@ fput_light(file, fput_needed); } + if (ret > 0) { + current->wchar += ret; + } + current->syscw++; return ret; } @@ -609,6 +623,13 @@ retval = in_file->f_op->sendfile(in_file, ppos, count, file_send_actor, out_file); + if (retval > 0) { + current->rchar += retval; + current->wchar += retval; + } + current->syscr++; + current->syscw++; + if (*ppos > max) retval = -EOVERFLOW; Index: linux/include/linux/csa.h =================================================================== --- /dev/null +++ linux/include/linux/csa.h @@ -0,0 +1,526 @@ +/* + * Copyright (c) 2000-2002 Silicon Graphics, Inc and LANL All Rights Reserved. + * Copyright (c) 2004 Silicon Graphics, Inc All Rights Reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + * + * This program is distributed in the hope that it would be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 59 Temple Place - Suite 330, Boston MA 02111-1307, USA. + * + * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane, + * Mountain View, CA 94043, or: + * + * http://www.sgi.com + */ +/* + * CSA (Comprehensive System Accounting) + * Job Accounting for Linux + * + * This header file contains the definitions needed for job + * accounting. 
The kernel CSA accounting module code and all + * user-level programs that try to write or process the binary job + * accounting data must include this file. + * + * + */ + +#ifndef _LINUX_CSA_H +#define _LINUX_CSA_H + +#ifndef __KERNEL__ +#include <stdint.h> +#include <sys/types.h> +#endif + +/* + * accounting flags per-process + */ +#define AFORK 0x01 /* fork, but did not exec */ +#define ASU 0x02 /* super-user privileges */ +#define ACKPT 0x04 /* process has been checkpointed */ +#define ACORE 0x08 /* produced corefile */ +#define AXSIG 0x10 /* killed by a signal */ +#define AMORE 0x20 /* more CSA acct records for this process */ +#define AINC 0x40 /* incremental accounting record */ + +#define AHZ 100 + +/* + * Magic number - for achead.ah_magic in the 1st header. The magic number + * in the 2nd header is the inverse of this. + */ +#define ACCT_MAGIC_BIG 030510 /* big-endian */ +#define ACCT_MAGIC_LITTLE 030512 /* little-endian */ +#ifdef __LITTLE_ENDIAN +#define ACCT_MAGIC ACCT_MAGIC_LITTLE +#else +#define ACCT_MAGIC ACCT_MAGIC_BIG +#endif + +/* + * Record types - for achead.ah_type in the 1st header. 
+ */ +#define ACCT_KERNEL_CSA 0001 /* Kernel: CSA base record */ +#define ACCT_KERNEL_MEM 0002 /* Kernel: memory record */ +#define ACCT_KERNEL_IO 0004 /* Kernel: input/output record */ +#define ACCT_KERNEL_MT 0006 /* Kernel: multi-tasking record */ +#define ACCT_KERNEL_MPP 0010 /* Kernel: multi-PE appl record */ +#define ACCT_KERNEL_SOJ 0012 /* Kernel: start-of-job record */ +#define ACCT_KERNEL_EOJ 0014 /* Kernel: end-of-job record */ +#define ACCT_KERNEL_CFG 0020 /* Kernel: configuration record */ + +#define ACCT_KERNEL_SITE0 0100 /* Kernel: reserved for site */ +#define ACCT_KERNEL_SITE1 0101 /* Kernel: reserved for site */ + +#define ACCT_DAEMON_NQS 0120 /* Daemon: NQS record */ +#define ACCT_DAEMON_WKMG 0122 /* Daemon: workload management record, + i.e., LSF */ +#define ACCT_DAEMON_TAPE 0124 /* Daemon: tape record */ +#define ACCT_DAEMON_DMIG 0126 /* Daemon: data migration record */ +#define ACCT_DAEMON_SOCKET 0130 /* Daemon: socket record */ + +#define ACCT_DAEMON_SITE0 0200 /* Daemon: reserved for site */ +#define ACCT_DAEMON_SITE1 0201 /* Daemon: reserved for site */ + +#define ACCT_JOB_HEADER 0220 /* csabuild: job header record */ +#define ACCT_CACCT 0222 /* cacct: consolidated data */ +#define ACCT_CMS 0224 /* cms: command summary data */ + +/* Record types - for achead.ah_type in the 2nd header. */ +#define ACCT_MEM 1<<0 /* Process generated memory record */ +#define ACCT_IO 1<<1 /* Process generated I/O record */ +#define ACCT_MT 1<<2 /* Process used multi-tasking */ +#define ACCT_MPP 1<<3 /* Process used multi-PE */ + +/* + * Record revision levels. + * + * These are incremented to indicate that a record's format has changed since + * a previous release. 
+ */ +#define REV_CSA 02400 /* Kernel: CSA base record */ +#define REV_MEM 02400 /* Kernel: memory record */ +#define REV_IO 02400 /* Kernel: I/O record */ +#define REV_MT 02400 /* Kernel: multi-tasking record */ +#define REV_MPP 02400 /* Kernel: multi-PE appl record */ +#define REV_SOJ 02400 /* Kernel: start-of-job record */ +#define REV_EOJ 02400 /* Kernel: end-of-job record */ +#define REV_CFG 02400 /* Kernel: configuration record */ + +#define REV_NQS 02400 /* Daemon: NQS record */ +#define REV_WKMG 02400 /* Daemon: workload management (i.e., LSF) + record */ +#define REV_TAPE 02400 /* Daemon: tape record */ +#define REV_DMIG 02400 /* Daemon: data migration record */ +#define REV_SOCKET 02400 /* Daemon: socket record */ + +#define REV_JOB 02400 /* csabuild: job header record */ +#define REV_CACCT 02400 /* cacct: consolidated data */ +#define REV_CMS 02400 /* cms: command summary data */ + +/* + * Record header + */ +struct achead +{ + unsigned int ah_magic:17; /* Magic */ + unsigned int ah_revision:15; /* Revision */ + unsigned int ah_type:8; /* Record type */ + unsigned int ah_flag:8; /* Record flags */ + unsigned int ah_size:16; /* Size of record */ +}; + +/* + * In order to keep the accounting records the same size across different + * machine types, record fields will be defined to types that won't + * vary (i.e. uint_32_t instead of uid_t). +*/ + +/* + * Per process base accounting record. 
+ */ +struct acctcsa +{ + struct achead ac_hdr1; /* Header */ + struct achead ac_hdr2; /* 2nd header for continued records */ + double ac_sbu; /* System billing units */ + unsigned int ac_stat:8; /* Exit status */ + unsigned int ac_nice:8; /* Nice value */ + unsigned char ac_sched; /* Scheduling discipline */ + unsigned int :8; /* Unused */ + uint32_t ac_uid; /* User ID */ + uint32_t ac_gid; /* Group ID */ + uint64_t ac_ash; /* Array session handle */ + uint64_t ac_jid; /* Job ID */ + uint64_t ac_prid; /* Project ID -> account ID */ + uint32_t ac_pid; /* Process ID */ + uint32_t ac_ppid; /* Parent process ID */ + time_t ac_btime; /* Beginning time [sec since 1970] */ + char ac_comm[16]; /* Command name */ +/* CPU resource usage information. */ + uint64_t ac_etime; /* Elapsed time [usecs] */ + uint64_t ac_utime; /* User CPU time [usec] */ + uint64_t ac_stime; /* System CPU time [usec] */ + uint64_t ac_spare; /* Spare field */ + uint64_t ac_spare1; /* Spare field */ +}; + +/* + * Memory accounting structure + * This structure is part of the acctmem record. 
+ */ +struct memint +{ + uint64_t himem; /* Hiwater memory usage [Kbytes] */ + uint64_t mem1; /* Memory integral 1 [Mbytes/uSec] */ + uint64_t mem2; /* Memory integral 2 - not used */ + uint64_t mem3; /* Memory integral 3 - not used */ +}; + +/* + * Memory accounting record + */ +struct acctmem +{ + struct achead ac_hdr; /* Header */ + double ac_sbu; /* System billing units */ + struct memint ac_core; /* Core memory integrals */ + struct memint ac_virt; /* Virtual memory integrals */ + uint64_t ac_pgswap; /* # of pages swapped */ + uint64_t ac_minflt; /* # of minor page faults */ + uint64_t ac_majflt; /* # of major page faults */ + uint64_t ac_spare; /* Spare field */ +}; + +/* + * Input/Output accounting record + */ +struct acctio +{ + struct achead ac_hdr; /* Header */ + double ac_sbu; /* System billing units */ + uint64_t ac_bwtime; /* Block I/O wait time [usecs] */ + uint64_t ac_rwtime; /* Raw I/O wait time [usecs] */ + uint64_t ac_chr; /* Number of chars (bytes) read */ + uint64_t ac_chw; /* Number of chars (bytes) written */ + uint64_t ac_bkr; /* Number of blocks read */ + uint64_t ac_bkw; /* Number of blocks written */ + uint64_t ac_scr; /* Number of read system calls */ + uint64_t ac_scw; /* Number of write system calls */ + uint64_t ac_spare; /* Spare field */ +}; + +/* + * Multi-tasking accounting structure + * This structure is part of the acctmt record. + */ +struct mtask +{ + uint64_t mt; /* CPU+1 connect time [usecs] */ + uint64_t spare1; /* Spare field */ + uint64_t spare2; /* Spare field */ +}; + +/* + * Multi-tasking accounting record - currently not used, adapted from UNICOS. 
+ */ +#define ACCT_MAXCPUS 512 /* Maximum number of CPUs supported */ + +struct acctmt +{ + struct achead ac_hdr; /* Header */ + double ac_sbu; /* System billing units */ + unsigned int ac_numcpu:16; /* Max number of CPUs used */ + unsigned int ac_maxcpu:16; /* Max number of CPUs available */ + unsigned int :32; /* Unused */ + int64_t ac_smwtime; /* Semaphore wait time [usec] */ + struct mtask ac_mttime[ACCT_MAXCPUS]; /* Time connected to (i+1) + CPUs [usec] */ +}; + +/* + * MPP PE accounting structure - MPP hardware specific. + * This structure is part of the acctmpp record. + */ +struct acctpe +{ + uint64_t utime; /* User CPU time [usecs] */ + uint64_t srtime; /* System & remote CPU time [usecs] */ + uint64_t io; /* Number of chars transferred */ +}; + +/* + * MPP accounting record - MPP hardware specific; currently not used. + */ +#define ACCT_MAXPES 1024 /* Maximum number of PEs */ + +struct acctmpp +{ + struct achead ac_hdr; /* Header */ + double ac_sbu; /* System billing units */ + unsigned int ac_mpbesu:8; /* Number of BESUs used */ + unsigned int ac_mppe:24; /* Number of PEs used */ + uint64_t ac_himem; /* Maximum memory hiwater [Mbytes] */ + + struct acctpe ac_mpp[ACCT_MAXPES]; /* Per PE information */ +}; + +/* + * MPP Detailed PE accounting structure - currently not used + */ +struct acctdpe +{ + struct achead ac_hdr; /* Header */ + + uint64_t utime; /* User CPU time [usecs] */ + uint64_t stime; /* System CPU time [usecs] */ + uint64_t rtime; /* Remote CPU time [usecs] */ + + uint64_t ctime; /* Connect CPU time [usecs] */ + uint64_t io; /* Number of chars transferred */ + uint64_t spare; /* Spare field */ +}; + +/* + * Start-of-job record + * Written when a job is created. 
+ */ + +typedef enum +{ + AC_INIT_LOGIN, /* Initiated by login */ + AC_INIT_NQS, /* Initiated by NQS */ + AC_INIT_LSF, /* Initiated by LSF */ + AC_INIT_CROND, /* Initiated by crond */ + AC_INIT_FTPD, /* Initiated by ftpd */ + AC_INIT_INETD, /* Initiated by inetd */ + AC_INIT_TELNETD, /* Initiated by telnetd */ + AC_INIT_MAX +} ac_inittype; + + +#define AC_SOJ 1 /* Start-of-job record type */ +#define AC_ROJ 2 /* Restart-of-job record type */ + +struct acctsoj +{ + struct achead ac_hdr; /* Header */ + unsigned int ac_type:8; /* Record type (AC_SOJ, AC_ROJ) */ + ac_inittype ac_init:8; /* Initiator - currently not used */ + unsigned int :16; /* Unused */ + uint32_t ac_uid; /* User ID */ + uint64_t ac_jid; /* Job ID */ + time_t ac_btime; /* Start time [secs since 1970] */ + time_t ac_rstime; /* Restart time [secs since 1970] */ +}; + +/* + * End-of-job record + * Written when the last process of a job exits. + */ +struct accteoj +{ + struct achead ac_hdr1; /* Header */ + struct achead ac_hdr2; /* 2nd header for continued records */ + double ac_sbu; /* System billing units */ + ac_inittype ac_init:8; /* Initiator - currently not used */ + unsigned int ac_nice:8; /* Nice value */ + unsigned int :16; /* Unused */ + uint32_t ac_uid; /* User ID */ + uint32_t ac_gid; /* Group ID */ + uint64_t ac_ash; /* Array session handle; not used */ + uint64_t ac_jid; /* Job ID */ + uint64_t ac_prid; /* Project ID; not used */ + time_t ac_btime; /* Job start time [secs since 1970] */ + time_t ac_etime; /* Job end time [secs since 1970] */ + uint64_t ac_corehimem; /* Hiwater core mem [Kbytes] */ + uint64_t ac_virthimem; /* Hiwater virt mem [Kbytes] */ +/* CPU resource usage information. */ + uint64_t ac_utime; /* User CPU time [usec] */ + uint64_t ac_stime; /* System CPU time [usec] */ + uint32_t ac_spare; +}; + +/* + * Accounting configuration uname structure + * This structure is part of the acctcfg record. 
+ */ +struct ac_utsname +{ + char sysname[26]; + char nodename[26]; + char release[42]; + char version[41]; + char machine[26]; +}; + +/* + * Accounting configuration record + * Written for accounting configuration changes. + */ +typedef enum +{ + AC_CONFCHG_BOOT, /* Boot time (always first) */ + AC_CONFCHG_FILE, /* Reporting pacct file change */ + AC_CONFCHG_ON, /* Reporting xxx ON */ + AC_CONFCHG_OFF, /* Reporting xxx OFF */ + AC_CONFCHG_INC_DELTA, /* Report incremental acct clock delta change */ AC_CONFCHG_INC_EVENT, /* Report incremental accounting event */ + AC_CONFCHG_MAX +} ac_eventtype; + +struct acctcfg +{ + struct achead ac_hdr; /* Header */ + unsigned int ac_kdmask; /* Kernel and daemon config mask */ + unsigned int ac_rmask; /* Record configuration mask */ + int64_t ac_uptimelen; /* Bytes from the end of the boot + record to the next boot record */ + ac_eventtype ac_event:8; /* Accounting configuration event */ + unsigned int :24; /* Unused */ + time_t ac_boottime; /* System boot time [secs since 1970]*/ + time_t ac_curtime; /* Current time [secs since 1970] */ + struct ac_utsname ac_uname; /* Condensed uname information */ +}; + + +/* + * Accounting control status values. 
+ */ +typedef enum +{ + ACS_OFF, /* Accounting stopped for this entry */ + ACS_ERROFF, /* Accounting turned off by kernel */ + ACS_ON /* Accounting started for this entry */ +} ac_status; + +/* + * Function codes for CSA library interface + */ +typedef enum +{ + AC_START, /* Start kernel, daemon, or record accounting */ + AC_STOP, /* Stop kernel, daemon, or record accounting */ + AC_HALT, /* Stop all kernel, daemon, and record accounting */ + AC_CHECK, /* Check a kernel, daemon, or record accounting state*/ + AC_KDSTAT, /* Check all kernel & daemon accounting states */ + AC_RCDSTAT, /* Check all record accounting states */ + AC_JASTART, /* Start user job accounting */ + AC_JASTOP, /* Stop user job accounting */ + AC_WRACCT, /* Write accounting record for daemon */ + AC_AUTH, /* Verify executing user is authorized */ + AC_INCACCT, /* Control incremental accounting */ + AC_MREQ +} ac_request; + +/* + * Define the CSA accounting record type indices. + */ +typedef enum +{ + ACCT_KERN_CSA, /* Kernel CSA accounting */ + ACCT_KERN_JOB_PROC, /* Kernel job process summary accounting */ + ACCT_KERN_ASH, /* Kernel array session summary accounting */ + ACCT_DMD_NQS, /* Daemon NQS accounting */ + ACCT_DMD_WKMG, /* Daemon workload management (i.e. 
LSF) acct */ + ACCT_DMD_TAPE, /* Daemon tape accounting */ + ACCT_DMD_DMIG, /* Daemon data migration accounting */ + ACCT_DMD_SOCKET, /* Daemon socket accounting */ + ACCT_DMD_SITE1, /* Site reserved daemon acct */ + ACCT_DMD_SITE2, /* Site reserved daemon acct */ + ACCT_MAXKDS, /* Max # kernel and daemon entries */ + + ACCT_RCD_MPPDET, /* Record acct for MPP detail exit info */ + ACCT_RCD_MEM, /* Record acct for memory */ + ACCT_RCD_IO, /* Record acct for input/output */ + ACCT_RCD_MT, /* Record acct for multi-tasking */ + ACCT_RCD_MPP, /* Record acct for MPP accumulated info */ + ACCT_THD_MEM, /* Record acct for memory size threshhold */ + ACCT_THD_TIME, /* Record acct for CPU time threshhold */ + ACCT_RCD_INCACCT, /* Record acct for incremental accounting */ + ACCT_RCD_APPACCT, /* Record acct for application accounting */ + ACCT_RCD_SITE1, /* Site reserved record acct */ + ACCT_RCD_SITE2, /* Site reserved record acct */ + ACCT_MAXRCDS /* Max # record entries */ +} ac_kdrcd; + +#define ACCT_RCDS ACCT_RCD_MPPDET /* Record acct low range definition */ +#define NUM_KDS (ACCT_MAXKDS - ACCT_KERN_CSA) +#define NUM_RCDS (ACCT_MAXRCDS - ACCT_RCDS) +#define NUM_KDRCDS (NUM_KDS + NUM_RCDS) + + +/* + * The following structures are used to get status of a CSA accounting type. 
+ */ + +/* + * Accounting entry status structure + */ +struct actstat +{ + ac_kdrcd ac_ind; /* Entry index */ + ac_status ac_state; /* Entry status */ + int64_t ac_param; /* Entry parameter */ +}; + +/* + * Accounting control and status structure + */ +#define ACCT_PATH 128 /* Max path length for accounting file */ + +struct actctl +{ + int ac_sttnum; /* Number of status array entries */ + char ac_path[ACCT_PATH]; /* Path name for accounting file */ + struct actstat ac_stat[NUM_KDRCDS]; /* Entry status array */ +}; + +/* + * Function codes for incremental accounting; currently not used + */ +typedef enum +{ + IA_NONE, /* Zero entry place holder */ + IA_DELTA, /* Change clock delta for incremental accounting */ + IA_EVENT, /* Cause incremental accounting event now */ + IA_MAX +} ac_iafnc; + +/* + * Incremental accounting structure; currently not used + */ +struct actinc +{ + int ac_ind; /* Entry index */ + ac_iafnc ac_fnc; /* Entry function */ + int64_t ac_param; /* Entry parameter */ +}; + +/* + * Daemon write accounting structure + */ +#define MAX_WRACCT 1024 /* Maximum buffer size of wracct() */ + +struct actwra +{ + int ac_did; /* Daemon index */ + int ac_len; /* Length of buffer (bytes) */ + uint64_t ac_jid; /* Job ID */ + char *ac_buf; /* Daemon accounting buffer */ +}; + +/* These definitions are used with the CSA /proc IOCTL interface */ +#define CSA_PROC "csa" +#define CSA_IOCTL_NUM 'A' + + +#endif /* _LINUX_CSA_H */ Index: linux/include/linux/csa_internal.h =================================================================== --- /dev/null +++ linux/include/linux/csa_internal.h @@ -0,0 +1,85 @@ +/* + * Copyright (c) 2000-2002 Silicon Graphics, Inc and LANL All Rights Reserved. + * Copyright (c) 2004 Silicon Graphics, Inc All Rights Reserved. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + * + * This program is distributed in the hope that it would be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 59 Temple Place - Suite 330, Boston MA 02111-1307, USA. + * + * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane, + * Mountain View, CA 94043, or: + * + * http://www.sgi.com + */ + +/* + * CSA (Comprehensive System Accounting) + * Job Accounting for Linux + * + * This header file contains the definitions needed for communication + * between the kernel and the CSA module. + */ + +#ifndef _LINUX_CSA_INTERNAL_H +#define _LINUX_CSA_INTERNAL_H + +#include <linux/config.h> + +extern void (*do_csa_acct) (int, struct task_struct *); + +#if defined (CONFIG_CSA) || defined (CONFIG_CSA_MODULE) + +#include <linux/linkage.h> +#include <linux/ptrace.h> + +static inline void csa_update_integrals(void) +{ + long delta; + + if (current->mm) { + delta = current->stime - current->csa_stimexpd; + current->csa_stimexpd = current->stime; + current->csa_rss_mem1 += delta * current->mm->rss; + current->csa_vm_mem1 += delta * current->mm->total_vm; + } +} + +static inline void csa_clear_integrals(struct task_struct *tsk) +{ + if (tsk) { + tsk->csa_stimexpd = 0; + tsk->csa_rss_mem1 = 0; + tsk->csa_vm_mem1 = 0; + } +} + +/* + * This is the wrapper for the CSA end-of-process accounting record, which + * is written by the CSA csa.c code when a task within a job exits. 
+ */ +static inline void +csa_acct(int exitcode, struct task_struct *p) +{ + if (do_csa_acct != NULL) { + do_csa_acct(exitcode, p); + } +} + +#else /* CONFIG_CSA || CONFIG_CSA_MODULE */ + +#define csa_update_integrals() do { } while (0); +#define csa_clear_integrals(task) do { } while (0); +#define csa_acct(exitcode, task) do { } while (0); +#endif /* CONFIG_CSA || CONFIG_CSA_MODULE */ + +#endif /* _LINUX_CSA_INTERNAL_H */ Index: linux/include/linux/sched.h =================================================================== --- linux.orig/include/linux/sched.h +++ linux/include/linux/sched.h @@ -234,6 +234,8 @@ struct kioctx *ioctx_list; struct kioctx default_kioctx; + + unsigned long hiwater_rss, hiwater_vm; }; extern int mmlist_nr; @@ -441,6 +443,7 @@ unsigned long it_real_value, it_prof_value, it_virt_value; unsigned long it_real_incr, it_prof_incr, it_virt_incr; struct timer_list real_timer; + struct list_head posix_timers; /* POSIX.1b Interval Timers */ unsigned long utime, stime, cutime, cstime; unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw; /* context switch counts */ u64 start_time; @@ -459,6 +462,7 @@ char comm[16]; /* file system info */ int link_count, total_link_count; + struct tty_struct *tty; /* NULL if no tty */ /* ipc stuff */ struct sysv_sem sysvsem; /* CPU-specific state of this task */ @@ -519,7 +523,12 @@ struct list_head pagg_list; struct rw_semaphore pagg_sem; #endif - +/* i/o counters(bytes read/written, blocks read/written, #syscalls, waittime */ + unsigned long rchar, wchar, rblk, wblk, syscr, syscw, bwtime; +#if defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE) + unsigned long csa_rss_mem1, csa_vm_mem1; + clock_t csa_stimexpd; +#endif }; static inline pid_t process_group(struct task_struct *tsk) @@ -851,6 +860,19 @@ /* Remove the current tasks stale references to the old mm_struct */ extern void mm_release(struct task_struct *, struct mm_struct *); +/* Update highwater values */ +static inline void update_mem_hiwater(void) +{ + if 
(current->mm) { + if (current->mm->hiwater_rss < current->mm->rss) { + current->mm->hiwater_rss = current->mm->rss; + } + if (current->mm->hiwater_vm < current->mm->total_vm) { + current->mm->hiwater_vm = current->mm->total_vm; + } + } +} + extern int copy_thread(int, unsigned long, unsigned long, unsigned long, struct task_struct *, struct pt_regs *); extern void flush_thread(void); extern void exit_thread(void); @@ -928,8 +950,7 @@ extern void unhash_process(struct task_struct *p); -/* - * Protects ->fs, ->files, ->mm, ->ptrace and synchronises with wait4(). +/* Protects ->fs, ->files, ->mm, and synchronises with wait4(). * Nests both inside and outside of read_lock(&tasklist_lock). * It must not be nested with write_lock_irq(&tasklist_lock), * neither inside nor outside. Index: linux/init/Kconfig =================================================================== --- linux.orig/init/Kconfig +++ linux/init/Kconfig @@ -154,6 +154,31 @@ a module, select this entry using M. If you do not want support for jobs, select N. +config CSA + tristate " CSA Job Accounting" + depends on PAGG_JOB + help + Comprehensive System Accounting (CSA) provides job level + accounting of resource usage. The accounting records are + written by the kernel into a file. CSA user level scripts + and commands process the binary accounting records and + combine them by job identifier within system boot uptime + periods. These accounting records are then used to produce + reports and charge fees to users. + + Say Y here if you want job level accounting to be compiled + into the kernel. Say M here if you want the writing of + accounting records portion of this feature to be a loadable + module. Say N here if you do not want job level accounting + (the default). + + The CSA commands and scripts package needs to be installed + to process the CSA accounting records. 
See + http://oss.sgi.com/projects/csa for further information + about CSA and download instructions for the CSA commands + package and documentation. + + config SYSCTL bool "Sysctl support" ---help--- Index: linux/kernel/Makefile =================================================================== --- linux.orig/kernel/Makefile +++ linux/kernel/Makefile @@ -20,6 +20,7 @@ obj-$(CONFIG_COMPAT) += compat.o obj-$(CONFIG_PAGG) += pagg.o obj-$(CONFIG_PAGG_JOB) += job.o +obj-$(CONFIG_CSA) += csa.o obj-$(CONFIG_IKCONFIG) += configs.o obj-$(CONFIG_IKCONFIG_PROC) += configs.o obj-$(CONFIG_STOP_MACHINE) += stop_machine.o Index: linux/kernel/csa.c =================================================================== --- /dev/null +++ linux/kernel/csa.c @@ -0,0 +1,1664 @@ +/* + * Copyright (c) 2000-2002 Silicon Graphics, Inc and LANL All Rights Reserved. + * Copyright (c) 2004 Silicon Graphics, Inc All Rights Reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + * + * This program is distributed in the hope that it would be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 59 Temple Place - Suite 330, Boston MA 02111-1307, USA. + * + * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane, + * Mountain View, CA 94043, or: + * + * http://www.sgi.com + */ + +/* + * Description: + * This file, csa.c, contains the procedures that handle kernel CSA + * job accounting. It configures CSA, writes CSA accounting + * records, and processes the acctctl /proc ioctl. 
This code can + * either be compiled directly into the kernel or compiled as + * a loadable module. + * + * During initialization, this code registers procedure callbacks + * with the PAGG job code. + * + * Author: + * Marlys Kohnke (kohnke@sgi.com) + * + * Contributors: + * + * Changes: + * January 31, 2001 (kohnke) Changed to use semaphores rather than + * spinlocks. Was seeing a spinlock deadlock sometimes when an accounting + * record was being written to disk with 2.4.0 (didn't happen with + * 2.4.0-test7). + * + * February 2, 2001 (kohnke) Changed to handle being compiled directly + * into the kernel, not just compiled as a loadable module. Renamed + * init_module() as init_csa() and cleanup_module() as cleanup_csa(). + * Added calls to module_init() and module_exit(). + * + * January 21, 2003 (jlan) Changed to provide /proc ioctl interface. + * Also, provided MODULE_* clause. + */ + + +#include <linux/config.h> + +#if defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE) + +#include <linux/errno.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/init.h> +#include <linux/types.h> +#include <linux/mm.h> +#include <linux/file.h> +#include <linux/utsname.h> +#include <linux/proc_fs.h> +#include <asm/uaccess.h> +#include <asm/semaphore.h> + +#include <linux/csa_internal.h> +#include <linux/csa.h> +#include <linux/job.h> + + +static int csa_registered = 0; + +MODULE_AUTHOR("Silicon Graphics, Inc."); +MODULE_DESCRIPTION("CSA Kernel Module"); +MODULE_LICENSE("GPL"); + +static int csa_jstart(int, void *); +static int csa_jexit(int, void *); +static void csa_acct_eop(int, struct task_struct *); +static int csa_modify_buf(char *, struct acctcsa *, struct acctmem *, + struct acctio *, int, int); +static int csa_write(char *, int, int, uint64_t, int, struct job_csa *); +static void csa_config_make(ac_eventtype, struct acctcfg *); +static int csa_config_write(ac_eventtype,struct file *); +static void csa_header(struct achead *, int, int, int); +static 
long int sc_CLK(long int); + +#define JID_ERR1 "do_csa_acct: No job table entry for jid 0x%llx.\n" +#define JID_ERR2 "csa user job accounting write error %d, jid 0x%llx\n" +#define JID_ERR3 "Can't disable csa user job accounting jid 0x%llx\n" +#define JID_ERR4 "csa user job accounting disabled, jid 0x%llx\n" + +/* #define CSA_DEBUG 0 */ + +#ifdef CSA_DEBUG +#define PRINTK(args...) printk(args) +#else +#define PRINTK(args...) +#endif /* CSA_DEBUG */ + +/* these defines can be removed once they're available in kernel header files */ +/* #define USEC_PER_SEC 1000000L */ /* number of usecs for 1 second */ +#define USEC_PER_TICK (USEC_PER_SEC/HZ) +#define NBPC PAGE_SIZE /* Number of bytes per click */ +#define ctob(x) ((uint64_t)(x)*NBPC) + + +static struct file *csa_acctvp = (struct file *)NULL; +static time_t boottime = 0; + +struct timeval acct_now; /* present time (sec, usec) */ + +static DECLARE_MUTEX(csa_sem); +static DECLARE_MUTEX(csa_write_sem); + +static int csa_flag = 0; /* accounting start state flag */ +char csa_path[ACCT_PATH] = ""; /* current accounting file path name */ +char new_path[ACCT_PATH] = ""; /* new accounting file path name */ + + +static struct job_acctmod csa_job_callbacks = { + .type = JOB_ACCT_CSA, + .jobstart = csa_jstart, + .jobend = csa_jexit, + .module = THIS_MODULE +}; + + +/* modify this when changes are made to ac_kdrcd in csa.h */ +char *acct_dmd_name[ACCT_MAXKDS] = + {"CSA", + "JOB", + "ASH", + "NQS", + "WORKLOAD MGMT", + "TAPE", + "DATA MIGRATION", + "SOCKET", + "SITE1", + "SITE2" }; + +typedef enum { + A_SYS, /* system accounting action (0) */ + A_CJA, /* Job accounting action (1) */ + A_DMD, /* daemon accounting action (2) */ + A_MAX} a_fnc; + +struct actstat acct_dmd[ACCT_MAXKDS][A_MAX]; +struct actstat acct_rcd[ACCT_MAXRCDS-ACCT_RCDS][A_MAX]; + +/* Initialize the CSA accounting state information. 
*/ +#define INIT_DMD(t, i, s, p) acct_dmd[i][t].ac_ind = i; \ + acct_dmd[i][t].ac_state = s; \ + acct_dmd[i][t].ac_param = p; +#define INIT_RCD(t, i, s, p) acct_rcd[i-ACCT_RCDS][t].ac_ind = i; \ + acct_rcd[i-ACCT_RCDS][t].ac_state = s; \ + acct_rcd[i-ACCT_RCDS][t].ac_param = p; + +static int csa_ioctl( struct inode *, struct file *, unsigned int, + unsigned long); +/* proc dir entry */ +struct proc_dir_entry *csa_proc_entry; + +/* file operations for proc file */ +static struct file_operations csa_file_ops = { + owner: THIS_MODULE, + ioctl: csa_ioctl +}; + +#ifdef DEBUG + +#define DBG_PRINTINIT(s) \ + char *dbg_fname = s + +#define DBG_PRINTENTRY() \ +do { \ + printk(KERN_DEBUG __FILE__ ": %s: entry\n", dbg_fname); \ +} while(0) + +#define DBG_PRINTEXIT(c) \ +do { \ + printk(KERN_DEBUG __FILE__ ": %s: exit, code = %d\n", dbg_fname, c); \ +} while(0) + +/* write lock semaphore */ +#define JOB_WLOCK(l) \ +do { \ + printk(KERN_DEBUG __FILE__ ": wlock = %p\n", l); \ + down_write(l); \ +} while(0); + +/* write unlock semaphore */ +#define JOB_WUNLOCK(l) \ +do { \ + printk(KERN_DEBUG __FILE__ ": wunlock = %p\n", l); \ + up_write(l); \ +} while(0); + +/* read lock semaphore */ +#define JOB_RLOCK(l) \ +do { \ + printk(KERN_DEBUG __FILE__ ": rlock = %p\n", l); \ + down_read(l); \ +} while(0); + +/* read unlock semaphore */ +#define JOB_RUNLOCK(l) \ +do { \ + printk(KERN_DEBUG __FILE__ ": runlock = %p\n", l); \ + up_read(l); \ +} while(0); + + +#else /* #ifdef DEBUG */ + +#define DBG_PRINTINIT(s) + +#define DBG_PRINTENTRY() \ +do { \ +} while(0) + +#define DBG_PRINTEXIT(c) \ +do { \ +} while(0) + +/* write lock semaphore */ +#define JOB_WLOCK(l) \ +do { \ + down_write(l); \ +} while(0); + +/* write unlock semaphore */ +#define JOB_WUNLOCK(l) \ +do { \ + up_write(l); \ +} while(0); + +/* read lock semaphore */ +#define JOB_RLOCK(l) \ +do { \ + down_read(l); \ +} while(0); + +/* read unlock semaphore */ +#define JOB_RUNLOCK(l) \ +do { \ + up_read(l); \ +} while(0); + + +#endif 
/* #ifdef DEBUG */ + + + +/* + * register procedure callbacks with the kernel/csa.c CSA + * code and with the PAGG job code + */ +static int __init +init_csa(void) +{ + int retval = 0; + + if (csa_registered) { + /* + * + * incorrectly using csa_job_acct.c as a loadable module and + * compiled into the kernel?? + */ + printk(KERN_WARNING "init_csa: %s\n", + "Multiple attempts to register CSA support"); + return -EBUSY; + } else { + csa_registered = 1; + } + + /* + * register callbacks with the PAGG job code to process + * start-of-job and end-of-job accounting records. If this is a + * module, this registration will also increment the job module + * use count so the job module won't be unloaded out from under + * the CSA module. + */ + retval = job_register_acct(&csa_job_callbacks); + if (retval != 0) { + printk(KERN_INFO "CSA: failed to register job\n"); + return retval; + } + + /* setup our /proc entry file */ + csa_proc_entry = create_proc_entry(CSA_PROC, S_IFREG|S_IRUGO, + &proc_root); + if (!csa_proc_entry) { + csa_registered = 0; + job_unregister_acct(&csa_job_callbacks); + return -1; + } + + csa_proc_entry->proc_fops = &csa_file_ops; + csa_proc_entry->proc_iops = NULL; + + do_csa_acct = csa_acct_eop; + + printk(KERN_INFO "CSA: initialized\n"); + + return retval; +} + + +/* + * Do module cleanup before the module is removed; unregister + * procedure callbacks with the kernel non-module CSA code and + * with the PAGG job module (which decrements the job module use count). + */ +static void __exit +cleanup_csa(void) +{ + int retval = 0; + + csa_registered = 0; + do_csa_acct = NULL; + + retval = job_unregister_acct(&csa_job_callbacks); + if (retval < 0) { + printk(KERN_ERR "CSA module can't unregister with job module. " + "Continuing with CSA module cleanup.\n"); + } + remove_proc_entry(CSA_PROC, &proc_root); + printk(KERN_INFO "CSA removed\n"); + return; +} + +/* + * Initialize the CSA accounting state table. 
+ * Modify this when changes are made to ac_kdrcd in csa.h + * + */ +static void +csa_init_acct(int flag) +{ + csa_flag = flag; + + boottime = xtime.tv_sec - (jiffies / HZ); + + /* Initialize system accounting states. */ + INIT_DMD(A_SYS, ACCT_KERN_CSA, ACS_OFF, 0); + INIT_DMD(A_SYS, ACCT_KERN_JOB_PROC, ACS_OFF, 0); + INIT_DMD(A_SYS, ACCT_KERN_ASH, ACS_OFF, 0); + INIT_DMD(A_SYS, ACCT_DMD_NQS, ACS_OFF, 0); + INIT_DMD(A_SYS, ACCT_DMD_WKMG, ACS_OFF, 0); + INIT_DMD(A_SYS, ACCT_DMD_TAPE, ACS_OFF, 0); + INIT_DMD(A_SYS, ACCT_DMD_SOCKET, ACS_OFF, 0); + INIT_DMD(A_SYS, ACCT_DMD_DMIG, ACS_OFF, 0); + INIT_DMD(A_SYS, ACCT_DMD_SITE1, ACS_OFF, 0); + INIT_DMD(A_SYS, ACCT_DMD_SITE2, ACS_OFF, 0); + + INIT_RCD(A_SYS, ACCT_RCD_MPPDET, ACS_OFF, 0); + INIT_RCD(A_SYS, ACCT_RCD_MEM, ACS_OFF, 0); + INIT_RCD(A_SYS, ACCT_RCD_IO, ACS_OFF, 0); + INIT_RCD(A_SYS, ACCT_RCD_MT, ACS_OFF, 0); + INIT_RCD(A_SYS, ACCT_RCD_MPP, ACS_OFF, 0); + INIT_RCD(A_SYS, ACCT_THD_MEM, ACS_OFF, 0); + INIT_RCD(A_SYS, ACCT_THD_TIME, ACS_OFF, 0); + INIT_RCD(A_SYS, ACCT_RCD_INCACCT, ACS_OFF, 0); + INIT_RCD(A_SYS, ACCT_RCD_APPACCT, ACS_OFF, 0); + INIT_RCD(A_SYS, ACCT_RCD_SITE1, ACS_OFF, 0); + INIT_RCD(A_SYS, ACCT_RCD_SITE2, ACS_OFF, 0); + + return; +} + +/* + * convert ticks into microseconds; necessary kernel math ops not + * available on 32-bit systems, so can't use uint64_t + */ +static long int +sc_CLK(long int clock) +{ + long int sec, split; + + sec = clock / HZ; + split = (clock % HZ) * 1000000 / HZ; + + return ((sec * 1000000) + split); +} + +/* Initialize CSA accounting header. 
*/ +static void +csa_header(struct achead *head, int revision, int type, int size) +{ + head->ah_magic = ACCT_MAGIC; + head->ah_revision = revision; + head->ah_type = type; + head->ah_flag = 0; + head->ah_size = size; + + return; +} + +/* + * Create a CSA end-of-process accounting record and write it to + * appropriate file(s) + */ +void +csa_acct_eop(int exitcode, struct task_struct *p) +{ + char acctent[sizeof(struct acctcsa) + + sizeof(struct acctmem) + + sizeof(struct acctio) ]; + char modacctent[sizeof(struct acctcsa) + + sizeof(struct acctmem) + + sizeof(struct acctio) ]; + struct acctcsa *csa = NULL; + struct acctmem *mem = NULL; + struct acctio *io = NULL; + struct achead *hdr1, *hdr2; + char *cb = acctent; + struct job_csa job_acctbuf; + uint64_t jid = 0; + int len = 0; + int csa_enabled = 0; + int ja_enabled = 0; + int io_enabled = 0; + int mem_enabled = 0; + int retval = 0; + uint64_t memtime; + + if (p == NULL) { + printk(KERN_ERR "do_csa_acct: CSA null task pointer\n"); + return; + } + jid = job_getjid(p); + if (jid <= 0) { + /* no job table entry; not all processes are part of a job */ + return; + } + memset(&job_acctbuf, 0, sizeof(job_acctbuf)); + retval = job_getacct(jid, JOB_ACCT_CSA, &job_acctbuf); + if (retval != 0) { + /* couldn't get accounting info stored in the job table entry */ + printk(KERN_WARNING JID_ERR1, (unsigned long long) jid); + return; + } + + down(&csa_sem); + /* + * figure out what's turned on, which determines which record types + * need to be written. All records are written to a user job + * accounting file. 
Only those record types configured on are + * written to the system pacct file + */ + if (job_acctbuf.job_acctfile != (struct file *)NULL) { + ja_enabled = 1; + } + if (acct_dmd[ACCT_KERN_CSA][A_SYS].ac_state == ACS_ON) { + csa_enabled = 1; + } + if (acct_rcd[ACCT_RCD_IO-ACCT_RCDS][A_SYS].ac_state == ACS_ON) { + io_enabled = 1; + } + if (acct_rcd[ACCT_RCD_MEM-ACCT_RCDS][A_SYS].ac_state == ACS_ON) { + mem_enabled = 1; + } + + if (!ja_enabled && !csa_enabled) { + /* nothing to do */ + up(&csa_sem); + return; + } + up(&csa_sem); + + csa = (struct acctcsa *)acctent; + memset(csa, 0, sizeof(struct acctcsa)); + hdr1 = &csa->ac_hdr1; + csa_header(hdr1, REV_CSA, ACCT_KERNEL_CSA, sizeof(struct acctcsa) ); + hdr2 = &csa->ac_hdr2; + csa_header(hdr2, REV_CSA, ACCT_KERNEL_CSA, 0 ); + hdr2->ah_magic = ~ACCT_MAGIC; + + csa->ac_stat = exitcode; + csa->ac_uid = p->uid; + csa->ac_gid = p->gid; + + /* XXX change this when array session handle info available */ + csa->ac_ash = 0; + csa->ac_jid = job_acctbuf.job_id; + /* XXX change this when project ids are available */ + csa->ac_prid = 0; + csa->ac_nice = task_nice(p); + csa->ac_sched = p->policy; + + csa->ac_pid = p->pid; + csa->ac_ppid = (p->parent) ? p->parent->pid : 0; + if (p->flags & PF_FORKNOEXEC) { + csa->ac_hdr1.ah_flag |= AFORK; + } + if (p->flags & PF_SUPERPRIV) { + csa->ac_hdr1.ah_flag |= ASU; + } + if (p->flags & PF_DUMPCORE) { + csa->ac_hdr1.ah_flag |= ACORE; + } + if (p->flags & PF_SIGNALED) { + csa->ac_hdr1.ah_flag |= AXSIG; + } + csa->ac_hdr1.ah_flag &= ~ACKPT; + + strncpy(csa->ac_comm, p->comm, sizeof(csa->ac_comm)); +/* csa->ac_btime = CT_TO_SECS(p->start_time) + (xtime.tv_sec - + (jiffies / HZ)); */ + csa->ac_btime = do_div(p->start_time, HZ) + (xtime.tv_sec - (jiffies / HZ)); + + /* + * cpu usage is accumulated by the kernel in ticks. + * convert from clock ticks to microseconds; each process gets + * a minimum of a tick for elapsed time. 
If the granularity + * changes to something finer than a tick in the future, + * then these zero cpu and elapsed time modifications should be + * looked at again. + */ + csa->ac_etime = (jiffies - p->start_time == 0) ? (USEC_PER_TICK) : + ((uint64_t)(jiffies - p->start_time) * USEC_PER_TICK); + + cb += sizeof(struct acctcsa); + len += sizeof(struct acctcsa); + + /* convert from ticks to microseconds */ + csa->ac_utime = p->utime * USEC_PER_TICK; + csa->ac_stime = p->stime * USEC_PER_TICK; + /* Each process gets a minimum of a half tick cpu time */ + if ((csa->ac_utime == 0) && (csa->ac_stime == 0)) { + csa->ac_stime = USEC_PER_TICK/2; + } + + /* Create the memory record if needed */ + if (ja_enabled || mem_enabled) { + mem = (struct acctmem *)cb; + memset(mem, 0, sizeof(struct acctmem)); + hdr1->ah_flag |= AMORE; + hdr2->ah_type |= ACCT_MEM; + hdr1 = &mem->ac_hdr; + csa_header(hdr1, REV_MEM, ACCT_KERNEL_MEM, + sizeof(struct acctmem) ); + + /* adjust from pages/ticks to Mb/usec */ + memtime = sc_CLK((long int)p->csa_rss_mem1); + mem->ac_core.mem1 = ctob(memtime) / (1024 * 1024); + memtime = sc_CLK((long int)p->csa_vm_mem1); + mem->ac_virt.mem1 = ctob(memtime) / (1024 * 1024); + + /* adjust page size to 1K units */ + if (p->mm) { + mem->ac_virt.himem = p->mm->hiwater_vm * (PAGE_SIZE / 1024); + mem->ac_core.himem = p->mm->hiwater_rss * (PAGE_SIZE/1024); + /* + * For processes with zero systime, set the integral + * to the highwater mark rather than leave at zero + */ + if (mem->ac_core.mem1 == 0) { + mem->ac_core.mem1 = mem->ac_core.himem / 1024; + } + if (mem->ac_virt.mem1 == 0) { + mem->ac_virt.mem1 = mem->ac_virt.himem / 1024; + } + } + + mem->ac_minflt = p->min_flt; + mem->ac_majflt = p->maj_flt; + + cb += sizeof(struct acctmem); + hdr2->ah_size += sizeof(struct acctmem); + len += sizeof(struct acctmem); + } + /* Create the I/O record */ + if (ja_enabled || io_enabled) { + io = (struct acctio *)cb; + memset(io, 0, sizeof(struct acctio)); + hdr1->ah_flag |= AMORE; 
+ hdr2->ah_type |= ACCT_IO; + hdr1 = &io->ac_hdr; + csa_header(hdr1, REV_IO, ACCT_KERNEL_IO, + sizeof(struct acctio) ); + + /* convert from ticks to microseconds */ + /* XXX when able to do kernel 64 bit divide, change type */ + PRINTK(KERN_INFO "CSA: block wait time %lu\n",(unsigned long int)p->bwtime); + io->ac_bwtime = CT_TO_USECS((unsigned long int)p->bwtime); + PRINTK(KERN_INFO "CSA: converted bwtime %lu\n",io->ac_bwtime); + + io->ac_bkr = p->rblk; + io->ac_bkw = p->wblk; + + /* raw wait time; currently not used */ + io->ac_rwtime = 0; + + io->ac_chr = p->rchar; + io->ac_chw = p->wchar; + io->ac_scr = p->syscr; + io->ac_scw = p->syscw; + + cb += sizeof(struct acctio); + hdr2->ah_size += sizeof(struct acctio); + len += sizeof(struct acctio); + } + + /* record always written to a user job accounting file */ + if ((len > 0) && (job_acctbuf.job_acctfile != (struct file *)NULL) ) { + csa_write((caddr_t)&acctent, ACCT_KERN_CSA, + len, jid, A_CJA, &job_acctbuf); + } + /* + * check the cpu time and virtual memory thresholds before writing + * this record to the system pacct file + */ + if ((acct_rcd[ACCT_THD_MEM-ACCT_RCDS][A_SYS].ac_state == ACS_ON) && + (ja_enabled || mem_enabled)) { + if (mem->ac_virt.himem < + acct_rcd[ACCT_THD_MEM-ACCT_RCDS][A_SYS].ac_param) { + /* don't write record to pacct */ + return; + } + } + if ((acct_rcd[ACCT_THD_TIME-ACCT_RCDS][A_SYS].ac_state == ACS_ON)) { + if ((csa->ac_utime + csa->ac_stime) < + acct_rcd[ACCT_THD_TIME-ACCT_RCDS][A_SYS].ac_param) { + /* don't write record to pacct */ + return; + } + } + + if ((len > 0) && (csa_acctvp != (struct file *)NULL) && csa_enabled ) { + if (io_enabled && mem_enabled) { + /* write out buffer as is to system pacct file */ + csa_write((caddr_t)&acctent, ACCT_KERN_CSA, + len, jid, A_SYS, &job_acctbuf); + } else { + /* only write out record types turned on */ + len = csa_modify_buf(modacctent, csa, mem, io, + io_enabled, mem_enabled); + csa_write((caddr_t)&modacctent, ACCT_KERN_CSA, + len, jid, 
A_SYS, &job_acctbuf); + } + } + return; +} + +/* + * Copy needed accounting records into buffer, skipping record + * types which are not enabled. May need to adjust downward + * the second header size if not both memory and io continuation + * records are written, plus adjust the second header types and + * first header flags. + */ +static int +csa_modify_buf(char *modacctent, struct acctcsa *csa, struct acctmem *mem, + struct acctio *io, int io_enabled, int mem_enabled) +{ + int size = 0; + int len = 0; + char *bufptr; + struct achead *hdr1, *hdr2; + + size = sizeof(struct acctcsa) + sizeof(struct acctmem) + + sizeof(struct acctio); + memset(modacctent, 0, size); + bufptr = modacctent; + /* + * adjust values that might not be correct anymore if all of + * the continuation records aren't written out to the pacct file + */ + hdr1 = &csa->ac_hdr1; + hdr2 = &csa->ac_hdr2; + hdr1->ah_flag &= ~AMORE; + hdr2->ah_type = ACCT_KERNEL_CSA; + hdr2->ah_size = 0; + if (mem_enabled) { + hdr1->ah_flag |= AMORE; + hdr2->ah_type |= ACCT_MEM; + hdr2->ah_size += sizeof(struct acctmem); + hdr1 = &mem->ac_hdr; + hdr1->ah_flag &= ~AMORE; + } + if (io_enabled) { + hdr1->ah_flag |= AMORE; + hdr2->ah_type |= ACCT_IO; + hdr2->ah_size += sizeof(struct acctio); + hdr1 = &io->ac_hdr; + hdr1->ah_flag &= ~AMORE; + } + memcpy(bufptr, csa, sizeof(struct acctcsa)); + bufptr += sizeof(struct acctcsa); + len += sizeof(struct acctcsa); + + if (mem_enabled) { + memcpy(bufptr, mem, sizeof(struct acctmem)); + len += sizeof(struct acctmem); + bufptr += sizeof(struct acctmem); + } + if(io_enabled) { + memcpy(bufptr, io, sizeof(struct acctio)); + len += sizeof(struct acctio); + } + + return len; +} + + +/* + * csa_ioctl + * + */ +static int +csa_ioctl( + struct inode *inode, + struct file *file, + unsigned int req, + unsigned long data) +{ + struct actctl actctl; + struct actstat actstat; + + int daemon = 0; + int error = 0; + int err = 0; + static int flag = 010000; + int ind; + int id; + int len; + int 
num; + + PRINTK(KERN_INFO "CSA: csa_ioctl\n"); + down(&csa_sem); + if (!csa_flag) { + csa_init_acct(flag++); + } + up(&csa_sem); + + if ((req < 0) || (req >= AC_MREQ) ) { + return -EINVAL; + } + + memset(&actctl, 0, sizeof(struct actctl)); + memset(&actstat, 0, sizeof(struct actstat)); + + switch (req) { + /* + * Start specified types of accounting. + */ + case AC_START: + { + int id, ind; + struct file *newvp; + + PRINTK(KERN_INFO "CSA: AC_START\n"); + if (!capable(CAP_SYS_PACCT) ) { + error = -EPERM; + break; + } + + if (copy_from_user(&actctl, (void*)data, sizeof(int)) ) { + error = -EFAULT; + break; + } + + num = (actctl.ac_sttnum == 0) ? 1 : actctl.ac_sttnum; + if ((num < 0) || (num > NUM_KDRCDS) ) { + error = -EINVAL; + break; + + } + + len = sizeof(struct actctl) - + sizeof(struct actstat) * NUM_KDRCDS + + sizeof(struct actstat) * num; + if (copy_from_user(&actctl, (void*)data, len)) { + error = -EFAULT; + break; + } + /* + * Verify all indexes in actstat structures specified. + */ + for(ind = 0; ind < num; ind++) { + id = actctl.ac_stat[ind].ac_ind; + if ((id < 0) || (id >= ACCT_MAXRCDS) ) { + error = -EINVAL; + break; + } + + if (id == ACCT_MAXKDS) { + error = -EINVAL; + break; + } + } + down(&csa_sem); + /* + * If an accounting file was specified, make sure + * that we can access it. 
+ */ + if (strlen(actctl.ac_path) ) { + strncpy(new_path, actctl.ac_path, ACCT_PATH); + newvp = filp_open(new_path,O_WRONLY|O_APPEND, 0); + if (IS_ERR(newvp)) { + error = PTR_ERR(newvp); + up(&csa_sem); + break; + } else if (!S_ISREG(newvp->f_dentry->d_inode->i_mode)) { + error = -EACCES; + filp_close(newvp, NULL); + up(&csa_sem); + break; + } else if (!newvp->f_op->write) { + error = -EIO; + filp_close(newvp, NULL); + up(&csa_sem); + break; + } + if ((csa_acctvp != (struct file *)NULL) && + csa_acctvp == newvp) { + /* + * this file already being used, so ignore + * request to use this file; just continue on + */ + filp_close(newvp, NULL); + newvp = (struct file *)NULL; + } + + } else { + newvp = (struct file *)NULL; + } + /* + * If a new accounting file was specified and there's + * an old accounting file, stop writing to it. + */ + if (newvp != (struct file *)NULL) { + if (csa_acctvp != (struct file *)NULL) { + error = csa_config_write(AC_CONFCHG_FILE,NULL); + filp_close(csa_acctvp, NULL); + } else if (!csa_flag) { + csa_init_acct(flag++); + } + + strncpy(csa_path, new_path, ACCT_PATH); + down(&csa_write_sem); + csa_acctvp = newvp; + up(&csa_write_sem); + + } else { + if (csa_acctvp == (struct file *)NULL) { + error = -EINVAL; + up(&csa_sem); + break; + } + } + + /* + * Loop through each actstat block and turn ON that accounting. 
+ */ + for(ind = 0; ind < num; ind++) { + struct actstat *stat; + + id = actctl.ac_stat[ind].ac_ind; + stat = &actctl.ac_stat[ind]; + if (id < ACCT_RCDS) { + acct_dmd[id][A_SYS].ac_state = ACS_ON; + acct_dmd[id][A_SYS].ac_param = stat->ac_param; + + stat->ac_state = acct_dmd[id][A_SYS].ac_state; + stat->ac_param = acct_dmd[id][A_SYS].ac_param; + } else { + int tid = id -ACCT_RCDS; + + acct_rcd[tid][A_SYS].ac_state = ACS_ON; + acct_rcd[tid][A_SYS].ac_param = stat->ac_param; + + stat->ac_state = acct_rcd[tid][A_SYS].ac_state; + stat->ac_param = acct_rcd[tid][A_SYS].ac_param; + } + } + + up(&csa_sem); + error = csa_config_write(AC_CONFCHG_ON, NULL); + /* + * Return the accounting states to the user. + */ + if (copy_to_user((void*)data, &actctl, len)) { + error = -EFAULT; + break; + } + } + break; + + /* + * Stop specified types of accounting. + */ + case AC_STOP: + { + int id, ind; + + PRINTK(KERN_INFO "CSA: AC_STOP\n"); + if (!capable(CAP_SYS_PACCT) ) { + error = -EPERM; + break; + } + + if (copy_from_user(&actctl, (void*)data, sizeof(int)) ) { + error = -EFAULT; + break; + } + + num = (actctl.ac_sttnum == 0) ? 1 : actctl.ac_sttnum; + if ((num <= 0) || (num > NUM_KDRCDS) ) { + error = -EINVAL; + break; + } + + len = sizeof(struct actctl) - + sizeof(struct actstat) * NUM_KDRCDS + + sizeof(struct actstat) * num; + if (copy_from_user(&actctl, (void*)data, len)) { + error = -EFAULT; + break; + } + + /* + * Verify all of the indexes in actstat structures specified. + */ + for(ind = 0; ind < num; ind++) { + id = actctl.ac_stat[ind].ac_ind; + if ((id < 0) || (id >= NUM_KDRCDS) ) { + error = -EINVAL; + break; + } + } + + /* + * Loop through each actstat block and turn off that accounting. + */ + down(&csa_sem); + /* + * Disable accounting for this entry. 
+ */ + for(ind = 0; ind < num; ind++) { + id = actctl.ac_stat[ind].ac_ind; + if (id < ACCT_RCDS) { + acct_dmd[id][A_SYS].ac_state = ACS_OFF; + acct_dmd[id][A_SYS].ac_param = 0; + + actctl.ac_stat[ind].ac_state = + acct_dmd[id][A_SYS].ac_state; + actctl.ac_stat[ind].ac_param = 0; + } else { + int tid = id -ACCT_RCDS; + + acct_rcd[tid][A_SYS].ac_state = ACS_OFF; + acct_rcd[tid][A_SYS].ac_param = 0; + actctl.ac_stat[ind].ac_state = + acct_rcd[tid][A_SYS].ac_state; + actctl.ac_stat[ind].ac_param = + acct_rcd[tid][A_SYS].ac_param; + } + } /* end of for(ind) */ + /* + * Check the daemons to see if any are still on. + */ + for(ind = 0; ind < ACCT_MAXKDS; ind++) { + if (acct_dmd[ind][A_SYS].ac_state == ACS_ON) { + daemon += 1<<ind; + } + } + up(&csa_sem); + /* + * If all daemons are off and there's an old accounting file, + * stop writing to it. + */ + if (!daemon && (csa_acctvp != (struct file *)NULL) ) { + error = csa_config_write(AC_CONFCHG_OFF,NULL); + filp_close(csa_acctvp, NULL); + down(&csa_write_sem); + csa_acctvp = (struct file *)NULL; + up(&csa_write_sem); + } else { + error = csa_config_write(AC_CONFCHG_OFF, NULL); + } + /* + * Return the accounting states to the user. + */ + if (copy_to_user((void*)data, &actctl, len)) { + error = -EFAULT; + break; + } + } + break; + + /* + * Halt all accounting. + */ + case AC_HALT: + { + int ind; + + PRINTK(KERN_INFO "CSA: AC_HALT\n"); + if (!capable(CAP_SYS_PACCT) ) { + error = -EPERM; + break; + } + down(&csa_sem); + /* Turn off all accounting if any is on. */ + for(ind = 0; ind <ACCT_MAXKDS; ind++) { + acct_dmd[ind][A_SYS].ac_state = ACS_OFF; + acct_dmd[ind][A_SYS].ac_param = 0; + } + + for(ind = ACCT_RCDS; ind < ACCT_MAXRCDS; ind++) { + int tid = ind -ACCT_RCDS; + + acct_rcd[tid][A_SYS].ac_state = ACS_OFF; + acct_rcd[tid][A_SYS].ac_param = 0; + } + + up(&csa_sem); + /* If there's an old accounting file, stop writing to it. 
*/ + if (csa_acctvp != (struct file *)NULL) { + error = csa_config_write(AC_CONFCHG_OFF,NULL); + filp_close(csa_acctvp, NULL); + down(&csa_write_sem); + csa_acctvp = (struct file *)NULL; + up(&csa_write_sem); + } + } + break; + + /* + * Process daemon/record status function. + */ + case AC_CHECK: + { + PRINTK(KERN_INFO "CSA: AC_CHECK\n"); + if (copy_from_user(&actstat, (void*)data, sizeof(struct actstat)) ) { + error = -EFAULT; + break; + } + id = actstat.ac_ind; + if ((id >= 0) && (id < ACCT_MAXKDS) ) { + actstat.ac_state = acct_dmd[id][A_SYS].ac_state; + actstat.ac_param = acct_dmd[id][A_SYS].ac_param; + + } else if ((id >= ACCT_RCDS) && (id < ACCT_MAXRCDS) ) { + int tid = id-ACCT_RCDS; + + actstat.ac_state = acct_rcd[tid][A_SYS].ac_state; + actstat.ac_param = acct_rcd[tid][A_SYS].ac_param; + + } else { + error = -EINVAL; + break; + } + if (copy_to_user((void*)data, &actstat, sizeof(struct actstat)) ) { + error = -EFAULT; + } + } + break; + + /* + * Process daemon status function. + */ + case AC_KDSTAT: + { + PRINTK(KERN_INFO "CSA: AC_KDSTAT\n"); + if (copy_from_user(&actctl, (void*)data, sizeof(int)) ) { + error = -EFAULT; + break; + } + + num = actctl.ac_sttnum; + + if (num <= 0) { + error = -EINVAL; + break; + } else if (num > NUM_KDS) { + num = NUM_KDS; + } + for(ind = 0; ind < num; ind++) { + actctl.ac_stat[ind].ac_ind = + acct_dmd[ind][A_SYS].ac_ind; + actctl.ac_stat[ind].ac_state = + acct_dmd[ind][A_SYS].ac_state; + actctl.ac_stat[ind].ac_param = + acct_dmd[ind][A_SYS].ac_param; + } /* end of for(ind) */ + actctl.ac_sttnum = num; + strncpy(actctl.ac_path, csa_path, ACCT_PATH); + + len = sizeof(struct actctl) - + sizeof(struct actstat) * NUM_KDRCDS + + sizeof(struct actstat) * num; + if (copy_to_user((void*)data, &actctl, len)) { + error = -EFAULT; + break; + } + } + break; + + /* + * Process record status function. 
+ */ + case AC_RCDSTAT: + { + PRINTK(KERN_INFO "CSA: AC_RCDSTAT\n"); + if (copy_from_user(&actctl, (void*)data, sizeof(int)) ) { + error = -EFAULT; + break; + } + num = actctl.ac_sttnum; + + if (num <= 0) { + error = -EINVAL; + break; + } else if (num > NUM_RCDS) { + num = NUM_RCDS; + } + for(ind = 0; ind < num; ind++) { + actctl.ac_stat[ind].ac_ind = + acct_rcd[ind][A_SYS].ac_ind; + actctl.ac_stat[ind].ac_state = + acct_rcd[ind][A_SYS].ac_state; + actctl.ac_stat[ind].ac_param = + acct_rcd[ind][A_SYS].ac_param; + } + actctl.ac_sttnum = num; + strncpy(actctl.ac_path, csa_path, ACCT_PATH); + len = sizeof(struct actctl) - + sizeof(struct actstat) * NUM_KDRCDS + + sizeof(struct actstat) * num; + if (copy_to_user((void*)data, &actctl, len)) { + error = -EFAULT; + break; + } + } + break; + + /* + * Turn user job accounting ON or OFF. + */ + case AC_JASTART: + case AC_JASTOP: + { + char localpath[ACCT_PATH]; + struct file *newvp = NULL; + struct file *oldvp; + uint64_t jid; + struct job_csa job_acctbuf; + int retval = 0; + + if (req == AC_JASTART) + PRINTK(KERN_INFO "CSA: AC_JASTART\n"); + else + PRINTK(KERN_INFO "CSA: AC_JASTOP\n"); + len = sizeof(struct actctl) - + sizeof(struct actstat) * (NUM_KDRCDS -1); + if (copy_from_user(&actctl, (void*)data, len)) { + error = -EFAULT; + break; + } + /* + * If an accounting file was specified, make sure + * that we can access it. 
+ */ + if (strlen(actctl.ac_path)) { + strncpy(localpath, actctl.ac_path, ACCT_PATH); + newvp = filp_open(localpath,O_WRONLY|O_APPEND,0); + if (IS_ERR(newvp)) { + error = PTR_ERR(newvp); + break; + } else if (!S_ISREG(newvp->f_dentry->d_inode->i_mode)) { + error = -EACCES; + filp_close(newvp, NULL); + break; + } else if (!newvp->f_op->write) { + error = -EIO; + filp_close(newvp, NULL); + break; + } + } else if (req == AC_JASTART) { + error = -EINVAL; + break; + } + if (req == AC_JASTOP) { + newvp = (struct file *)NULL; + } + jid = job_getjid(current); + if (jid <= 0) { + /* no job table entry */ + error = -ENOENT; + break; + } + memset(&job_acctbuf, 0, sizeof(job_acctbuf)); + retval = job_getacct(jid, JOB_ACCT_CSA, &job_acctbuf); + if (retval != 0) { + /* couldn't get csa info in the job table entry */ + error = retval; + break; + } + /* Use this semaphore since csa_write() can also change this + * file pointer. + */ + down(&csa_write_sem); + if ((oldvp = job_acctbuf.job_acctfile) != (struct file *)NULL) { + /* Stop writing to the old job accounting file */ + filp_close(oldvp, NULL); + } + + /* Establish new job accounting file or stop job accounting */ + job_acctbuf.job_acctfile = newvp; + + retval = job_setacct(jid, JOB_ACCT_CSA, JOB_CSA_ACCTFILE, + &job_acctbuf); + if (retval != 0) { + /* couldn't set the new file name in the job entry */ + error = retval; + up(&csa_write_sem); + break; + } + up(&csa_write_sem); + /* Write a config record so ja has uname info */ + if (req == AC_JASTART) { + error = csa_config_write(AC_CONFCHG_ON, + job_acctbuf.job_acctfile); + } + } + break; + + /* + * Write an accounting record for a system daemon. 
+	 */
+	case AC_WRACCT:
+	{
+		int len;
+		int retval = 0;
+		uint64_t jid;
+		struct job_csa job_acctbuf;
+		struct actwra actwra;
+
+		PRINTK(KERN_INFO "CSA: AC_WRACCT\n");
+		if (!capable(CAP_SYS_PACCT) ) {
+			error = -EPERM;
+			break;
+		}
+		if (copy_from_user(&actwra, (void*)data, sizeof(struct actwra))) {
+			error = -EFAULT;
+			break;
+		}
+		/* Verify the parameters. */
+		jid = actwra.ac_jid;
+		if (jid < 0) {
+			error = -EINVAL;
+			break;
+		}
+
+		id = actwra.ac_did;
+		if ((id < 0) || (id >= ACCT_MAXKDS) ) {
+			error = -EINVAL;
+			break;
+		}
+
+		len = actwra.ac_len;
+		if ((len <= 0) || (len > MAX_WRACCT) ) {
+			error = -EINVAL;
+			break;
+		}
+
+		if (actwra.ac_buf == (char *)NULL) {
+			error = -EINVAL;
+			break;
+		}
+
+		/* If the daemon type is on, write out the daemon buffer. */
+		if ((acct_dmd[id][A_SYS].ac_state == ACS_ON) &&
+			(csa_acctvp != (struct file *)NULL) ) {
+			error = csa_write(actwra.ac_buf, id, len,
+				jid, A_DMD, NULL);
+		}
+
+		/* get the job table entry for this jid */
+		memset(&job_acctbuf, 0, sizeof(job_acctbuf));
+		retval = job_getacct(jid, JOB_ACCT_CSA, &job_acctbuf);
+		if (retval != 0) {
+			/* couldn't get accounting info stored in job table */
+			error = retval;
+			break;
+		}
+
+		/* maybe write out daemon record to ja user accounting file */
+		if (job_acctbuf.job_acctfile != NULL) {
+			error = csa_write(actwra.ac_buf, id, len, jid, A_CJA,
+				&job_acctbuf);
+		}
+	}
+	break;
+
+	/*
+	 * Return authorized state information.
+	 */
+	case AC_AUTH:
+	{
+		PRINTK(KERN_INFO "CSA: AC_AUTH\n");
+		if (!capable(CAP_SYS_PACCT) ) {
+			error = -EPERM;
+			break;
+		}
+		/*
+		 * Process user authorization request...If we get to this spot,
+		 * the user is authorized.
+		 */
+	}
+	break;
+
+	/*
+	 * Process the incremental accounting request.
+	 */
+	case AC_INCACCT:
+		PRINTK(KERN_INFO "CSA: AC_INCACCT\n");
+		error = -EINVAL;
+		break;
+
+	default:
+		PRINTK(KERN_INFO "CSA: Unknown request %d\n", req);
+		error = -EINVAL;
+
+	} /* end of switch(req) */
+
+	return(error ? error : err);
+}
+
+
+/*
+ * Create a configuration change accounting record.
+ */
+static void
+csa_config_make(ac_eventtype event, struct acctcfg *cfg)
+{
+	int daemon = 0;
+	int record = 0;
+	int ind;
+	int nmsize = 0;
+
+	memset(cfg, 0, sizeof(struct acctcfg));
+	/* Setup the record and header. */
+	csa_header(&cfg->ac_hdr, REV_CFG, ACCT_KERNEL_CFG,
+		sizeof(struct acctcfg) );
+	cfg->ac_event = event;
+	if (!boottime) {
+		boottime = xtime.tv_sec - (jiffies / HZ);
+	}
+	cfg->ac_boottime = boottime;
+	cfg->ac_curtime = xtime.tv_sec;
+
+	/*
+	 * Create the masks of the types that are on.
+	 */
+	for(ind = 0; ind < ACCT_MAXKDS; ind++) {
+		if (acct_dmd[ind][A_SYS].ac_state == ACS_ON) {
+			daemon += 1<<ind;
+		}
+	}
+	for(ind = ACCT_RCDS; ind < ACCT_MAXRCDS; ind++) {
+		int tid = ind -ACCT_RCDS;
+
+		if (acct_rcd[tid][A_SYS].ac_state == ACS_ON) {
+			record += 1<<tid;
+		}
+	}
+	cfg->ac_kdmask = daemon;
+	cfg->ac_rmask = record;
+
+	nmsize = sizeof(cfg->ac_uname.sysname);
+	memcpy(cfg->ac_uname.sysname, system_utsname.sysname, nmsize-1);
+	cfg->ac_uname.sysname[nmsize-1] = '\0';
+	nmsize = sizeof(cfg->ac_uname.nodename);
+	memcpy(cfg->ac_uname.nodename, system_utsname.nodename, nmsize-1);
+	cfg->ac_uname.nodename[nmsize-1] = '\0';
+	nmsize = sizeof(cfg->ac_uname.release);
+	memcpy(cfg->ac_uname.release, system_utsname.release, nmsize-1);
+	cfg->ac_uname.release[nmsize-1] = '\0';
+	nmsize = sizeof(cfg->ac_uname.version);
+	memcpy(cfg->ac_uname.version, system_utsname.version, nmsize-1);
+	cfg->ac_uname.version[nmsize-1] = '\0';
+	nmsize = sizeof(cfg->ac_uname.machine);
+	memcpy(cfg->ac_uname.machine, system_utsname.machine, nmsize-1);
+	cfg->ac_uname.machine[nmsize-1] = '\0';
+
+	return;
+}
+
+
+/*
+ * Create and write a configuration change accounting record.
+ */
+static int
+csa_config_write(ac_eventtype event, struct file *job_acctfile)
+{
+	int error = 0;	/* errno */
+	struct acctcfg acctcfg;
+	mm_segment_t fs;
+
+	/* write record to process accounting file. */
+	csa_config_make(event, &acctcfg);
+
+	down(&csa_write_sem);
+	if (csa_acctvp != (struct file *)NULL) {
+		fs = get_fs();
+		set_fs(KERNEL_DS);
+		error = csa_acctvp->f_op->write(csa_acctvp, (char *)&acctcfg,
+			sizeof(struct acctcfg), &csa_acctvp->f_pos);
+		set_fs(fs);
+	}
+	if (job_acctfile != (struct file *)NULL) {
+		fs = get_fs();
+		set_fs(KERNEL_DS);
+		error = job_acctfile->f_op->write(job_acctfile,(char *)&acctcfg,
+			sizeof(struct acctcfg), &job_acctfile->f_pos);
+		set_fs(fs);
+	}
+	if (error >= 0) {
+		error = 0;
+	}
+	up(&csa_write_sem);
+	return(error);
+}
+
+
+
+/*
+ * When first process in a job is created.
+ */
+int
+csa_jstart(int event, void *data)
+{
+	struct job_csa *job_sojbuf = (struct job_csa *)data;
+	struct acctsoj acctsoj;	/* start of job record */
+	DBG_PRINTINIT(__FUNCTION__);
+
+	DBG_PRINTENTRY();
+
+	/* Are we doing any accounting? */
+	if (csa_acctvp == (struct file *)NULL) {
+		DBG_PRINTEXIT(0);
+		return 0;
+	}
+
+	if (!job_sojbuf) {
+		/* bad pointer */
+		printk(KERN_ERR
+			"csa_jstart: Received bad soj pointer, pid %d.\n",
+			current->pid);
+		DBG_PRINTEXIT(-1);
+		return -1;
+	}
+
+	memset(&acctsoj, 0, sizeof(struct acctsoj));
+	DBG_PRINTEXIT(__LINE__);
+	csa_header(&acctsoj.ac_hdr, REV_SOJ, ACCT_KERNEL_SOJ,
+		sizeof(struct acctsoj));
+	DBG_PRINTEXIT(__LINE__);
+	acctsoj.ac_jid = job_sojbuf->job_id;
+	DBG_PRINTEXIT(__LINE__);
+	acctsoj.ac_uid = job_sojbuf->job_uid;
+	DBG_PRINTEXIT(__LINE__);
+	if (event == JOB_EVENT_START) {
+		DBG_PRINTEXIT(__LINE__);
+		acctsoj.ac_type = AC_SOJ;
+		acctsoj.ac_btime = CT_TO_SECS(job_sojbuf->job_start) +
+			(xtime.tv_sec - (jiffies / HZ) );
+	} else if (event == JOB_EVENT_RESTART) {
+		DBG_PRINTEXIT(__LINE__);
+		acctsoj.ac_type = AC_ROJ;
+		acctsoj.ac_rstime = CT_TO_SECS(job_sojbuf->job_start) +
+			(xtime.tv_sec - (jiffies / HZ) );
+	} else {
+		DBG_PRINTEXIT(__LINE__);
+		DBG_PRINTEXIT(-1);
+		return -1;
+	}
+
+	/*
+	 * Write the accounting record to the process accounting
+	 * file if any accounting is enabled.
+	 */
+	DBG_PRINTEXIT(__LINE__);
+	if (csa_acctvp != (struct file *)NULL) {
+		DBG_PRINTEXIT(__LINE__);
+		(void)csa_write((caddr_t)&acctsoj, ACCT_KERN_CSA,
+			sizeof(acctsoj), job_sojbuf->job_id, A_SYS, job_sojbuf);
+	}
+
+	DBG_PRINTEXIT(__LINE__);
+	DBG_PRINTEXIT(0);
+	return 0;
+}
+
+/*
+ * When last process in a job is done, write an EOJ record
+ */
+int
+csa_jexit(int event, void *data)
+{
+	struct achead *hdr1, *hdr2;
+	struct accteoj eoj;	/* end of job record */
+	struct job_csa *job_eojbuf = (struct job_csa *)data;
+
+	/* Are we doing any accounting? */
+	if (csa_acctvp == (struct file *)NULL) {
+		return 0;
+	}
+
+	if (!job_eojbuf) {
+		/* bad pointer */
+		printk(KERN_ERR
+			"csa_jexit: Received bad eoj pointer, pid %d.\n",
+			current->pid);
+		return -1;
+	}
+
+	memset(&eoj, 0, sizeof(struct accteoj));
+
+	/* Set up record. */
+	hdr1 = &eoj.ac_hdr1;
+	csa_header(hdr1, REV_EOJ, ACCT_KERNEL_EOJ, sizeof(struct accteoj) );
+	hdr2 = &eoj.ac_hdr2;
+	csa_header(hdr2, REV_EOJ, ACCT_KERNEL_EOJ, 0 );
+	hdr2->ah_magic = ~ACCT_MAGIC;
+
+	eoj.ac_nice = task_nice(current);
+	eoj.ac_uid = job_eojbuf->job_uid;
+	eoj.ac_gid = current->gid;
+
+	eoj.ac_jid = job_eojbuf->job_id;
+
+	eoj.ac_btime = CT_TO_SECS(job_eojbuf->job_start) +
+		(xtime.tv_sec - (jiffies / HZ) );
+	eoj.ac_etime = xtime.tv_sec;
+
+	/*
+	 * XXX Once we have real values in these two fields, convert them
+	 * to Kbytes.
+	 */
+	eoj.ac_corehimem = job_eojbuf->job_corehimem;
+	eoj.ac_virthimem = job_eojbuf->job_virthimem;
+
+	/*
+	 * Write the accounting record to the process accounting
+	 * file if job accounting is enabled.
+	 */
+	if (csa_acctvp != (struct file *)NULL) {
+		(void) csa_write((caddr_t)&eoj, ACCT_KERN_CSA,
+			sizeof(struct accteoj), job_eojbuf->job_id, A_SYS,
+			job_eojbuf);
+	}
+
+	return 0;
+}
+
+/*
+ * Write buf out to the accounting file.
+ * If an error occurs, return the error code to the caller
+ */
+int
+csa_write(char *buf, int did, int nbyte, uint64_t jid, int type,
+	struct job_csa *jp)
+{
+	int error = 0;	/* errno */
+	int retval = 0;
+	struct file *vp;	/* acct file */
+	mm_segment_t fs;
+	unsigned long limit;
+
+	down(&csa_write_sem);
+	/* Locate the accounting type. */
+	switch (type) {
+	case A_SYS:
+	case A_DMD:
+		vp = csa_acctvp;
+		break;
+
+	case A_CJA:
+		if (jp != (struct job_csa *)NULL) {
+			vp = jp->job_acctfile;
+		} else {
+			vp = (struct file *)NULL;
+		}
+		break;
+
+	default:
+		up(&csa_write_sem);
+		return -EINVAL;
+
+	} /* end of switch(type) */
+
+	/* Check if this type of accounting is turned on. */
+	if (vp == (struct file *)NULL) {
+		up(&csa_write_sem);
+		return 0;
+	}
+	fs = get_fs();
+	set_fs(KERNEL_DS);
+
+	/* make sure we don't get hit by a process file size limit */
+	limit = current->rlim[RLIMIT_FSIZE].rlim_cur;
+	current->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY;
+	error = vp->f_op->write(vp,buf, nbyte, &vp->f_pos);
+	current->rlim[RLIMIT_FSIZE].rlim_cur = limit;
+
+	set_fs(fs);
+	if (error >= 0) {
+		error = 0;
+	}
+	/* If an error occurred, disable this type of accounting. */
+	if (error) {
+		switch(type) {
+
+		case A_SYS:
+		case A_DMD:
+			csa_acctvp = (struct file *)NULL;
+			acct_dmd[did][A_SYS].ac_state = ACS_ERROFF;
+			acct_dmd[ACCT_KERN_CSA][A_SYS].ac_state = ACS_ERROFF;
+			printk(KERN_ALERT
+				"csa accounting pacct write error %d; %s disabled\n",
+				error, acct_dmd_name[did]);
+			filp_close(vp, NULL);
+			break;
+		case A_CJA:
+			jp->job_acctfile = (struct file *)NULL;
+			retval = job_setacct(jid, JOB_ACCT_CSA,
+				JOB_CSA_ACCTFILE, jp);
+			printk(KERN_WARNING JID_ERR2, error,
+				(unsigned long long) jid);
+			if (retval != 0) {
+				printk(KERN_WARNING JID_ERR3,
+					(unsigned long long) jid);
+			} else {
+				printk(KERN_WARNING JID_ERR4,
+					(unsigned long long) jid);
+			}
+			filp_close(vp, NULL);
+			break;
+		}
+		up(&csa_write_sem);
+		return(error);
+	}
+	up(&csa_write_sem);
+	return(error);
+}
+
+
+module_init(init_csa);
+module_exit(cleanup_csa);
+
+#endif /* defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE) */
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -24,7 +24,7 @@
 #include <linux/proc_fs.h>
 #include <linux/mempolicy.h>
 #include <linux/pagg.h>
-
+#include <linux/csa_internal.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include <asm/pgtable.h>
@@ -32,6 +32,8 @@
 extern void sem_exit (void);
 extern struct task_struct *child_reaper;
+void (*do_csa_acct) (int, struct task_struct *) = NULL;
+EXPORT_SYMBOL(do_csa_acct);
 int getrusage(struct task_struct *, int, struct rusage __user *);
@@ -793,7 +795,12 @@
 		ptrace_notify((PTRACE_EVENT_EXIT << 8) | SIGTRAP);
 	}
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 	acct_process(code);
+	/* no-op if CONFIG_CSA not set */
+	csa_acct(code, tsk);
 	__exit_mm(tsk);
 	exit_sem(tsk);
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -37,7 +37,7 @@
 #include <linux/audit.h>
 #include <linux/rmap.h>
 #include <linux/pagg.h>
-
+#include <linux/csa_internal.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -576,6 +576,9 @@
 	if (retval)
 		goto free_pt;
+	mm->hiwater_rss = mm->rss;
+	mm->hiwater_vm = mm->total_vm;
+
 good_mm:
 	tsk->mm = mm;
 	tsk->active_mm = mm;
@@ -964,6 +967,10 @@
 	p->utime = p->stime = 0;
 	p->cutime = p->cstime = 0;
+	p->rchar = p->wchar = p->rblk = p->wblk = p->syscr = p->syscw = 0;
+	p->bwtime = 0;
+	/* no-op if CONFIG_CSA not set */
+	csa_clear_integrals(p);
 	p->lock_depth = -1;		/* -1 = no lock */
 	p->start_time = get_jiffies_64();
 	p->security = NULL;
Index: linux/mm/memory.c
===================================================================
--- linux.orig/mm/memory.c
+++ linux/mm/memory.c
@@ -44,6 +44,7 @@
 #include <linux/highmem.h>
 #include <linux/pagemap.h>
 #include <linux/rmap.h>
+#include <linux/csa_internal.h>
 #include <linux/module.h>
 #include <linux/init.h>
@@ -596,6 +597,8 @@
 	tlb = tlb_gather_mmu(mm, 0);
 	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted, details);
 	tlb_finish_mmu(tlb, address, end);
+	/* no-op unless CONFIG_CSA is set */
+	csa_update_integrals();
 	spin_unlock(&mm->page_table_lock);
 }
@@ -1080,9 +1083,12 @@
 	spin_lock(&mm->page_table_lock);
 	page_table = pte_offset_map(pmd, address);
 	if (likely(pte_same(*page_table, pte))) {
-		if (PageReserved(old_page))
+		if (PageReserved(old_page)) {
 			++mm->rss;
-		else
+			/* no-op if CONFIG_CSA not set */
+			csa_update_integrals();
+			update_mem_hiwater();
+		} else
 			page_remove_rmap(old_page);
 		break_cow(vma, new_page, address, page_table);
 		lru_cache_add_active(new_page);
@@ -1355,6 +1361,10 @@
 		remove_exclusive_swap_page(page);
 	mm->rss++;
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
+
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (write_access && can_share_swap_page(page)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
@@ -1420,6 +1430,9 @@
 			goto out;
 		}
 		mm->rss++;
+		/* no-op if CONFIG_CSA not set */
+		csa_update_integrals();
+		update_mem_hiwater();
 		entry = maybe_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)), vma);
@@ -1529,6 +1542,10 @@
 	if (pte_none(*page_table)) {
 		if (!PageReserved(new_page))
 			++mm->rss;
+		/* no-op if CONFIG_CSA not set */
+		csa_update_integrals();
+		update_mem_hiwater();
+
 		flush_icache_page(vma, new_page);
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		if (write_access)
Index: linux/mm/mmap.c
===================================================================
--- linux.orig/mm/mmap.c
+++ linux/mm/mmap.c
@@ -20,6 +20,7 @@
 #include <linux/hugetlb.h>
 #include <linux/profile.h>
 #include <linux/module.h>
+#include <linux/csa_internal.h>
 #include <linux/mount.h>
 #include <linux/mempolicy.h>
 #include <linux/rmap.h>
@@ -970,6 +971,9 @@
 					pgoff, flags & MAP_NONBLOCK);
 		down_write(&mm->mmap_sem);
 	}
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 	return addr;
 unmap_and_free_vma:
@@ -1209,6 +1213,9 @@
 	vma->vm_mm->total_vm += grow;
 	if (vma->vm_flags & VM_LOCKED)
 		vma->vm_mm->locked_vm += grow;
+	/* no-op if CONFIG_CSA_JOB_ACCT not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 	anon_vma_unlock(vma);
 	return 0;
 }
@@ -1670,6 +1677,9 @@
 		mm->locked_vm += len >> PAGE_SHIFT;
 		make_pages_present(addr, addr + len);
 	}
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 	return addr;
 }
Index: linux/mm/mremap.c
===================================================================
--- linux.orig/mm/mremap.c
+++ linux/mm/mremap.c
@@ -16,6 +16,7 @@
 #include <linux/fs.h>
 #include <linux/highmem.h>
 #include <linux/security.h>
+#include <linux/csa_internal.h>
 #include <asm/uaccess.h>
 #include <asm/pgalloc.h>
@@ -232,6 +233,10 @@
 				new_addr + new_len);
 	}
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
+
 	return new_addr;
 }
@@ -364,6 +369,9 @@
 				make_pages_present(addr + old_len, addr + new_len);
 			}
+			/* no-op if CONFIG_CSA not set */
+			csa_update_integrals();
+			update_mem_hiwater();
 			ret = addr;
 			goto out;
 		}
Index: linux/mm/rmap.c
===================================================================
--- linux.orig/mm/rmap.c
+++ linux/mm/rmap.c
@@ -32,6 +32,7 @@
 #include <linux/swapops.h>
 #include <linux/slab.h>
 #include <linux/init.h>
+#include <linux/csa_internal.h>
 #include <linux/rmap.h>
 #include <asm/tlbflush.h>
@@ -510,6 +511,8 @@
 	mm->rss--;
 	BUG_ON(!page->mapcount);
 	page->mapcount--;
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
 	page_cache_release(page);
 out_unmap:
@@ -609,6 +612,8 @@
 		page_remove_rmap(page);
 		page_cache_release(page);
+		/* no-op if CONFIG_CSA not set */
+		csa_update_integrals();
 		mm->rss--;
 		(*mapcount)--;
 	}
Index: linux/mm/swapfile.c
===================================================================
--- linux.orig/mm/swapfile.c
+++ linux/mm/swapfile.c
@@ -24,6 +24,7 @@
 #include <linux/module.h>
 #include <linux/rmap.h>
 #include <linux/security.h>
+#include <linux/csa_internal.h>
 #include <linux/backing-dev.h>
 #include <asm/pgtable.h>
@@ -435,6 +436,9 @@
 	set_pte(dir, pte_mkold(mk_pte(page, vma->vm_page_prot)));
 	page_add_anon_rmap(page, vma, address);
 	swap_free(entry);
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 }
 /* vma->vm_mm->page_table_lock is held */
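The exit.c hunk exports a do_csa_acct function pointer that starts out NULL, and the call sites describe csa_acct() and csa_update_integrals() as no-ops if CONFIG_CSA is not set. The header that defines those wrappers, linux/csa_internal.h, does not appear in this part of the patch, so what follows is only a minimal user-space sketch of how a NULL-checked hook of this kind could behave. The struct task_struct stand-in, the int return value of csa_acct(), and demo_acct_handler() are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct task_struct {
	int pid;
};

/*
 * Hook pointer, NULL until an accounting module installs a handler.
 * Same name and signature as the pointer added in the exit.c hunk.
 */
void (*do_csa_acct)(int, struct task_struct *) = NULL;

/*
 * Assumed shape of the csa_acct() wrapper: call through the hook only
 * when a handler is installed. The return value (1 if the hook ran,
 * 0 otherwise) exists only so that the sketch can be checked.
 */
static inline int csa_acct(int exitcode, struct task_struct *tsk)
{
	void (*hook)(int, struct task_struct *) = do_csa_acct;

	if (hook == NULL)
		return 0;	/* nothing registered: behave as a no-op */
	hook(exitcode, tsk);
	return 1;
}

/* State recorded by the illustrative handler below. */
static int last_exitcode = -1;
static int last_pid = -1;

/* Illustrative handler that a module might register. */
static void demo_acct_handler(int exitcode, struct task_struct *tsk)
{
	assert(tsk != NULL);
	last_exitcode = exitcode;
	last_pid = tsk->pid;
}
```

With no handler installed, csa_acct(code, tsk) returns without touching the task. Once a module assigns do_csa_acct, the same call site reaches the module's handler. A kernel built without CONFIG_CSA could presumably replace the wrapper with an empty inline function, which would match the "no-op" comments in the hunks above.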
* Re: [PATCH] Process Aggregates (PAGG) for 2.6.7
2004-06-24 18:08 [PATCH] Process Aggregates (PAGG) for 2.6.7 Erik Jacobson
2004-06-24 18:32 ` Limin Gu
@ 2004-06-24 23:22 ` Peter Williams
2004-06-25 2:02 ` Erik Jacobson
1 sibling, 1 reply; 8+ messages in thread
From: Peter Williams @ 2004-06-24 23:22 UTC (permalink / raw)
To: Erik Jacobson; +Cc: linux-kernel, jlan, limin

Erik Jacobson wrote:
> Attached is a PAGG patch to kernel 2.6.7.
>
> The maintainers of two patches that make use of PAGG will post their patches
> in to this discussion thread shortly.
>
> The biggest change in this patch from the last one I posted is that
> Peter Williams supplied an implementation for the init function pointer
> in the pagg hook. We kicked this around a few times to flush out
> locking issues.

I wish that you had included me in this discussion. Can you explain
exactly what the locking issues with my code were? We might have been
able to come up with a better solution than the one you have used which
places unnecessary restrictions (i.e. no blocking) on the init()
callbacks which weren't applicable in the code that I provided. Since
these callbacks are highly likely to want to allocate dynamic memory
there is always a chance that they will block and the no blocking
restriction becomes an unnecessary burden.

Peter
--
Peter Williams pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: [PATCH] Process Aggregates (PAGG) for 2.6.7
2004-06-24 23:22 ` Peter Williams
@ 2004-06-25 2:02 ` Erik Jacobson
0 siblings, 0 replies; 8+ messages in thread
From: Erik Jacobson @ 2004-06-25 2:02 UTC (permalink / raw)
To: Peter Williams; +Cc: linux-kernel, jlan, limin

> I wish that you had included me in this discussion. Can you explain
> exactly what the locking issues with my code were? We might have been

I certainly didn't mean to offend you or not include you. It's easy
enough to send a new patch if needed.

I'll go try to find the whole history of the discussion and see if we left
you out anywhere by accident. I'll keep this out of LKML unless you prefer
it here.

This is totally correctable though. We'll just update the patch as necessary.

Sorry for any trouble I may have caused or for anything I overlooked.

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota
end of thread, other threads:[~2004-06-25 2:02 UTC | newest]

Thread overview: 8+ messages
2004-06-24 18:08 [PATCH] Process Aggregates (PAGG) for 2.6.7 Erik Jacobson
2004-06-24 18:32 ` Limin Gu
2004-06-24 18:57 ` Chris Wright
2004-06-24 19:12 ` Limin Gu
2004-06-24 19:15 ` Chris Wright
2004-06-24 19:31 ` Jay Lan
2004-06-24 23:22 ` Peter Williams
2004-06-25 2:02 ` Erik Jacobson