Dynamic System Calls & System Call Hijacking

public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed

* Dynamic System Calls & System Call Hijacking
@ 2004-04-20  9:07 Zoltan Menyhart
  2004-04-20  9:08 ` Zoltan Menyhart
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Zoltan Menyhart @ 2004-04-20  9:07 UTC (permalink / raw)
  To: linux-ia64, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3757 bytes --]

- Disappointed, 'cause they don't wanna take your brand new syscall into the
  kernel ?

  + No problem, I'll do it for you.

- Can't recompile the kernel, otherwise you gonna lose RedHat guarantee ?
  Or some ISVs like whose name starts with an "O" and terminates with "racle"
  ain't gonna support it ?

  + No problem, I'll load your syscall in a module.

- Got a syscall number conflict 'cause of an exotic patch slipped in before
  your one ?

  + No problem, I'll find a free syscall number for you dynamically.

- Wanna try your own version of a syscall without recompiling the kernel or
  rebooting it ?

  + No problem, I'll hijack the syscall for you.

- Fed up with the infinite number of different kernel configurations ?
  Can't follow any more what .config you've done for which of your clients ?

  + No problem, make a minimal kernel with almost nothing in it and load
    dynamically the syscalls actually needed.

My loadable kernel module "dyn_syscall.ko" provides for
registering / unregistering or hijacking / restoring system calls.

Sure, it's a loadable kernel module, who wants to modify the kernel ? :-)

My patch is against the version 2.6.4. As there is not much in the way of
direct dependency on the kernel, it should work with more recent versions, too.

Playing with the system call mechanism is very much architecture dependent.
Its key element is written in assembly.
I've got an IA64 version only.


How can it be used ?
--------------------

Assuming you've got a system call like "asmlinkage long sys_foo(...)" in a
loadable kernel module.
You can register it with an unused system call number:

        const char name[] = "foo";
        rc = dyn_syscall_reg(name, syscall_no, (dyn_syscall_t) sys_foo);

If "syscall_no" is zero, I'll find a free system call number for you.
(Do check the return code. On success, it's your system call number.)
Or you can register your system call over an existing one:

        rc = hijack_syscall(name, syscall_no, (dyn_syscall_t) sys_foo);

Having fully initialized your system call, you can make it available:

        rc = syscall_unlock(name, syscall_no);

This sequence is usually included in the "module_init(...)" function.

User applications can find out what your system call number is by consulting
"/proc/sys/kernel/dynamic_syscalls/foo" or
"/proc/sys/kernel/hijacked_syscalls/foo", respectively.

Having played enough with your system call, you can launch the module unload
procedure, without worrying about the "living calls" which may be "part way"
through your module:

        rc = prep_restore_syscall(name, syscall_no);

This function locks out further calls to the "syscall_no" (they will be refused
with the return code "-ENOSYS"). It returns "-EAGAIN" if there is still someone
inside your system call.
In this latter case you can wait until your last client leaves:

        while((rc = syscall_trylock(name, syscall)) == -EAGAIN)
                schedule();

If you have a blocking system call, then instead of busy waiting, wake up the
waiting tasks and go to sleep a bit in the mean time.
Finally, you can invoke:

        rc = dyn_syscall_unreg(name, syscall_no);

or

        rc = restore_syscall(name, syscall_no);

to remove completely your registered or hijacked system call, respectively.

This sequence is usually included in the "module_exit(...)" function.

The function prototypes are in "asm/dyn_syscall.h".

In order to configure this module, say "m" in:

        Processor type and features:
                Support for dynamic system calls

The patch & the demos arrive in the next mails.

Your remarks will be appreciated.

Zoltán Menyhárt

[-- Attachment #2: dyn_syscall_man.txt --]
[-- Type: text/plain, Size: 5285 bytes --]



NAME

	dyn_syscall_reg, hijack_syscall - Register a system call

SYNOPSIS

	#include <asm/dyn_syscall.h>

	int
	dyn_syscall_reg(const char *name,
			const unsigned int syscall_no,
			const dyn_syscall_t fp);
	int
	hijack_syscall(const char *name,
			const unsigned int syscall_no,
			const dyn_syscall_t fp);

DESCRIPTION

	"dyn_syscall_reg()" and "hijack_syscall()" are exported services
	available for loadable kernel modules.

	"dyn_syscall_reg()" registers a new, dynamic system call.
	If "syscall_no" is zero, then an otherwise unused system call number
	will be assigned.

	"hijack_syscall()" registers a system call which overloads an
	existing one.

	"name" points to a string that shall persist while the system call is
	alive.

	"syscall_no" should be in the range of
	[__NR_ni_syscall + 1... __NR_ni_syscall + NR_syscalls).

	"fp" refers to the new system call.
	For the IA64 architecture, the function descriptor "dyn_syscall_t"
	refers to a structure containing the program counter and the global
	pointer.

	User applications can find this system call number in
	"/proc/sys/kernel/dynamic_syscalls/<name>" or in
	"/proc/sys/kernel/hijacked_syscalls/<name>", respectively.
	On read, each of these files contains a 4 digit decimal number
	terminated with a '\n' character.

RETURN VALUE

	On success, the system call number accepted / assigned is returned.

	On error, the following codes may be returned:

	-ENOENT:	No more free system call is available -
			"dyn_syscall_reg()" only
	-EINVAL:	Illegal system call number - both
	-EBUSY:		System call is already in use - "dyn_syscall_reg()" only
	-ENOMEM:	Cannot create "/proc/..." - both

SEE ALSO

	syscall_unlock, prep_restore_syscall, syscall_trylock,
	dyn_syscall_unreg, restore_syscall


--------------------------------------------------------------------------------


NAME

	syscall_unlock, syscall_trylock - Unlock / try to lock a system call
	prep_restore_syscall - Prepare to unregister a system call

SYNOPSIS

	#include <asm/dyn_syscall.h>

	int
	syscall_unlock(const char *name,
			const unsigned int syscall_no);
	int
	syscall_trylock(const char *name,
			const unsigned int syscall_no);

	int
	prep_restore_syscall(const char *name,
			const unsigned int syscall_no);

DESCRIPTION

	"syscall_unlock()", "syscall_trylock()" and "prep_restore_syscall()"
	are exported services available for loadable kernel modules.

	Each system call is protected by a semaphore.

	When a new system call is added, it is locked for write.
	Regular system call invocation tries to take the semaphore for read.
	Unless it is "syscall_unlock()"-ed, any attempt to use the system call
         will be refused and "-ENOSYS" will be reported.

	Before undoing a system call registration, it is necessary to lock out
	any further invocation of the system call by re-locking it for write.
	(They will be refused by returning "-ENOSYS".)
	Apart from some small administration task, "prep_restore_syscall()"
	attempts to do it. If it fails (indicated by "-EAGAIN" returned), then
	there is at least one "living call" which may be "part way" through
	the system call code.

	"syscall_trylock()" should be invoked repeatedly while it returns
	"-EAGAIN". In order not to over penalise other tasks, "schedule()"
	should be invoked at each iteration. If the system call is blocking,
         i.e. there can be tasks sleeping inside the system call, then they have
         to be woke up. In such a case, it is recommended to sleep a bit
         between two iterations of "syscall_trylock()".

	"name" should be the same as that was used during the registration.

	"syscall_no" should be in the range of
	[__NR_ni_syscall + 1... __NR_ni_syscall + NR_syscalls).

RETURN VALUE

	On success, zero is returned.

	"syscall_trylock()" and "prep_restore_syscall()" return "-EAGAIN" if
         they have failed to take the semaphore for write.
	
	On error, the following codes can be returned:

	-EBADF:		Name or system call number does not match the parameters
			which was used during the system call registration
	-EINVAL:	Illegal system call number

SEE ALSO

	dyn_syscall_reg, hijack_syscall, dyn_syscall_unreg, restore_syscall


--------------------------------------------------------------------------------


NAME

	dyn_syscall_unreg, restore_syscall - Unregister a system call

SYNOPSIS

	#include <asm/dyn_syscall.h>

	int
	dyn_syscall_unreg(const char *name,
			const unsigned int syscall_no);
	int
	restore_syscall(const char *name,
			const unsigned int syscall_no);

DESCRIPTION

	"dyn_syscall_unreg()" and "restore_syscall()" are exported services
	available for loadable kernel modules.

	"dyn_syscall_unreg()" unregisters a dynamic system call.

	"restore_syscall()" restores a hijacked system call.

	"name" should be the same as that was used during the registration.

	"syscall_no" should be in the range of
	[__NR_ni_syscall + 1... __NR_ni_syscall + NR_syscalls).

RETURN VALUE

	On success, zero is returned.

	On error, the following codes can be returned:

	-EBADF:		Name or system call number does not match the parameters
			which was used during the system call registration
	-EINVAL:	Illegal system call number

SEE ALSO

	dyn_syscall_reg, hijack_syscall,
	syscall_unlock, syscall_trylock,  prep_restore_syscall



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Dynamic System Calls & System Call Hijacking
  2004-04-20  9:07 Dynamic System Calls & System Call Hijacking Zoltan Menyhart
@ 2004-04-20  9:08 ` Zoltan Menyhart
  2004-04-20  9:09 ` Dynamic System Calls & System Call Hijacking - demo syscall Zoltan Menyhart
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Zoltan Menyhart @ 2004-04-20  9:08 UTC (permalink / raw)
  To: linux-kernel, linux-ia64

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]

 

[-- Attachment #2: dyn_syscall-2004-apr-19 --]
[-- Type: text/plain, Size: 47340 bytes --]

diff -ruN 2.6.4.ref/arch/ia64/Kconfig 2.6.4.mig2-tmp/arch/ia64/Kconfig
--- 2.6.4.ref/arch/ia64/Kconfig	Tue Mar 16 13:36:30 2004
+++ 2.6.4.mig2-tmp/arch/ia64/Kconfig	Mon Apr 19 10:41:55 2004
@@ -218,6 +218,14 @@
 	  Access).  This option is for configuring high-end multiprocessor
 	  server systems.  If in doubt, say N.
 
+config DYN_SYSCALL
+        tristate "Support for dynamic system calls"
+	default m
+	help
+	  Say m if you want a module supporting to register / unregister or
+	  to hijack / restore system calls.
+	  This stuff is not foreseen to run inside the kernel.
+
 config VIRTUAL_MEM_MAP
 	bool "Virtual mem map"
 	default y if !IA64_HP_SIM
diff -ruN 2.6.4.ref/arch/ia64/kernel/Makefile 2.6.4.mig2-tmp/arch/ia64/kernel/Makefile
--- 2.6.4.ref/arch/ia64/kernel/Makefile	Tue Mar 16 13:36:30 2004
+++ 2.6.4.mig2-tmp/arch/ia64/kernel/Makefile	Mon Apr 19 10:14:14 2004
@@ -18,8 +18,11 @@
 obj-$(CONFIG_MODULES)		+= module.o
 obj-$(CONFIG_SMP)		+= smp.o smpboot.o
 obj-$(CONFIG_PERFMON)		+= perfmon_default_smpl.o
+obj-$(CONFIG_DYN_SYSCALL)	+= dyn_syscall.o
 obj-$(CONFIG_IA64_CYCLONE)	+= cyclone.o
 
+dyn_syscall-objs := dyn_syscall_asm.o dyn_syscall_main.o
+
 # The gate DSO image is built using a special linker script.
 targets += gate.so gate-syms.o
 
diff -ruN 2.6.4.ref/arch/ia64/kernel/dyn_syscall_asm.S 2.6.4.mig2-tmp/arch/ia64/kernel/dyn_syscall_asm.S
--- 2.6.4.ref/arch/ia64/kernel/dyn_syscall_asm.S	Thu Jan  1 01:00:00 1970
+++ 2.6.4.mig2-tmp/arch/ia64/kernel/dyn_syscall_asm.S	Mon Apr 19 10:14:19 2004
@@ -0,0 +1,252 @@
+/*
+ * Dynamic System Calls & System Call Hijacking
+ * ============================================
+ *
+ * Version 0.1, 19th of April 2004
+ * By Zoltan Menyhart, Bull S.A. <Zoltan.Menyhart@bull.net>
+ * The usual GPL applies.
+ *
+ * See also "Documentation/dyn_syscall.txt".
+ */
+ 
+
+#include <asm/asmmacro.h>
+#include <asm/unistd.h>
+#define	_SOME_PRIVATE_DEFS_
+#include <asm/dyn_syscall.h>
+
+
+	.text
+	.align		32
+
+
+/*
+ * This is the link table for the dynamic / hijacked system calls:
+ *
+ *	struct {
+ *		<link code>;
+ *	} x_module_link[NR_syscalls];
+ *
+ * For a dynamic / hijacked system call, "sys_call_table[i]" is modified to
+ * point at "x_module_link[i]", where "i = <syscall number> - __NR_ni_syscall".
+ *
+ * Each "x_module_link[i].<link code>" puts "i * sizeof(assembler's long)" into
+ * "R2" and jumps to the common link routine.
+ */
+x_module_link:
+	.global		x_module_link
+	.set		tmp, 0
+	.rept		NR_syscalls
+	mov		r2 = tmp
+	br.sptk.few	common_link
+	;;
+	.set		tmp, tmp + 4		// sizeof(assembler's long)
+	.endr
+x_module_ln_end:
+	.global		x_module_ln_end
+
+
+/*
+ * This is the return linkage table for the dynamic / hijacked system calls:
+ *
+ *	struct {
+ *		<link code>;
+ *	} x_module_ret[NR_syscalls];
+ *
+ * A system call is invoked with "B0" pointing at "x_module_ret[i].<link code>",
+ *  where "i = <syscall number> - __NR_ni_syscall".
+ *
+ * Each "x_module_ret[i].<link code>" puts "i * sizeof(assembler's long)" into
+ * "R2" and jumps to the common return linkage routine.
+ */
+x_module_ret:
+	.set		tmp, 0
+	.rept		NR_syscalls
+	mov		r2 = tmp
+	br.sptk.few	common_ret
+	;;
+	.set		tmp, tmp + 4		// sizeof(assembler's long)
+	.endr
+
+
+/*
+ * Common link routine for the dynamic / hijacked system calls.
+ *
+ * Save "B0" in "x_module_b0_tab[i]" and jump at the function pointed at
+ * by "x_module_fp_tab[i]" if "x_module_sem_tab[i]" can be taken.
+ *
+ * Input:	R2:	(System call number - __NR_ni_syscall) *
+ *							sizeof(assembler's long)
+ * Output:	B0:	-> "x_module_ret[i].<link code>"
+ *
+ * Pseudo code:
+ *
+ *	int	tmp = x_module_sem_tab[i];
+ *
+ *	if (!(_SEM_WRITE_ & tmp))
+ *		if (cmpxchg_acq(&x_module_sem_tab[i], tmp, tmp + _SEM_RD_DELTA_)
+ *									== tmp){
+ *			(* x_module_fp_tab[i])(args, ...);
+ *			goto x_module_ret[i];
+ *		}
+ *	goto sys_ni_syscall;
+ */
+
+	.set		fp_tab_off, x_module_fp_tab - common_link
+	.set		b0_tab_off, x_module_b0_tab - common_link
+	.set		ret_off, x_module_ret - common_link
+	.set		sem_off, x_module_sem_tab - common_link
+	.set		sys_ni_off, x_module_sys_ni - common_link
+
+common_link:
+	mov		r15 = ip
+        movl		r14 = _SEM_WRITE_
+	mov		r8 = b0
+	;;
+	shladd		r20 = r2, 1, r15
+	shladd		r3 = r2, 2, r15
+	add		r18 = r2, r15
+	;;
+	add		r20 = b0_tab_off, r20	// -> x_module_b0_tab[i]
+	add		r17 = fp_tab_off, r3	// -> x_module_fp_tab[i].IP
+	add		r2 = fp_tab_off + 8, r3	// -> x_module_fp_tab[i].GP
+	add		r16 = ret_off, r3	// -> x_module_ret[i]
+	add		r18 = sem_off, r18	// -> x_module_sem_tab[i]
+	;;
+	st8		[r20] = r8		// Save old B0
+	ld8		r17 = [r17]		// New IP
+	mov		b0 = r16
+	ld4		r3 = [r18]		// Old x_module_sem_tab[i] value
+	;;
+	zxt4		r20 = r3
+	and		r14 = r3, r14		// if (!(_SEM_WRITE_ & tmp))
+	mov		b6 = r17
+	;;
+	cmp4.eq		p8, p9 = 0, r14
+	add		r17 = _SEM_RD_DELTA_, r20
+	mov		ar.ccv = r20
+	;;
+(p8)	cmpxchg4.acq	r3 = [r18], r17, ar.ccv
+	;;
+(p8)	cmp4.eq		p8, p9 = r3, r20
+	;;
+(p8)	ld8		r1 = [r2]
+(p8)	br.sptk.few	b6
+(p9)	add		r14 = sys_ni_off, r15
+	;;
+(p9)	ld8		r17 = [r14]		// -> sys_ni_syscall()
+	;;
+(p9)	mov		b6 = r17
+(p9)	br.sptk.few	b6
+
+
+/*
+ * Common return linkage routine for the dynamic / hijacked system calls.
+ *
+ * Restore "B0" from "x_module_b0_tab[i]", load the kernel "GP" and release
+ * "x_module_sem_tab[i]".
+ *
+ * Input:	R2:	(System call number - __NR_ni_syscall) *
+ *							sizeof(assembler's long)
+ *
+ * We are sure that "x_module_sem_tab[i]" is not taken and cannot be taken in
+ * the mean time, for write. However, "_SEM_WRITE_" can be OR-ed to the
+ * semaphore indicating that writer is waiting.
+ *
+ * Pseudo code:
+ *
+ *	int	tmp;
+ *
+ *	do {
+ *		tmp = x_module_sem_tab[i];
+ *	} while (cmpxchg_rel(&x_module_sem_tab[i], tmp, tmp - _SEM_RD_DELTA_)
+ *									!= tmp);
+ *	return;
+ */
+	.set		b0_tab_off_r, x_module_b0_tab - common_ret
+	.set		k_gp_off, x_module_k_gp - common_ret
+	.set		sem_off_r, x_module_sem_tab - common_ret
+
+common_ret:
+	mov		r15 = ip
+	;;
+	add		r16 = k_gp_off, r15	// -> kernel GP
+	shladd		r20 = r2, 1, r15
+	add		r18 = r2, r15
+	;;
+	add		r20 = b0_tab_off_r, r20	// -> x_module_b0_tab[i]
+	add		r18 = sem_off_r, r18	// -> x_module_sem_tab[i]
+	ld8		r1 = [r16]
+	;;
+	ld8		r20 = [r20]
+	;;
+1:	ld4		r3 = [r18]		// Old x_module_sem_tab[i] value
+	mov		b0 = r20
+	;;
+	zxt4		r3 = r3
+	;;
+	sub		r17 = _SEM_RD_DELTA_, r3
+	mov		ar.ccv = r3
+	;;
+	cmpxchg4.rel	r16 = [r18], r17, ar.ccv
+	;;
+	cmp4.eq		p8, p9 = r3, r16
+(p8)	br.sptk.few	b0
+(p9)	br.cond.dptk	1b
+	;;
+
+
+/*
+ * The GP of the kernel is saved here. Yes, in the text segment.
+ */
+x_module_k_gp:
+	.global		x_module_k_gp
+	.quad		0
+
+
+/*
+ * Address of "sys_ni_syscall()"
+ */
+x_module_sys_ni:
+	.global		x_module_sys_ni
+	.quad		0
+
+
+/*
+ * Pointers to the dynamic / hijacked system calls:
+ *
+ *	struct fdesc {
+ *		unsigned long	ip;
+ *		unsigned long	gp;
+ *	} x_module_fp_tab[NR_syscalls];
+ */
+x_module_fp_tab:
+	.global		x_module_fp_tab
+	.rept		NR_syscalls
+	.quad		0			// New IP
+	.quad		0			// New GP
+	.endr
+
+
+/*
+ * Table for saving the return addresses to the kernel:
+ *
+ *	unsigned long x_module_b0_tab[NR_syscalls];
+ */
+x_module_b0_tab:
+	.rept		NR_syscalls
+	.quad		0			// Old return address
+	.endr
+
+
+/*
+ * Semaphores:
+ *
+ *	x_mod_sem_t x_module_sem_tab[NR_syscalls];
+ */
+x_module_sem_tab:
+	.global		x_module_sem_tab
+	.rept		NR_syscalls
+	.long		_SEM_WRITE_		// Locked for write
+	.endr
+
diff -ruN 2.6.4.ref/arch/ia64/kernel/dyn_syscall_main.c 2.6.4.mig2-tmp/arch/ia64/kernel/dyn_syscall_main.c
--- 2.6.4.ref/arch/ia64/kernel/dyn_syscall_main.c	Thu Jan  1 01:00:00 1970
+++ 2.6.4.mig2-tmp/arch/ia64/kernel/dyn_syscall_main.c	Mon Apr 19 13:17:30 2004
@@ -0,0 +1,903 @@
+#define	_TEST_
+
+
+/*
+ * Dynamic System Calls & System Call Hijacking
+ * ============================================
+ *
+ * This loadable kernel module "dyn_syscall.ko" is a wrapper module that provides
+ * for registering / unregistering or hijacking / restoring system calls.
+ *
+ * This wrapper module includes a shadow system call table that is spitted between
+ * "dyn_syscall_main.c" and in "dyn_syscall_asm.S", in order to facilitate assembly
+ * programming :-)
+ *
+ * The shadow system call table consists of:
+ *
+ *	"sh_syscall[NR_syscalls]" in "dyn_syscall_main.c":
+ *
+ *		- The name of the system call
+ *		- The saved entry from "sys_call_table"
+ *		- A pointer to "sys/kernel/dynamic_syscalls" or to
+ *		  to "sys/kernel/hijacked_syscalls" directory in the "/proc"
+ *		  file system
+ *		- A pointer to "sys/kernel/dynamic_syscalls/<name>" or to
+ *		  to "sys/kernel/hijacked_syscalls/<name>" entry in the "/proc"
+ *		  file system
+ *
+ *	in "dyn_syscall_asm.S":
+ *
+ *		- "x_module_sem_tab[]": table of the semaphores, see the man
+ *		  page of "syscall_unlock()" and "syscall_trylock()"
+ *		- "x_module_fp_tab[]": table of the function descriptors of the
+ *		  new system calls
+ *		- "x_module_b0_tab[]": room to save the return address to the
+ *		  kernel (from the register "B0")
+ *		- "x_module_link[]": contains linkage code used to invoke the
+ *		  new system calls
+ *		- "x_module_ret[]": contains linkage code used to return from
+ *		  the new system calls to the kernel
+ *
+ * Some notes about the synchronization strategy:
+ *
+ * - Dynamically assigned and hijacked system call entries form two distinct sets.
+ *   + For dynamic system call assignment:
+ *     * Atomically check & decrement "free_entries"
+ *     * If a specific system call number is requested, then reserve the
+ *       corresponding "sh_syscall[]" entry by use of a compare & swap
+ *       atomic operation
+ *     * Otherwise select a free entry in "sh_syscall[]" by use of a
+ *       compare & swap atomic operation
+ *   + For system call hijacking:
+ *     * Reserve the corresponding entry in "sh_syscall[]" by use of a
+ *       compare & swap atomic operation
+ *     * No nested hijacking
+ *
+ * - First the selected entry in "sh_syscall[i]" is prepared, including
+ *   "x_module_fp_tab[i]"
+ *
+ * - Then "sys_call_table[i]" is modified to point at the linkage code in
+ *   "x_module_link[i]"
+ *
+ * - Undo operations work in the reverse order
+ *
+ * Note that "dyn_syscall.ko" can be unloaded but it is unsafe.
+ * On the other hand, unloading modules which have correctly unregistered their
+ * system calls is 100% safe.
+ *
+ * See also "Documentation/dyn_syscall.txt".
+ *
+ * 19th of April 2004
+ */
+
+
+#include <linux/module.h>
+#include <linux/pagemap.h>	/* For IA64_GRANULE_SIZE */
+#include <linux/proc_fs.h>
+#include <asm/unistd.h>
+#include <linux/syscalls.h>
+#define	_SOME_PRIVATE_DEFS_
+#include <asm/dyn_syscall.h>
+
+
+MODULE_DESCRIPTION("Dynamic System Call Support Module");
+MODULE_VERSION("0.1");
+MODULE_AUTHOR("Zoltan Menyhart, Bull S.A., <Zoltan.Menyhart@bull.net>");
+MODULE_LICENSE("GPL");
+
+
+#if	defined(_TEST_)
+#define	STATIC
+#define	INLINE
+#else
+#define	STATIC		static
+#define	INLINE		inline
+#endif
+
+#define	PRINT(args...)	printk(args)
+
+
+static const char	headline[] = "Dynamic System Call Support Module";
+static const char	ill_syscall_no[] = "Illegal syscall no.: %d\n";
+static const char	syscall_inuse[] = "Syscall %d in use\n";
+static const char	not_free[] = "Not a free syscall, no.: %d\n";
+static const char	not_yours[] = "Syscall #%d is not yours\n";
+static const char	not_locked[] = "Syscall %d not locked\n";
+static const char	kernel_syms[] = "/proc/kallsyms";
+static const char	cant_find[] = "Can't find %s\n";
+static const char	_sys_call_table[] = "sys_call_table";
+static const char	_sys_ni_syscall[] = "sys_ni_syscall";
+static const char	_kernel_gp[] = "__gp";
+
+
+/* "sys_call_table" entries should have been declared as ones of this type */
+typedef	unsigned long	entry_t;
+
+/* "sys_call_table[]" defined in "itv.S */
+entry_t			*sys_call_table_addr;
+
+/* Address of the "syscall not implemented" function - not a function pointer */
+entry_t			sys_ni_syscall_addr;
+
+
+static atomic_t			free_entries = ATOMIC_INIT(0);
+static char			dyn_scall_dir[] = "sys/kernel/dynamic_syscalls";
+static struct proc_dir_entry	*dyn_pde_p;
+static char			hijack_dir[] = "sys/kernel/hijacked_syscalls";
+static struct proc_dir_entry	*hi_pde_p;
+
+
+/*
+ * Decrement "var" only if the condition (e.g. "> 0") is met.
+ *
+ * Returns TRUE if the operation has been successfully carried out.
+ */
+#define atomic_check_and_dec(var, condition)				\
+({									\
+	__s32	___old;							\
+	int	___rc;							\
+									\
+	do {								\
+		___old = atomic_read(var);				\
+		if (!(___rc = (___old condition)))			\
+			break;						\
+	} while	(cmpxchg(var, ___old, ___old - 1) != ___old);		\
+	___rc;								\
+})
+
+
+/*
+ * Returns the *OLD* value as usually one would expect.
+ */
+#define	my_fetch_add64(delta, v)					\
+	ia64_fetchadd(delta, &atomic64_read(v), rel);
+
+
+/*
+ * Shadow system call table.
+ *
+ * In order to facilitate assembly programming, several structure members have
+ * been moved into "dyn_syscall_asm.S":
+ * - System call semafores
+ * - Pointers to the dynamic / hijacked system calls
+ * - Saved the return addresses to the kernel
+ *
+ * A comment says in the "ivt.S" file where "sys_call_table" is defined, that
+ * the very first element must be "sys_ni_syscall()" => we shall not
+ * use "sh_syscall[0]".
+ *
+ * Usage of "entry":
+ *	- 0 means not in use
+ *	- 1 means reserved (going to be used)
+ *	- original "sys_call_table" entry | 1 means preparing to undo
+ *	- Otherwise saves the original "sys_call_table" entry (not an odd value)
+ */
+typedef struct {
+	const char		*name;
+	atomic64_t		entry;		/* Saved from "sys_call_table" */
+	struct proc_dir_entry	*pdentry;
+	struct proc_dir_entry	*p_pdentry;	/* Parent of "pdentry" */
+} sh_syscall_t;
+static sh_syscall_t sh_syscall[NR_syscalls];
+
+
+/*
+ * System call semafores.
+ */
+typedef unsigned int		x_mod_sem_t;		/* 4 byte quantity */
+extern x_mod_sem_t		x_module_sem_tab[];
+
+
+/*
+ * Pointers to the dynamic / hijacked system calls:
+ *
+ *	fdesc_t x_module_fp_tab[NR_syscalls];
+ */
+extern fdesc_t			x_module_fp_tab[];
+
+
+/*
+ * The linkage tables in "dyn_syscall_asm.S" are something like:
+ *
+ * The link table for the dynamic / hijacked system calls:
+ *
+ *	struct {
+ *		<link code>;
+ *	} x_module_link[NR_syscalls];
+ *
+ * For a dynamic / hijacked system call, "sys_call_table[i]" is modified to
+ * point at "x_module_link[i]", where "i = <syscall number> - __NR_ni_syscall".
+ */
+extern char			x_module_link[],
+				x_module_ln_end[];
+/* & x_module_link[i] */
+unsigned int			x_module_link_entry_size;
+#define	X_MODULE_LINK(i)	(x_module_link + i * x_module_link_entry_size)
+
+
+extern unsigned long		x_module_k_gp;		/* Kernel GP */
+extern unsigned long		x_module_sys_ni;	/* -> sys_ni_syscall() */
+
+
+STATIC INLINE int
+gimme_a_syscall(void);
+
+STATIC void
+install_syscall(const unsigned int, const dyn_syscall_t);
+
+STATIC int
+dsc_read_func(char *page, char **start, off_t off, int count, int *eof,
+								void *data);
+
+STATIC int
+make_proc_entry(struct proc_dir_entry * const, const char * const,
+							const unsigned int);
+
+
+/*
+ * Unlock a system call.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *
+ * Returns:	As usual, -Exxx in case of errors
+ */
+int
+syscall_unlock(const char * const name, const unsigned int scall_no)
+{
+	const int	scn = scall_no - __NR_ni_syscall;
+
+	if (scn < 1 || scn >= NR_syscalls){
+		PRINT(ill_syscall_no, scall_no);
+		return -EINVAL;
+	}
+	if ((entry_t) atomic64_read(&sh_syscall[scn].entry) <= 1 ||
+		sys_call_table_addr[scn] != (entry_t) X_MODULE_LINK(scn) ||
+				strcmp(sh_syscall[scn].name, name) != 0){
+		PRINT(not_yours, scall_no);
+		return -EBADF;
+	}
+	if (x_module_sem_tab[scn] != _SEM_WRITE_){
+		PRINT(not_locked, scall_no);
+		return -ENOLCK;
+	}
+	PRINT("Unlocking syscall \"%s\": No = %d\n", sh_syscall[scn].name,
+							scn + __NR_ni_syscall);
+	x_module_sem_tab[scn] = _SEM_FREE_;
+	return 0;
+}
+
+EXPORT_SYMBOL(syscall_unlock);
+
+
+/*
+ * Internal version of system call trylock.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *		scn:		System call number - __NR_ni_syscall
+ *
+ * Returns:	-EAGAIN is returned if we've failed to take lock. Can be retried.
+ *		As usual, -Exxx in case of errors
+ */
+STATIC int
+intern_trylock(const char * const name, const unsigned int scn)
+{
+	x_mod_sem_t	tmp;
+
+	tmp = x_module_sem_tab[scn];
+	/* No problem OR-ing more than once "_SEM_WRITE_" */
+	if (cmpxchg_acq(&x_module_sem_tab[scn], tmp, tmp | _SEM_WRITE_) != tmp)
+		return -EAGAIN;
+	if ((tmp & _READER_MASK_) != _SEM_FREE_)
+		return -EAGAIN;
+	PRINT("Successfully locking syscall \"%s\": No = %d\n",
+				sh_syscall[scn].name, scn + __NR_ni_syscall);
+	return 0;
+}
+
+
+/*
+ * Try to lock a system call.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *
+ * Returns:	-EAGAIN is returned if we've failed to take lock. Can be retried.
+ *		As usual, -Exxx in case of errors
+ */
+int
+syscall_trylock(const char * const name, const unsigned int scall_no)
+{
+	const int	scn = scall_no - __NR_ni_syscall;
+	entry_t		addr = (entry_t) atomic64_read(&sh_syscall[scn].entry);
+
+	if (scn < 1 || scn >= NR_syscalls){
+		PRINT(ill_syscall_no, scall_no);
+		return -EINVAL;
+	}
+	if (addr < KERNEL_START || !(addr & 1) ||
+				sys_call_table_addr[scn] != addr - 1 ||
+				strcmp(sh_syscall[scn].name, name) != 0){
+		PRINT(not_yours, scall_no);
+		return -EBADF;
+	}
+	return intern_trylock(name, scn);
+}
+
+EXPORT_SYMBOL(syscall_trylock);
+
+
+/*
+ * Register a dynamic system call.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *				(should persist while the system call is alive)
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *				(if it is 0, then I'll choose a number for you)
+ *		fp:		-> new system call
+ *
+ * Returns:	The system call number accepted / assigned.
+ *		As usual, -Exxx in case of errors
+ *
+ * Note:	A comment says in the "ivt.S" file where "sys_call_table" is
+ *		defined, that the very first element must be
+ *		"sys_ni_syscall()".
+ */
+int
+dyn_syscall_reg(const char * const name, const unsigned int scall_no,
+							const dyn_syscall_t fp)
+{
+	int	scn;		/* System call number - __NR_ni_syscall */
+	int	rc;
+
+	if (!atomic_check_and_dec(&free_entries, > 0)){
+		PRINT("No more free syscall entry\n");
+		return -ENOENT;
+	}
+	mb();			/* Make sure the new "free_entries" is seen */
+	if (scall_no == 0){
+		scn = gimme_a_syscall();
+		/* "h_syscall[scn]" has been marked as in use */
+	} else {
+		scn = scall_no - __NR_ni_syscall;
+		if (scn < 1 || scn >= NR_syscalls){
+			atomic_add(1, &free_entries);
+			PRINT(ill_syscall_no, scall_no);
+			return -EINVAL;
+		}
+		/* Try to mark the entry as in use */
+		if (cmpxchg(&sh_syscall[scn].entry, 0, 1) != 0){
+			atomic_add(1, &free_entries);
+			PRINT(syscall_inuse, scall_no);
+			return -EBUSY;
+		}
+		if (sys_call_table_addr[scn] != sys_ni_syscall_addr){
+			atomic64_set(&sh_syscall[scn].entry, 0);
+			mb();
+			atomic_add(1, &free_entries);
+			PRINT(not_free, scall_no);
+			return -EBUSY;
+		}
+	}
+	/* Create "/proc/sys/kernel/dynamic_syscalls/<name>" */
+	if ((rc = make_proc_entry(dyn_pde_p, name, scn)) < 0){
+		atomic64_set(&sh_syscall[scn].entry, 0);
+		mb();
+		atomic_add(1, &free_entries);
+		return rc;
+	}
+	sh_syscall[scn].name = name;
+	install_syscall(scn, fp);
+//	MOD_INC_USE_COUNT;
+	return scn + __NR_ni_syscall;
+}
+
+EXPORT_SYMBOL(dyn_syscall_reg);
+
+
+/*
+ * Allocate a free ("sys_ni_syscall()") and mark it as in use.
+ *
+ * Returns:	A system call number - __NR_ni_syscall
+ *
+ * Note:	A comment says in the "ivt.S" file where "sys_call_table" is
+ *		defined, that the very first element must be
+ *		"sys_ni_syscall()" => we shall not use "sh_syscall[0]".
+ */
+STATIC INLINE int
+gimme_a_syscall(void)
+{
+	unsigned int 	i;
+
+	/* Most of the usable entries are at the high indices */
+	for (i = NR_syscalls - 1; i > 0; i--){
+		if (sys_call_table_addr[i] != sys_ni_syscall_addr)
+			continue;
+		/* Try to mark the entry as in use */
+		if (cmpxchg(&sh_syscall[i].entry, 0, 1) != 0)
+			continue;
+		return i;
+	}
+	panic("\nWe've lost the \"sys_ni_syscall()\"-s ???\n");
+}
+
+
+/*
+ * Do install a dynamic system call.
+ *
+ * Arguments:	scn:		System call number - __NR_ni_syscall
+ *		fp:		-> new system call
+ */
+STATIC void
+install_syscall(const unsigned int scn, const dyn_syscall_t fp)
+
+{
+	PRINT("Syscall \"%s\": No = %d IP = 0x%lx GP = 0x%lx\n",
+				sh_syscall[scn].name, scn + __NR_ni_syscall,
+				((fdesc_t *) fp)->ip, ((fdesc_t *) fp)->gp);
+	x_module_fp_tab[scn] = * (fdesc_t *) fp;
+	atomic64_set(&sh_syscall[scn].entry,
+			sys_call_table_addr[scn]);	/* Must not be 0 */
+	mb();		/* "sys_call_table_addr[scn] =" must be the last */
+	sys_call_table_addr[scn] = (entry_t) X_MODULE_LINK(scn);
+}
+
+
+/*
+ * Do prepare to uninstall a dynamic / hijacked system call.
+ *
+ * Arguments:	scn:		System call number - __NR_ni_syscall
+ */
+STATIC INLINE void
+prepare_to_uninstall_syscall(const unsigned int scn)
+{
+	PRINT("Original IP = 0x%lx\n", atomic64_read(&sh_syscall[scn].entry));
+	sys_call_table_addr[scn] = my_fetch_add64(1, &sh_syscall[scn].entry);
+	mb();			/* "sys_call_table_addr[scn] =" must be seen */
+}
+
+
+/*
+ * Do uninstall a dynamic / hijacked system call.
+ *
+ * Arguments:	scn:		System call number - __NR_ni_syscall
+ */
+STATIC INLINE void
+uninstall_syscall(const unsigned int scn)
+{
+	PRINT("Restoring syscall \"%s\": No = %d\n", sh_syscall[scn].name,
+							scn + __NR_ni_syscall);
+	sh_syscall[scn].name = NULL;
+	mb();	/* "atomic64_set(&sh_syscall[scn].entry, 0)" must be the last */
+	atomic64_set(&sh_syscall[scn].entry, 0);
+}
+
+
+/*
+ * Common "/proc" read function. Outputs the system call number.
+ *
+ * System call number - __NR_ni_syscall is stored in "->data".
+ */
+#define MIN(a,b)	((a) < (b) ? (a) : (b))
+STATIC int
+read_func(char *page, char **start, off_t off, int count, int *eof, void *data)
+{
+	char		buff[6];		/* For "1234\n\0" */
+	unsigned int	ch_count;
+
+	sprintf(buff, "%4d\n", ((int) (long) data) + __NR_ni_syscall);
+	if (off >= sizeof(buff) - 1){
+		*eof = 1;
+		return 0;
+	}
+	ch_count = MIN(count, sizeof(buff) - 1 - off);
+	memcpy(page + off, &buff[off], ch_count);
+	return ch_count;
+}
+
+
+/*
+ * Create "/proc/sys/kernel/.../<name>" showing the actual system call number.
+ *
+ * Arguments:	p_pde_p:	-> parent /proc directory entry
+ *		name:		-> system call name
+ *		scn:		System call number - __NR_ni_syscall
+ *
+ * Returns:	As usual, -Exxx in case of errors
+ */
+STATIC int
+make_proc_entry(struct proc_dir_entry * const p_pde_p, const char * const name,
+							const unsigned int scn)
+{
+	struct proc_dir_entry	*pde_p;
+
+	if ((pde_p = create_proc_entry(name, S_IRUSR | S_IRGRP | S_IROTH,
+							p_pde_p)) == NULL){
+		PRINT("Cannot create /proc/sys/kernel/.../%s entry\n", name);
+		return -ENOMEM;
+	}
+	pde_p->read_proc = read_func;
+	pde_p->data = (void *) (long) scn;
+	pde_p->owner = THIS_MODULE;
+	sh_syscall[scn].pdentry = pde_p;
+	sh_syscall[scn].p_pdentry = p_pde_p;
+	return 0;
+}
+
+
+/*
+ * Hijack a system call.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *				(should persist while the system call is alive)
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *		fp:		-> new system call
+ *
+ * Returns:	As usual, -Exxx in case of errors
+ *
+ * Note:	A comment says in the "ivt.S" file where "sys_call_table" is
+ *		defined, that the very first element must be
+ *		"sys_ni_syscall()".
+ */
+int
+hijack_syscall(const char * const name, const unsigned int scall_no,
+							const dyn_syscall_t fp)
+{
+	const int	scn = scall_no - __NR_ni_syscall;
+	int		rc;
+
+	if (scn < 1 || scn >= NR_syscalls){
+		PRINT(ill_syscall_no, scall_no);
+		return -EINVAL;
+	}
+	/* Try to mark the entry as in use */
+	if (cmpxchg(&sh_syscall[scn].entry, 0, 1) != 0){
+		PRINT(syscall_inuse, scall_no);
+		return -EBUSY;
+	}
+	if (sys_call_table_addr[scn] == sys_ni_syscall_addr){
+		PRINT("Syscall is \"ni\"\n");
+		atomic64_set(&sh_syscall[scn].entry, 0);
+		return -ENOENT;
+	}
+	/* Create "/proc/sys/kernel/hijacked_syscalls/<name>" */
+	if ((rc = make_proc_entry(hi_pde_p, name, scn)) < 0){
+		atomic64_set(&sh_syscall[scn].entry, 0);
+		return rc;
+	}
+	sh_syscall[scn].name = name;
+	install_syscall(scn, fp);
+//	MOD_INC_USE_COUNT;
+	return 0;
+}
+
+EXPORT_SYMBOL(hijack_syscall);
+
+
+/*
+ * Prepare to restore a previously dynamic / hijacked dynamic system call.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *
+ * Returns:	-EAGAIN is returned if we've failed to take lock. Can be retried.
+ *		As usual, -Exxx in case of errors
+ */
+int
+prep_restore_syscall(const char * const name, const unsigned int scall_no)
+{
+	const int	scn = scall_no - __NR_ni_syscall;
+
+	if (scn < 1 || scn >= NR_syscalls){
+		PRINT(ill_syscall_no, scall_no);
+		return -EINVAL;
+	}
+	if ((entry_t) atomic64_read(&sh_syscall[scn].entry) <= 1 ||
+		sys_call_table_addr[scn] != (entry_t) X_MODULE_LINK(scn) ||
+				strcmp(sh_syscall[scn].name, name) != 0){
+		PRINT(not_yours, scall_no);
+		return -EBADF;
+	}
+	PRINT("Preparing to restore syscall \"%s\": No = %d\n",
+						sh_syscall[scn].name, scall_no);
+	remove_proc_entry(name, sh_syscall[scn].p_pdentry);
+	sh_syscall[scn].pdentry = sh_syscall[scn].p_pdentry = NULL;
+	prepare_to_uninstall_syscall(scn);
+	return intern_trylock(name, scn);
+}
+
+EXPORT_SYMBOL(prep_restore_syscall);
+
+
+/*
+ * Finish restoring a previously hijacked dynamic system call.
+ * (Used by "dyn_syscall_unreg()", too.)
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *
+ * Returns:	As usual, -Exxx in case of errors
+ */
+int
+restore_syscall(const char * const name, const unsigned int scall_no)
+{
+	const int	scn = scall_no - __NR_ni_syscall;
+	entry_t		addr = (entry_t) atomic64_read(&sh_syscall[scn].entry);
+
+	if (scn < 1 || scn >= NR_syscalls){
+		PRINT(ill_syscall_no, scall_no);
+		return -EINVAL;
+	}
+	if (addr < KERNEL_START || !(addr & 1) ||
+				sys_call_table_addr[scn] != addr - 1 ||
+				strcmp(sh_syscall[scn].name, name) != 0){
+		PRINT(not_yours, scall_no);
+		return -EBADF;
+	}
+	if (x_module_sem_tab[scn] != _SEM_WRITE_){
+		PRINT(not_locked, scall_no);
+		return -ENOLCK;
+	}
+	uninstall_syscall(scn);
+//	MOD_DEC_USE_COUNT;
+	return 0;
+}
+
+EXPORT_SYMBOL(restore_syscall);
+
+
+/*
+ * Finish restoring a previously registered dynamic system call.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *
+ * Returns:	As usual, -Exxx in case of errors
+ */
+int
+dyn_syscall_unreg(const char * const name, const unsigned int scall_no)
+{
+	int	rc;
+
+	if (( rc = restore_syscall(name, scall_no)) == 0){
+		mb();	/* "atomic_add(1, &free_entries)" must be the last */
+		atomic_add(1, &free_entries);
+	}
+	return rc;
+}
+
+EXPORT_SYMBOL(dyn_syscall_unreg);
+
+
+/*
+ * Count the "free" entries in "sys_call_table".
+ *
+ * Returns:	The system call number accepted / assigned.
+ *		As usual, -Exxx in case of errors
+ *
+ * Note:	A comment says in the "ivt.S" file where "sys_call_table" is
+ *		defined, that the very first element must be
+ *		"sys_ni_syscall()".
+ */
+STATIC INLINE int
+count_free_syscalls(void)
+{
+	unsigned int	i;
+	entry_t		*p;
+
+	p = (entry_t *) sys_call_table_addr;
+	if (*p++ != sys_ni_syscall_addr){
+		PRINT("The 1st one must be sys_ni_syscall(), see ivt.S\n");
+		return -ENOENT;
+	}
+	for (i = 1; i < NR_syscalls; i++, p++)
+		if (*p == sys_ni_syscall_addr)
+			atomic_add(1, &free_entries);
+	PRINT("Number of free entries:\t%d\n", atomic_read(&free_entries));
+	if (atomic_read(&free_entries) < 1){
+		PRINT("Not enough free sys_call_table[] entries\n");
+		return -ENOENT;
+	}
+	return 0;
+}
+
+
+/*
+ * Set up the following "/proc" directories:
+ *	- "sys/kernel/dynamic_syscalls"
+ *	- "sys/kernel/hijacked_syscalls"
+ *
+ * Returns:	As usual, -Exxx in case of errors
+ */
+STATIC INLINE int
+init_proc_entries(void)
+{
+	if ((dyn_pde_p = proc_mkdir(dyn_scall_dir, NULL)) == NULL){
+		PRINT("Cannot create /proc/%s directory\n", dyn_scall_dir);
+		return -ENOMEM;
+	}
+	if ((hi_pde_p = proc_mkdir(hijack_dir, NULL)) == NULL){
+		PRINT("Cannot create /proc/%s directory\n", hijack_dir);
+		remove_proc_entry(dyn_scall_dir, NULL);
+		return -ENOMEM;
+	}
+	dyn_pde_p->owner = THIS_MODULE;
+	hi_pde_p->owner = THIS_MODULE;
+	return 0;
+}
+
+
+#define	RD_BUF_SIZE	80
+
+
+/*
+ * Read the next line from the "/proc/kallsyms" file.
+ * Truncate the lines longer than the buffer size.
+ *
+ * Returns:	As usual, -Exxx in case of errors
+ */
+STATIC INLINE int
+read_truncate_line(const int fd, char *buff)
+{
+	char	*p;
+	int	rc;
+
+	for (p = buff; p < &buff[RD_BUF_SIZE];){
+		if ((rc = sys_read(fd, p, 1)) < 0)
+			return rc;
+		if (rc == 0)
+			return -ENODATA;
+		if (*p++ == '\n')
+			break;
+	}
+	p--;
+	while (*p != '\n'){
+		if ((rc = sys_read(fd, p, 1)) < 0)
+			return rc;
+		if (rc == 0)
+			return -ENODATA;
+	}
+	*p = '\0';
+	return 0;
+}
+
+
+/*
+ * Check to see if the line contains any of the following symbols:
+ *	- address of "sys_ni_syscall()"
+ *	- address of "sys_call_table"
+ *	- the GP of the kernel "__gp"
+ *
+ * Returns:	TRUE if all the 3 symbols have already been found
+ */
+STATIC INLINE int
+check_line(char * const line)
+{
+	unsigned long	tmp;
+	char		*p, *q;
+
+	tmp = simple_strtoul(line, &p, 16);
+	for (p += 3, q = p; *p != '\0' && *p != '\t' && *p != ' '; p++);
+	*p = '\0';
+	if (strcmp(q, _sys_call_table) == 0){
+		sys_call_table_addr = (entry_t *) tmp;
+		PRINT("%s:\t\t0x%p\n", _sys_call_table, sys_call_table_addr);
+	} else if (strcmp(q, _sys_ni_syscall) == 0){
+		sys_ni_syscall_addr = tmp;
+		PRINT("%s():\t0x%lx\n", _sys_ni_syscall, sys_ni_syscall_addr);
+	} else if (strcmp(q, _kernel_gp) == 0){
+		x_module_k_gp = tmp;
+		PRINT("%s:\t\t\t0x%lx\n", _kernel_gp, x_module_k_gp);
+	}
+	return sys_call_table_addr != NULL &&
+			sys_ni_syscall_addr != 0 && x_module_k_gp != 0;
+}
+
+
+/*
+ * Pick up some kernel symbols from "/proc/kallsyms" which happen not be
+ * exported :-)
+ *	- address of "sys_ni_syscall()"
+ *	- address of "sys_call_table"
+ *	- the GP of the kernel "__gp"
+ *
+ * Returns:	As usual, -Exxx in case of errors
+ */
+STATIC INLINE int
+get_kernel_syms(void)
+{
+	char		buf[RD_BUF_SIZE];
+	int		fd;
+	int		rc;
+	mm_segment_t	orig_address_limit = get_fs();
+	mm_segment_t	tmp_address_limit = KERNEL_DS;
+
+	set_fs(tmp_address_limit);		/* Make "sys_open()" happy */
+	if ((fd = sys_open(kernel_syms, O_RDONLY, 0)) < 0){
+		PRINT("Can't open %s, error code: %d\n", kernel_syms, fd);
+		set_fs(orig_address_limit);
+		return fd;
+	}
+	while ((rc = read_truncate_line(fd, buf)) == 0)
+		if (check_line(buf))
+			break;
+	sys_close(fd);
+	set_fs(orig_address_limit);
+	if (rc != 0 && rc != -ENODATA)
+		return rc;
+	if (sys_call_table_addr == NULL){
+		PRINT(cant_find, _sys_call_table);
+		return -ENOENT;
+	}
+	if (sys_ni_syscall_addr == 0){
+		PRINT(cant_find, _sys_ni_syscall);
+		return -ENOENT;
+	}
+	if (x_module_k_gp == 0){
+		PRINT(cant_find, _kernel_gp);
+		return -ENOENT;
+	}
+	return 0;
+}
+
+
+/*
+ * Acquire some kernel symbols which happen not be exported :-)
+ *
+ * Set up the following "/proc" directories:
+ *	- "sys/kernel/dynamic_syscalls"
+ *	- "sys/kernel/hijacked_syscalls"
+ */
+STATIC int __init
+init_dyn_syscall(void)
+{
+	int	rc;
+
+	PRINT("\n%s\n", headline);
+	if ((rc = get_kernel_syms()) < 0)
+		return rc;
+	if (sys_call_table_addr < (entry_t *) KERNEL_START ||
+			sys_call_table_addr >= (entry_t *) (KERNEL_START +
+			IA64_GRANULE_SIZE - NR_syscalls * sizeof(entry_t))){
+		PRINT("Illegal %s address\n", "sys_call_table");
+		return -EFAULT;
+	}
+	if (sys_ni_syscall_addr < KERNEL_START || sys_ni_syscall_addr >=
+						(entry_t) sys_call_table_addr){
+		PRINT("Illegal %s address\n", "sys_ni_syscall()");
+		return -EFAULT;
+	}
+	if ((rc = count_free_syscalls()) < 0)
+		return rc;
+	/* Needed for "#define	X_MODULE_LINK(i)" */
+	x_module_link_entry_size = (x_module_ln_end - x_module_link) /
+								NR_syscalls;
+	if (x_module_k_gp < KERNEL_START || x_module_k_gp >=
+					KERNEL_START + IA64_GRANULE_SIZE){
+		PRINT("Illegal kernel GP\n");
+		return -EFAULT;
+	}
+	x_module_sys_ni = sys_ni_syscall_addr;
+	return init_proc_entries();
+}
+
+
+STATIC void __exit
+exit_dyn_syscall(void)
+{
+	PRINT("\n%s getting unloaded\n", headline);
+	remove_proc_entry(dyn_scall_dir, NULL);
+	remove_proc_entry(hijack_dir, NULL);
+}
+
+
+module_init(init_dyn_syscall);
+module_exit(exit_dyn_syscall);
+
diff -ruN 2.6.4.ref/include/asm-ia64/dyn_syscall.h 2.6.4.mig2-tmp/include/asm-ia64/dyn_syscall.h
--- 2.6.4.ref/include/asm-ia64/dyn_syscall.h	Thu Jan  1 01:00:00 1970
+++ 2.6.4.mig2-tmp/include/asm-ia64/dyn_syscall.h	Mon Apr 19 11:00:15 2004
@@ -0,0 +1,151 @@
+/*
+ * Dynamic System Calls & System Call Hijacking
+ * ============================================
+ *
+ * Version 0.1, 19th of April 2004
+ * By Zoltan Menyhart, Bull S.A. <Zoltan.Menyhart@bull.net>
+ * The usual GPL applies.
+ *
+ * See also "Documentation/dyn_syscall.txt".
+ */
+
+
+#if	!defined(__ASSEMBLY__)
+
+
+#define	PROC_DYN_SYSCALL_DIR	"sys/kernel/dynamic_syscalls"
+#define	PROC_HIJCK_SYSCALL_DIR	"sys/kernel/hijacked_syscalls"
+
+
+typedef	long (* dyn_syscall_t)(const int, ...);
+
+
+/*
+ * Function pointer - why isn't it defined in an "official" .h file ?
+ */
+typedef struct fdesc {
+	unsigned long	ip;
+	unsigned long	gp;
+} fdesc_t;
+
+
+/*
+ * Register a dynamic system call.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *				(should persist while the system call is alive)
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *				(if it is 0, then I'll choose a number for you)
+ *		fp:		-> new system call
+ *
+ * Returns:	The system call number accepted / assigned.
+ *		As usual, -Exxx in case of errors
+ *
+ * Note:	A comment says in the "ivt.S" file where "sys_call_table" is
+ *		defined, that the very first element must be
+ *		"sys_ni_syscall()".
+ */
+extern int
+dyn_syscall_reg(const char * const name, const unsigned int scall_no,
+							const dyn_syscall_t fp);
+
+
+/*
+ * Hijack a system call.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *				(should persist while the system call is alive)
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *		fp:		-> new system call
+ *
+ * Returns:	As usual, -Exxx in case of errors
+ *
+ * Note:	A comment says in the "ivt.S" file where "sys_call_table" is
+ *		defined, that the very first element must be
+ *		"sys_ni_syscall()".
+ */
+extern int
+hijack_syscall(const char * const name, const unsigned int scall_no,
+							const dyn_syscall_t fp);
+
+
+/*
+ * Prepare to restore a previously dynamic / hijacked dynamic system call.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *
+ * Returns:	As usual, -Exxx in case of errors
+ */
+extern int
+prep_restore_syscall(const char * const name, const unsigned int scall_no);
+
+
+/*
+ * Finish restoring a previously hijacked dynamic system call.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *
+ * Returns:	As usual, -Exxx in case of errors
+ */
+extern int
+restore_syscall(const char * const name, const unsigned int scall_no);
+
+
+/*
+ * Finish restoring a previously registered dynamic system call.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *
+ * Returns:	As usual, -Exxx in case of errors
+ */
+extern int
+dyn_syscall_unreg(const char * const name, const unsigned int scall_no);
+
+
+/*
+ * Unlock a system call.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *
+ * Returns:	As usual, -Exxx in case of errors
+ */
+extern int
+syscall_unlock(const char * const name, const unsigned int scall_no);
+
+
+/*
+ * Try to lock a system call.
+ *
+ * Arguments:	name:		-> unique ASCII string
+ *		scall_no:	System call number in the [__NR_ni_syscall + 1...
+ *				__NR_ni_syscall + NR_syscalls) range
+ *
+ * Returns:	-EAGAIN is returned if we've failed to take lock. Can be retried.
+ *		As usual, -Exxx in case of errors
+ */
+extern int
+syscall_trylock(const char * const name, const unsigned int scall_no);
+
+
+#endif	/* #if	!defined(__ASSEMBLY__) */
+
+
+#if	defined(_SOME_PRIVATE_DEFS_)
+
+#define	_SEM_WRITE_	0x80000000	/* Locked for write */
+#define	_READER_MASK_	0x7fffffff	/* Mask of the reader counter */
+#define	_SEM_FREE_	0		/* Unlocked */
+#define	_SEM_RD_DELTA_	1		/* Reades increment by one */
+
+#endif	/* #if	defined(_SOME_PRIVATE_DEFS_) */
+
diff -ruN 2.6.4.ref/Documentation/dyn_syscall.txt 2.6.4.mig2-tmp/Documentation/dyn_syscall.txt
--- 2.6.4.ref/D*/dyn*txt	Thu Jan  1 01:00:00 1970
+++ 2.6.4.mig2/Documentation/dyn_syscall.txt	Tue Apr 20 10:27:00 2004
@@ -0,0 +1,303 @@
+Dynamic System Calls & System Call Hijacking
+============================================
+
+Version 0.1, 19th of April 2004
+Zoltan Menyhart, Bull S.A., <Zoltan.Menyhart@bull.net>
+
+
+- Disappointed, 'cause they don't wanna take your brand new syscall into the
+  kernel ?
+
+  + No problem, I'll do it for you.
+
+- Can't recompile the kernel, otherwise you gonna lose RedHat guarantee ?
+  Or some ISVs like whose name starts with an "O" and terminates with "racle"
+  ain't gonna support it ?
+
+  + No problem, I'll load your syscall in a module.
+
+- Got a syscall number conflict 'cause of an exotic patch slipped in before
+  your one ?
+
+  + No problem, I'll find a free syscall number for you dynamically.
+
+- Wanna try your own version of a syscall without recompiling the kernel or
+  rebooting it ?
+
+  + No problem, I'll hijack the syscall for you.
+
+- Fed up with the infinite number of different kernel configurations ?
+  Can't follow any more what .config you've done for which of your clients ?
+
+  + No problem, make a minimal kernel with almost nothing in it and load
+    dynamically the syscalls actually needed. 
+
+My loadable kernel module "dyn_syscall.ko" provides for
+registering / unregistering or hijacking / restoring system calls.
+
+Sure, it's a loadable kernel module, who wants to modify the kernel ? :-)
+
+My patch is against the version 2.6.4. As there is not much in the way of
+direct dependency on the kernel, it should work with more recent versions, too.
+
+Playing with the system call mechanism is very much architecture dependent.
+Its key element is written in assembly.
+I've got an IA64 version only.
+
+
+How can it be used ?
+--------------------
+
+Assuming you've got a system call like "asmlinkage long sys_foo(...)" in a
+loadable kernel module. 
+You can register it with an unused system call number:
+
+	const char name[] = "foo";
+	rc = dyn_syscall_reg(name, syscall_no, (dyn_syscall_t) sys_foo);
+
+If "syscall_no" is zero, I'll find a free system call number for you.
+(Do check the return code. On success, it's your system call number.)
+Or you can register your system call over an existing one:
+
+	rc = hijack_syscall(name, syscall_no, (dyn_syscall_t) sys_foo);
+
+Having fully initialized your system call, you can make it available:
+
+	rc = syscall_unlock(name, syscall_no);
+
+This sequence is usually included in the "module_init(...)" function.
+
+User applications can find out what your system call number is by consulting
+"/proc/sys/kernel/dynamic_syscalls/foo" or
+"/proc/sys/kernel/hijacked_syscalls/foo", respectively.
+
+Having played enough with your system call, you can launch the module unload
+procedure, without worrying about the "living calls" which may be "part way"
+through your module:
+
+	rc = prep_restore_syscall(name, syscall_no);
+
+This function locks out further calls to the "syscall_no" (they will be refused
+with the return code "-ENOSYS"). It returns "-EAGAIN" if there is still someone
+inside your system call.
+In this latter case you can wait until your last client leaves:
+
+	while((rc = syscall_trylock(name, syscall)) == -EAGAIN)
+		schedule();
+
+If you have a blocking system call, then instead of busy waiting, wake up the
+waiting tasks and go to sleep a bit in the mean time.
+Finally, you can invoke:
+
+	rc = dyn_syscall_unreg(name, syscall_no);
+
+or
+
+	rc = restore_syscall(name, syscall_no);
+
+to remove completely your registered or hijacked system call, respectively.
+
+This sequence is usually included in the "module_exit(...)" function.
+
+The function prototypes are in "asm/dyn_syscall.h".
+
+In order to configure this module, say "m" in:
+
+        Processor type and features:
+                Support for dynamic system calls
+
+
+man pages:
+----------
+
+
+--------------------------------------------------------------------------------
+
+
+NAME
+
+	dyn_syscall_reg, hijack_syscall - Register a system call
+
+SYNOPSIS
+
+	#include <asm/dyn_syscall.h>
+
+	int
+	dyn_syscall_reg(const char *name,
+			const unsigned int syscall_no,
+			const dyn_syscall_t fp);
+	int
+	hijack_syscall(const char *name,
+			const unsigned int syscall_no,
+			const dyn_syscall_t fp);
+
+DESCRIPTION
+
+	"dyn_syscall_reg()" and "hijack_syscall()" are exported services
+	available for loadable kernel modules.
+
+	"dyn_syscall_reg()" registers a new, dynamic system call.
+	If "syscall_no" is zero, then an otherwise unused system call number
+	will be assigned.
+
+	"hijack_syscall()" registers a system call which overloads an
+	existing one.
+
+	"name" points to a string that shall persist while the system call is
+	alive.
+
+	"syscall_no" should be in the range of
+	[__NR_ni_syscall + 1... __NR_ni_syscall + NR_syscalls).
+
+	"fp" refers to the new system call.
+	For the IA64 architecture, the function descriptor "dyn_syscall_t"
+	refers to a structure containing the program counter and the global
+	pointer.
+
+	User applications can find this system call number in
+	"/proc/sys/kernel/dynamic_syscalls/<name>" or in
+	"/proc/sys/kernel/hijacked_syscalls/<name>", respectively.
+	On read, each of these files contains a 4 digit decimal number
+	terminated with a '\n' character.
+
+RETURN VALUE
+
+	On success, the system call number accepted / assigned is returned.
+
+	On error, the following codes may be returned:
+
+	-ENOENT:	No more free system call is available -
+			"dyn_syscall_reg()" only
+	-EINVAL:	Illegal system call number - both
+	-EBUSY:		System call is already in use - "dyn_syscall_reg()" only
+	-ENOMEM:	Cannot create "/proc/..." - both
+
+SEE ALSO
+
+	syscall_unlock, prep_restore_syscall, syscall_trylock,
+	dyn_syscall_unreg, restore_syscall
+
+
+--------------------------------------------------------------------------------
+
+
+NAME
+
+	syscall_unlock, syscall_trylock - Unlock / try to lock a system call
+	prep_restore_syscall - Prepare to unregister a system call
+
+SYNOPSIS
+
+	#include <asm/dyn_syscall.h>
+
+	int
+	syscall_unlock(const char *name,
+			const unsigned int syscall_no);
+	int
+	syscall_trylock(const char *name,
+			const unsigned int syscall_no);
+
+	int
+	prep_restore_syscall(const char *name,
+			const unsigned int syscall_no);
+
+DESCRIPTION
+
+	"syscall_unlock()", "syscall_trylock()" and "prep_restore_syscall()"
+	are exported services available for loadable kernel modules.
+
+	Each system call is protected by a semaphore.
+
+	When a new system call is added, it is locked for write.
+	Regular system call invocation tries to take the semaphore for read.
+	Unless it is "syscall_unlock()"-ed, any attempt to use the system call
+         will be refused and "-ENOSYS" will be reported.
+
+	Before undoing a system call registration, it is necessary to lock out
+	any further invocation of the system call by re-locking it for write.
+	(They will be refused by returning "-ENOSYS".)
+	Apart from some small administration task, "prep_restore_syscall()"
+	attempts to do it. If it fails (indicated by "-EAGAIN" returned), then
+	there is at least one "living call" which may be "part way" through
+	the system call code.
+
+	"syscall_trylock()" should be invoked repeatedly while it returns
+	"-EAGAIN". In order not to over penalise other tasks, "schedule()"
+	should be invoked at each iteration. If the system call is blocking,
+         i.e. there can be tasks sleeping inside the system call, then they have
+         to be woke up. In such a case, it is recommended to sleep a bit
+         between two iterations of "syscall_trylock()".
+
+	"name" should be the same as that was used during the registration.
+
+	"syscall_no" should be in the range of
+	[__NR_ni_syscall + 1... __NR_ni_syscall + NR_syscalls).
+
+RETURN VALUE
+
+	On success, zero is returned.
+
+	"syscall_trylock()" and "prep_restore_syscall()" return "-EAGAIN" if
+         they have failed to take the semaphore for write.
+	
+	On error, the following codes can be returned:
+
+	-EBADF:		Name or system call number does not match the parameters
+			which was used during the system call registration
+	-EINVAL:	Illegal system call number
+
+SEE ALSO
+
+	dyn_syscall_reg, hijack_syscall, dyn_syscall_unreg, restore_syscall
+
+
+--------------------------------------------------------------------------------
+
+
+NAME
+
+	dyn_syscall_unreg, restore_syscall - Unregister a system call
+
+SYNOPSIS
+
+	#include <asm/dyn_syscall.h>
+
+	int
+	dyn_syscall_unreg(const char *name,
+			const unsigned int syscall_no);
+	int
+	restore_syscall(const char *name,
+			const unsigned int syscall_no);
+
+DESCRIPTION
+
+	"dyn_syscall_unreg()" and "restore_syscall()" are exported services
+	available for loadable kernel modules.
+
+	"dyn_syscall_unreg()" unregisters a dynamic system call.
+
+	"restore_syscall()" restores a hijacked system call.
+
+	"name" should be the same as that was used during the registration.
+
+	"syscall_no" should be in the range of
+	[__NR_ni_syscall + 1... __NR_ni_syscall + NR_syscalls).
+
+RETURN VALUE
+
+	On success, zero is returned.
+
+	On error, the following codes can be returned:
+
+	-EBADF:		Name or system call number does not match the parameters
+			which was used during the system call registration
+	-EINVAL:	Illegal system call number
+
+SEE ALSO
+
+	dyn_syscall_reg, hijack_syscall,
+	syscall_unlock, syscall_trylock,  prep_restore_syscall
+
+
+--------------------------------------------------------------------------------
+

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Dynamic System Calls & System Call Hijacking - demo syscall
  2004-04-20  9:07 Dynamic System Calls & System Call Hijacking Zoltan Menyhart
  2004-04-20  9:08 ` Zoltan Menyhart
@ 2004-04-20  9:09 ` Zoltan Menyhart
  2004-04-20  9:09 ` Dynamic System Calls & System Call Hijacking - demo user program Zoltan Menyhart
  2004-04-20 19:40 ` Dynamic System Calls & System Call Hijacking Pavel Machek
  3 siblings, 0 replies; 6+ messages in thread
From: Zoltan Menyhart @ 2004-04-20  9:09 UTC (permalink / raw)
  To: linux-kernel, linux-ia64

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]

 

[-- Attachment #2: foo.c --]
[-- Type: text/plain, Size: 1286 bytes --]

/*
 * Demo dynamic syscall
 */


#include <linux/module.h>
#include <asm/dyn_syscall.h>


const char name[] = "foo";


asmlinkage long
sys_foo(const int cmd, const caddr_t address, const size_t length,
					const int node, const pid_t pid)
{
	printk("\nsys_foo(%d, 0x%p, 0x%lx, %d, %d)\n",
					cmd, address, length, node, pid);
	return 0;
}


int syscall;


static int __init
init_foo(void)
{
	int	rc;

	printk("\nModule Foo\n");
	rc = dyn_syscall_reg(name, 0, (dyn_syscall_t) sys_foo);
	printk("dyn_syscall_reg() returned: %d\n", rc);
	if (rc < 0)
		return rc;
	syscall = rc;
	rc = syscall_unlock(name, syscall);
	if (rc < 0)
		panic("syscall_unlock() returned: %d\n", rc);
	return 0;
}


static void __exit
exit_foo(void)
{
	printk("\nModule Foo getting unloaded\n");
	int	rc;
	rc = prep_restore_syscall(name, syscall);
	if (rc < 0)
		panic("prep_restore_syscall() returned: %d\n", rc);
	while((rc = syscall_trylock(name, syscall)) == -EAGAIN){
		/*
		 * Having some blocking syscalls? Don't just busy wait,
		 * wake them up, sleep a bit in the mean time.
		 */
	}
	if (rc < 0)
		panic("syscall_trylock() returned: %d\n", rc);
	rc = dyn_syscall_unreg(name, syscall);
	if (rc < 0)
		panic("dyn_syscall_unreg() returned: %d\n", rc);
}


module_init(init_foo);
module_exit(exit_foo);


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Dynamic System Calls & System Call Hijacking - demo user program
  2004-04-20  9:07 Dynamic System Calls & System Call Hijacking Zoltan Menyhart
  2004-04-20  9:08 ` Zoltan Menyhart
  2004-04-20  9:09 ` Dynamic System Calls & System Call Hijacking - demo syscall Zoltan Menyhart
@ 2004-04-20  9:09 ` Zoltan Menyhart
  2004-04-20 19:40 ` Dynamic System Calls & System Call Hijacking Pavel Machek
  3 siblings, 0 replies; 6+ messages in thread
From: Zoltan Menyhart @ 2004-04-20  9:09 UTC (permalink / raw)
  To: linux-kernel, linux-ia64

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]

 

[-- Attachment #2: test.c --]
[-- Type: text/plain, Size: 1329 bytes --]

#include <linux/sys.h>		/* For NR_syscalls */
#include <asm/unistd.h>		/* For __NR_ni_syscall */
#include <stdio.h>
#include <errno.h>
#include <sys/types.h>
#include <asm/fcntl.h>		/* For O_RDONLY */


#define	MY_SYSCALL	"/proc/sys/kernel/dynamic_syscalls/foo"

/*
 * Read out my actual system call number from "/proc/...".
 *
 * On error "-1" is returned and "errno" is set accordingly.
 */
static inline
get_my_syscall_no(void)
{
	int		fd;
	int		tmp;
	char		buff[5];		/* Should be enough :-) */

	if ((fd = open(MY_SYSCALL, O_RDONLY)) < 0){
		errno = ENOSYS;
		return -1;
	}
	tmp = read(fd, buff, sizeof buff - 1);
	close(fd);
	if (tmp != sizeof buff - 1){
		errno = ENOSYS;
		return -1;
	}
	buff[sizeof buff - 1] = '\0';
	tmp = atoi(buff);
	if (tmp < __NR_ni_syscall || tmp >= __NR_ni_syscall + NR_syscalls){
		errno = ENOSYS;
		return -1;
	}
	return tmp;
}


/*
 * Wrapper function for my system call.
 */
long
my_syscall(const int arg, const long arg2, const long arg3, const int arg4,
							const int arg5)
{
	static int	syscall_no = -1;

	if (syscall_no == -1)
		if ((syscall_no = get_my_syscall_no())== -1)
			return -1;
	return syscall(syscall_no, arg, arg2, arg3, arg4, arg5);
}


main()
{
	if (my_syscall(1, 0, 1, 0, 2) == -1)
		perror("my syscall");
	if (my_syscall(2, 3, 4, 5, 6) == -1)
		perror("my syscall");
}


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dynamic System Calls & System Call Hijacking
  2004-04-20  9:07 Dynamic System Calls & System Call Hijacking Zoltan Menyhart
                   ` (2 preceding siblings ...)
  2004-04-20  9:09 ` Dynamic System Calls & System Call Hijacking - demo user program Zoltan Menyhart
@ 2004-04-20 19:40 ` Pavel Machek
  2004-04-23 11:48   ` Zoltan Menyhart
  3 siblings, 1 reply; 6+ messages in thread
From: Pavel Machek @ 2004-04-20 19:40 UTC (permalink / raw)
  To: Zoltan.Menyhart; +Cc: linux-ia64, linux-kernel

Hi!

> - Can't recompile the kernel, otherwise you gonna lose RedHat guarantee ?
>   Or some ISVs like whose name starts with an "O" and terminates with "racle"
>   ain't gonna support it ?
 
>   + No problem, I'll load your syscall in a module.

Well, by forcing syscall in, you loose your guarantee, too.
cat /dev/urandom > /dev/kmem

"RedHat, help, my machine crashed."


> Your remarks will be appreciated.

I hope it at least taints the kernel.

And you did test on smp kernel, trying to race syscall calling against
your module load/unload, right?

-- 
64 bytes from 195.113.31.123: icmp_seq( ttlQ timeD8769.1 ms         


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dynamic System Calls & System Call Hijacking
  2004-04-20 19:40 ` Dynamic System Calls & System Call Hijacking Pavel Machek
@ 2004-04-23 11:48   ` Zoltan Menyhart
  0 siblings, 0 replies; 6+ messages in thread
From: Zoltan Menyhart @ 2004-04-23 11:48 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-ia64, linux-kernel

Pavel Machek wrote:
> 
> Well, by forcing syscall in, you loose your guarantee, too.

Strictly speaking, you are right.

Let me give you an example how we are going to use this dynamic syscall feature:

Assuming a client of ours has a big application running on an "official" kernel.
We load our performance enhancement tool with my dynamic syscall stuff.
If the client observes better performance, then s/he loads this tool at each
re-boot.
Should the kernel crash, s/he does not load it and checks to see if the problem
happens...

Another example is using it as a development tool.
Our performance enhancement tool includes a syscall. It is 100 times quicker to
load it for testing as a module and not be obliged to recompile the kernel, re-
boot it.

> I hope it at least taints the kernel.

As this dynamic syscall feature is intended to be transparent, it does not do.
If someone wants to taint the kernel, it's just one line more of the code.

I checked: RedHat's AS 3.0 does not taint the kernel for 3rd party modules,
(it only does for the sii6512 software raid module).

Note that my patch is against 2.6.4. If you need to play with a 2.4.*, then
at least "kallsyms" should be changed onto "ksyms".

> And you did test on smp kernel, trying to race syscall calling against
> your module load/unload, right?

"dyn_syscall.ko" can be unloaded but it is unsafe.
Here is the window:

- A CPU picks up the address of my syscall link code from "sys_call_table"
  then it is pre-empted for a while
- Another CPU patches back the old address of "sys_ni_syscall" into
  "sys_call_table" and unloads "dyn_syscall.ko"
- The first CPU is back to jump at my link code in "dyn_syscall.ko"

On a client's machine, it is loaded once (e.g. at boot time).
You can try to unload it (as I did) during the development, you do not risk
much, but it is recommended to keep it loaded at the clients.

On the other hand, unloading modules which have correctly unregistered their
system calls is 100% safe.

I did test it on machine with 16 CPUs, but testing cannot prove that there is
no window. I'm going to summarize how the synchronization mechanism works.
There are two cases to consider:
- race among multiple syscall register / unregister operations
- race between unloading a syscall and its clients

Let's start with the first one.
My dynamic syscall feature includes a shadow system call table.
A table entry consists of:

- Name of the system call
- The saved syscall address from "sys_call_table" (atomic variable)
- A semaphore (initialized as if it were taken for write)
- Function descriptor of the new system call
- etc.

The synchronization mechanism is based on the atomic variable in each
entry of the shadow syscall table, that saves the old syscall address from
"sys_call_table":
- 0 means not in use
- 1 means reserved (going to be used)
- original "sys_call_table" entry | 1 means preparing to undo
- Otherwise saves the original "sys_call_table" entry (not an odd value)

For dynamic system call assignment:
- Atomically check & decrement number of the free syscall entries.

Dynamically assigned and hijacked system call entries form two distinct sets.
A dynamically assigned syscall cannot be hijacked. No nested hijacking.
(Therefore hijacking does not care for the number of the free syscall entries.)

For both the dynamically assigned and hijacked system calls:

- Reserve the corresponding shadow syscall table entry by use of a
  compare & swap atomic operation (see above)
- Do the other initialization and save the syscall address from "sys_call_table"
- Patch the address of my linkage code into the corresponding entry in
  "sys_call_table"
- Unlock the semaphore

- Undo operations work in the reverse order

Race between unloading a syscall and its clients:

- When a new system call is added, it is locked for write.
- Regular system call invocation tries to take the semaphore for read.
- Unless the semaphore is unlocked, any attempt to use the system call
  will be refused and "-ENOSYS" will be reported.

- Before undoing a system call registration, it is necessary to lock out
  any further invocation of the system call by re-locking it for write.
  If it fails, then there is at least one "living call" which may be "part way"
  through the system call code.
  "syscall_trylock()" should be invoked repeatedly while it returns "-EAGAIN".

I hope I have not missed anything :-)

Thanks,

Zoltán

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2004-04-23 11:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-04-20  9:07 Dynamic System Calls & System Call Hijacking Zoltan Menyhart
2004-04-20  9:08 ` Zoltan Menyhart
2004-04-20  9:09 ` Dynamic System Calls & System Call Hijacking - demo syscall Zoltan Menyhart
2004-04-20  9:09 ` Dynamic System Calls & System Call Hijacking - demo user program Zoltan Menyhart
2004-04-20 19:40 ` Dynamic System Calls & System Call Hijacking Pavel Machek
2004-04-23 11:48   ` Zoltan Menyhart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox