public inbox for linux-kernel@vger.kernel.org
* Dynamic System Calls & System Call Hijacking
@ 2004-04-20  9:07 Zoltan Menyhart
  2004-04-20 19:40 ` Pavel Machek
From: Zoltan Menyhart @ 2004-04-20  9:07 UTC (permalink / raw)
  To: linux-ia64, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3652 bytes --]

- Disappointed, 'cause they don't wanna take your brand new syscall into the
  kernel ?

  + No problem, I'll do it for you.

- Can't recompile the kernel, otherwise you gonna lose RedHat guarantee ?
  Or some ISV, like the one whose name starts with an "O" and ends with
  "racle", ain't gonna support it ?

  + No problem, I'll load your syscall in a module.

- Got a syscall number conflict 'cause an exotic patch slipped in before
  yours ?

  + No problem, I'll find a free syscall number for you dynamically.

- Wanna try your own version of a syscall without recompiling the kernel or
  rebooting it ?

  + No problem, I'll hijack the syscall for you.

- Fed up with the infinite number of different kernel configurations ?
  Can't follow any more what .config you've done for which of your clients ?

  + No problem, make a minimal kernel with almost nothing in it and
    dynamically load only the syscalls actually needed.

My loadable kernel module "dyn_syscall.ko" provides for
registering / unregistering or hijacking / restoring system calls.

Sure, it's a loadable kernel module, who wants to modify the kernel ? :-)

My patch is against the version 2.6.4. As there is not much in the way of
direct dependency on the kernel, it should work with more recent versions, too.

Playing with the system call mechanism is very much architecture dependent.
Its key element is written in assembly.
I've got an IA64 version only.


How can it be used ?
--------------------

Assume you've got a system call like "asmlinkage long sys_foo(...)" in a
loadable kernel module.
You can register it with an unused system call number:

        const char name[] = "foo";
        rc = dyn_syscall_reg(name, syscall_no, (dyn_syscall_t) sys_foo);

If "syscall_no" is zero, I'll find a free system call number for you.
(Do check the return code. On success, it's your system call number.)
Or you can register your system call over an existing one:

        rc = hijack_syscall(name, syscall_no, (dyn_syscall_t) sys_foo);

Having fully initialized your system call, you can make it available:

        rc = syscall_unlock(name, syscall_no);

This sequence is usually included in the "module_init(...)" function.

User applications can find out what your system call number is by consulting
"/proc/sys/kernel/dynamic_syscalls/foo" or
"/proc/sys/kernel/hijacked_syscalls/foo", respectively.

Having played enough with your system call, you can launch the module unload
procedure, without worrying about the "living calls" which may be "part way"
through your module:

        rc = prep_restore_syscall(name, syscall_no);

This function locks out further calls to the "syscall_no" (they will be refused
with the return code "-ENOSYS"). It returns "-EAGAIN" if there is still someone
inside your system call.
In this latter case you can wait until your last client leaves:

        while ((rc = syscall_trylock(name, syscall_no)) == -EAGAIN)
                schedule();

If you have a blocking system call, then instead of busy waiting, wake up the
waiting tasks and sleep a bit in the meantime.
Finally, you can invoke:

        rc = dyn_syscall_unreg(name, syscall_no);

or

        rc = restore_syscall(name, syscall_no);

to completely remove your registered or hijacked system call, respectively.

This sequence is usually included in the "module_exit(...)" function.

The function prototypes are in "asm/dyn_syscall.h".

In order to configure this module, say "m" in:

        Processor type and features:
                Support for dynamic system calls

The patch & the demos arrive in the next mails.

Your remarks will be appreciated.

Zoltán Menyhárt

[-- Attachment #2: dyn_syscall_man.txt --]
[-- Type: text/plain, Size: 5285 bytes --]



NAME

	dyn_syscall_reg, hijack_syscall - Register a system call

SYNOPSIS

	#include <asm/dyn_syscall.h>

	int
	dyn_syscall_reg(const char *name,
			const unsigned int syscall_no,
			const dyn_syscall_t fp);
	int
	hijack_syscall(const char *name,
			const unsigned int syscall_no,
			const dyn_syscall_t fp);

DESCRIPTION

	"dyn_syscall_reg()" and "hijack_syscall()" are exported services
	available for loadable kernel modules.

	"dyn_syscall_reg()" registers a new, dynamic system call.
	If "syscall_no" is zero, then an otherwise unused system call number
	will be assigned.

	"hijack_syscall()" registers a system call which overloads an
	existing one.

	"name" points to a string that shall persist while the system call is
	alive.

	"syscall_no" should be in the range of
	[__NR_ni_syscall + 1... __NR_ni_syscall + NR_syscalls).
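
	As an illustration only (this helper is not part of the patch), the
	half-open range above could be checked as sketched below. "base" stands
	for __NR_ni_syscall and "count" for NR_syscalls; both are passed in as
	parameters because their values are architecture dependent, and the
	function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: half-open range check.
 * The first valid number is base + 1; base + count is already out of range.
 * "base" stands for __NR_ni_syscall, "count" for NR_syscalls.
 */
static bool syscall_no_in_range(unsigned int no, unsigned int base,
                                unsigned int count)
{
        assert(count > 0);
        return no > base && no < base + count;
}
```

	Note that "dyn_syscall_reg()" additionally accepts zero, which asks for
	a free number to be assigned, so a caller would test for zero before
	applying such a check.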

	"fp" refers to the new system call.
	For the IA64 architecture, the function descriptor "dyn_syscall_t"
	refers to a structure containing the program counter and the global
	pointer.

	User applications can find this system call number in
	"/proc/sys/kernel/dynamic_syscalls/<name>" or in
	"/proc/sys/kernel/hijacked_syscalls/<name>", respectively.
	On read, each of these files contains a 4-digit decimal number
	terminated with a '\n' character.
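
	A user application might look the number up along these lines. This is
	a minimal sketch, not part of the patch: the helper name
	"read_syscall_no()" is hypothetical, and it relies only on the file
	format stated above (a decimal number terminated with '\n').

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of dyn_syscall): read a decimal system
 * call number from a file such as
 * "/proc/sys/kernel/dynamic_syscalls/foo".
 * Returns the number on success, or a negative errno value on failure.
 */
static long read_syscall_no(const char *path)
{
        char buf[32];
        char *end;
        long nr;
        FILE *f;

        assert(path != NULL);
        f = fopen(path, "r");
        if (f == NULL)
                return -errno;          /* e.g. -ENOENT: nothing registered */
        if (fgets(buf, sizeof(buf), f) == NULL) {
                fclose(f);
                return -EIO;            /* empty or unreadable file */
        }
        fclose(f);

        errno = 0;
        nr = strtol(buf, &end, 10);
        /* Accept only "<digits>\n" or "<digits>", and only positive values. */
        if (errno != 0 || end == buf || nr <= 0)
                return -EINVAL;
        if (*end != '\n' && *end != '\0')
                return -EINVAL;
        return nr;
}
```

	The returned number could then be handed to syscall(2) together with
	whatever arguments the registered system call expects.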

RETURN VALUE

	On success, the system call number accepted / assigned is returned.

	On error, the following codes may be returned:

	-ENOENT:	No free system call number is available -
			"dyn_syscall_reg()" only
	-EINVAL:	Illegal system call number - both
	-EBUSY:		System call is already in use - "dyn_syscall_reg()" only
	-ENOMEM:	Cannot create "/proc/..." - both

SEE ALSO

	syscall_unlock, prep_restore_syscall, syscall_trylock,
	dyn_syscall_unreg, restore_syscall


--------------------------------------------------------------------------------


NAME

	syscall_unlock, syscall_trylock - Unlock / try to lock a system call
	prep_restore_syscall - Prepare to unregister a system call

SYNOPSIS

	#include <asm/dyn_syscall.h>

	int
	syscall_unlock(const char *name,
			const unsigned int syscall_no);
	int
	syscall_trylock(const char *name,
			const unsigned int syscall_no);

	int
	prep_restore_syscall(const char *name,
			const unsigned int syscall_no);

DESCRIPTION

	"syscall_unlock()", "syscall_trylock()" and "prep_restore_syscall()"
	are exported services available for loadable kernel modules.

	Each system call is protected by a semaphore.

	When a new system call is added, it is locked for write.
	Regular system call invocation tries to take the semaphore for read.
	Unless it is "syscall_unlock()"-ed, any attempt to use the system call
	will be refused and "-ENOSYS" will be reported.

	Before undoing a system call registration, it is necessary to lock out
	any further invocation of the system call by re-locking it for write.
	(They will be refused by returning "-ENOSYS".)
	Apart from some small administration task, "prep_restore_syscall()"
	attempts to do it. If it fails (indicated by "-EAGAIN" returned), then
	there is at least one "living call" which may be "part way" through
	the system call code.

	"syscall_trylock()" should be invoked repeatedly while it returns
	"-EAGAIN". In order not to over penalise other tasks, "schedule()"
	should be invoked at each iteration. If the system call is blocking,
	i.e. there can be tasks sleeping inside the system call, then they have
	to be woken up. In such a case, it is recommended to sleep a bit
	between two iterations of "syscall_trylock()".

	"name" should be the same name that was used during the registration.

	"syscall_no" should be in the range of
	[__NR_ni_syscall + 1... __NR_ni_syscall + NR_syscalls).
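
	The locking rules described in this section can be modelled in user
	space roughly as below. This is a sketch under assumptions, not the
	module's code: the patch uses a kernel semaphore, whereas this model
	packs a "closed" flag and a count of calls in progress into one C11
	atomic. All "gate_*" names are hypothetical. "gate_close_try()" plays
	the role of both "prep_restore_syscall()" and "syscall_trylock()".

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Bit 0: gate closed (locked for write). Bits 1 and up: calls inside. */
#define GATE_CLOSED     1u
#define GATE_ONE_CALLER 2u
#define GATE_INIT       { GATE_CLOSED } /* a new system call starts locked */

typedef struct { atomic_uint state; } gate_t;

/* cf. syscall_unlock(): open the gate for callers. */
static void gate_unlock(gate_t *g)
{
        atomic_fetch_and(&g->state, ~GATE_CLOSED);
}

/* Entry path of a call: get in, or be refused with -ENOSYS. */
static int gate_enter(gate_t *g)
{
        unsigned int s = atomic_load(&g->state);

        while (!(s & GATE_CLOSED))
                if (atomic_compare_exchange_weak(&g->state, &s,
                                                 s + GATE_ONE_CALLER))
                        return 0;
        return -ENOSYS;
}

/* Exit path of a call that previously entered successfully. */
static void gate_leave(gate_t *g)
{
        unsigned int prev = atomic_fetch_sub(&g->state, GATE_ONE_CALLER);

        assert(prev >= GATE_ONE_CALLER);
        (void)prev;
}

/*
 * Close the gate for new callers, then report whether anyone is still
 * inside: 0 if the call is drained, -EAGAIN otherwise.
 */
static int gate_close_try(gate_t *g)
{
        unsigned int prev = atomic_fetch_or(&g->state, GATE_CLOSED);

        return (prev >> 1) == 0 ? 0 : -EAGAIN;
}
```

	In this model an unload path would call "gate_close_try()" in a loop,
	yielding the CPU between attempts, until it returns zero.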

RETURN VALUE

	On success, zero is returned.

	"syscall_trylock()" and "prep_restore_syscall()" return "-EAGAIN" if
	they have failed to take the semaphore for write.

	On error, the following codes can be returned:

	-EBADF:		Name or system call number does not match the parameters
			that were used during the system call registration
	-EINVAL:	Illegal system call number

SEE ALSO

	dyn_syscall_reg, hijack_syscall, dyn_syscall_unreg, restore_syscall


--------------------------------------------------------------------------------


NAME

	dyn_syscall_unreg, restore_syscall - Unregister a system call

SYNOPSIS

	#include <asm/dyn_syscall.h>

	int
	dyn_syscall_unreg(const char *name,
			const unsigned int syscall_no);
	int
	restore_syscall(const char *name,
			const unsigned int syscall_no);

DESCRIPTION

	"dyn_syscall_unreg()" and "restore_syscall()" are exported services
	available for loadable kernel modules.

	"dyn_syscall_unreg()" unregisters a dynamic system call.

	"restore_syscall()" restores a hijacked system call.

	"name" should be the same name that was used during the registration.

	"syscall_no" should be in the range of
	[__NR_ni_syscall + 1... __NR_ni_syscall + NR_syscalls).

RETURN VALUE

	On success, zero is returned.

	On error, the following codes can be returned:

	-EBADF:		Name or system call number does not match the parameters
			that were used during the system call registration
	-EINVAL:	Illegal system call number

SEE ALSO

	dyn_syscall_reg, hijack_syscall,
	syscall_unlock, syscall_trylock,  prep_restore_syscall




* Re: Dynamic System Calls & System Call Hijacking
  2004-04-20  9:07 Dynamic System Calls & System Call Hijacking Zoltan Menyhart
@ 2004-04-20 19:40 ` Pavel Machek
  2004-04-23 11:48   ` Zoltan Menyhart
From: Pavel Machek @ 2004-04-20 19:40 UTC (permalink / raw)
  To: Zoltan.Menyhart; +Cc: linux-ia64, linux-kernel

Hi!

> - Can't recompile the kernel, otherwise you gonna lose RedHat guarantee ?
>   Or some ISVs like whose name starts with an "O" and terminates with "racle"
>   ain't gonna support it ?
 
>   + No problem, I'll load your syscall in a module.

Well, by forcing syscall in, you lose your guarantee, too.
cat /dev/urandom > /dev/kmem

"RedHat, help, my machine crashed."


> Your remarks will be appreciated.

I hope it at least taints the kernel.

And you did test on smp kernel, trying to race syscall calling against
your module load/unload, right?

-- 
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms         



* Re: Dynamic System Calls & System Call Hijacking
  2004-04-20 19:40 ` Pavel Machek
@ 2004-04-23 11:48   ` Zoltan Menyhart
From: Zoltan Menyhart @ 2004-04-23 11:48 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-ia64, linux-kernel

Pavel Machek wrote:
> 
> Well, by forcing syscall in, you lose your guarantee, too.

Strictly speaking, you are right.

Let me give you an example how we are going to use this dynamic syscall feature:

Assume a client of ours has a big application running on an "official" kernel.
We load our performance enhancement tool with my dynamic syscall stuff.
If the client observes better performance, then s/he loads this tool at each
reboot.
Should the kernel crash, s/he stops loading it and checks whether the problem
still happens...

Another example is using it as a development tool.
Our performance enhancement tool includes a syscall. It is 100 times quicker to
load it as a module for testing than to recompile the kernel and reboot it.

> I hope it at least taints the kernel.

As this dynamic syscall feature is intended to be transparent, it does not.
If someone wants to taint the kernel, it takes just one more line of code.

I checked: RedHat's AS 3.0 does not taint the kernel for 3rd party modules
(it only does so for the sii6512 software raid module).

Note that my patch is against 2.6.4. If you need to play with a 2.4.*, then
at least "kallsyms" should be changed to "ksyms".

> And you did test on smp kernel, trying to race syscall calling against
> your module load/unload, right?

"dyn_syscall.ko" can be unloaded but it is unsafe.
Here is the window:

- A CPU picks up the address of my syscall link code from "sys_call_table"
  then it is pre-empted for a while
- Another CPU patches back the old address of "sys_ni_syscall" into
  "sys_call_table" and unloads "dyn_syscall.ko"
- The first CPU resumes and jumps to my link code in "dyn_syscall.ko", which
  is no longer loaded

On a client's machine, it is loaded once (e.g. at boot time).
During development you can try to unload it (as I did) without much risk,
but on client machines it is recommended to keep it loaded.

On the other hand, unloading modules which have correctly unregistered their
system calls is 100% safe.

I did test it on a machine with 16 CPUs, but testing cannot prove that there
is no window. I'm going to summarize how the synchronization mechanism works.
There are two cases to consider:
- race among multiple syscall register / unregister operations
- race between unloading a syscall and its clients

Let's start with the first one.
My dynamic syscall feature includes a shadow system call table.
A table entry consists of:

- Name of the system call
- The saved syscall address from "sys_call_table" (atomic variable)
- A semaphore (initialized as if it were taken for write)
- Function descriptor of the new system call
- etc.

The synchronization mechanism is based on the atomic variable in each
entry of the shadow syscall table, that saves the old syscall address from
"sys_call_table":
- 0 means not in use
- 1 means reserved (going to be used)
- original "sys_call_table" entry | 1 means preparing to undo
- Otherwise saves the original "sys_call_table" entry (not an odd value)

For dynamic system call assignment:
- Atomically check & decrement number of the free syscall entries.

Dynamically assigned and hijacked system call entries form two distinct sets.
A dynamically assigned syscall cannot be hijacked. No nested hijacking.
(Therefore hijacking does not care for the number of the free syscall entries.)

For both the dynamically assigned and hijacked system calls:

- Reserve the corresponding shadow syscall table entry by use of a
  compare & swap atomic operation (see above)
- Do the other initialization and save the syscall address from "sys_call_table"
- Patch the address of my linkage code into the corresponding entry in
  "sys_call_table"
- Unlock the semaphore

- Undo operations work in the reverse order
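
The register / undo steps listed above can be walked through on a mock table
in user space. This is a rough sketch under assumptions, not the module's code:
addresses are fake integers, the semaphore step is left out, the table size is
arbitrary, the error codes are borrowed loosely from the manual page, and all
names ("mock_table", "model_register", ...) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define N_SLOTS 4                       /* illustrative table size */
#define NI_ADDR ((uintptr_t)0x1000)     /* fake address of sys_ni_syscall */

static uintptr_t mock_table[N_SLOTS] = { NI_ADDR, NI_ADDR, NI_ADDR, NI_ADDR };
static _Atomic uintptr_t shadow_saved[N_SLOTS]; /* 0 = free, 1 = reserved */
static atomic_int free_slots = N_SLOTS;

/* Atomically check & decrement the number of free entries. */
static int take_free_slot(void)
{
        int n = atomic_load(&free_slots);

        while (n > 0)
                if (atomic_compare_exchange_weak(&free_slots, &n, n - 1))
                        return 0;
        return -ENOENT;
}

static int model_register(unsigned int idx, uintptr_t link_addr)
{
        uintptr_t expected = 0;

        assert((link_addr & 1) == 0);   /* odd values are reserved as flags */
        if (idx >= N_SLOTS)
                return -EINVAL;
        if (take_free_slot() != 0)
                return -ENOENT;
        /* Reserve the shadow entry: 0 (free) -> 1 (reserved). */
        if (!atomic_compare_exchange_strong(&shadow_saved[idx], &expected,
                                            (uintptr_t)1)) {
                atomic_fetch_add(&free_slots, 1);       /* give it back */
                return -EBUSY;
        }
        /* Save the original entry, then patch in the link code address. */
        atomic_store(&shadow_saved[idx], mock_table[idx]);
        mock_table[idx] = link_addr;
        return (int)idx;
}

static int model_unregister(unsigned int idx)
{
        uintptr_t saved;

        if (idx >= N_SLOTS)
                return -EINVAL;
        saved = atomic_load(&shadow_saved[idx]);
        if (saved == 0 || (saved & 1))
                return -EBADF;          /* free, reserved or already undoing */
        /* Mark "preparing to undo": original entry | 1. */
        if (!atomic_compare_exchange_strong(&shadow_saved[idx], &saved,
                                            saved | 1))
                return -EBADF;
        mock_table[idx] = saved;                /* restore original entry */
        atomic_store(&shadow_saved[idx], 0);    /* release shadow entry */
        atomic_fetch_add(&free_slots, 1);
        return 0;
}
```

In this model, registering the same index twice fails with "-EBUSY" and leaves
the free counter unchanged, and unregistering puts the saved address back.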

Race between unloading a syscall and its clients:

- When a new system call is added, it is locked for write.
- Regular system call invocation tries to take the semaphore for read.
- Unless the semaphore is unlocked, any attempt to use the system call
  will be refused and "-ENOSYS" will be reported.

- Before undoing a system call registration, it is necessary to lock out
  any further invocation of the system call by re-locking it for write.
  If it fails, then there is at least one "living call" which may be "part way"
  through the system call code.
  "syscall_trylock()" should be invoked repeatedly while it returns "-EAGAIN".

I hope I have not missed anything :-)

Thanks,

Zoltán
