On Sun, 27 Dec 2009 17:36:48 +0900, Tetsuo Handa said:

> What about defining two types of masks, one is applied throughout the rest of
> the task_struct's lifetime (inheritable mask), the other is cleared when
> execve() succeeds (local mask)?

A mask of permitted syscalls. You've re-invented SECCOMP. ;)
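
For concreteness, here is roughly what I understand the two-mask scheme quoted
above to mean, as a user-space toy. Everything in it is made up for
illustration: struct task_masks, the helper names, and the toy syscall numbers
are hypothetical, and I'm assuming a set bit means "denied". It is not how
SECCOMP or anything else in the kernel is implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy syscall numbers (hypothetical, NOT the real kernel numbering). */
enum toy_syscall { TOY_READ, TOY_WRITE, TOY_OPEN, TOY_EXECVE, TOY_SOCKET, TOY_NR };

/* Hypothetical per-task state: a set bit means "this syscall is denied". */
struct task_masks {
    uint64_t inheritable; /* applies for the rest of the task's lifetime */
    uint64_t local;       /* cleared when execve() succeeds */
};

uint64_t toy_bit(enum toy_syscall nr)
{
    return UINT64_C(1) << nr;
}

/* Deny a syscall permanently, across any later execve(). */
void deny_inheritable(struct task_masks *t, enum toy_syscall nr)
{
    t->inheritable |= toy_bit(nr);
}

/* Deny a syscall only until the next successful execve(). */
void deny_local(struct task_masks *t, enum toy_syscall nr)
{
    t->local |= toy_bit(nr);
}

/* A syscall is allowed only if neither mask denies it. */
bool syscall_allowed(const struct task_masks *t, enum toy_syscall nr)
{
    return ((t->inheritable | t->local) & toy_bit(nr)) == 0;
}

/* What the proposal says happens on a successful execve(). */
void on_execve_success(struct task_masks *t)
{
    t->local = 0;
}
```

Note what the toy does not tell you: which bits are actually safe to set.
That is the part the rest of this mail is about.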

> When an application is sure that "I know I don't need to call execve()" or

OK, you *might* know that. Or more likely you just *think* you know that - ever
had a library routine do an execve() call behind your back?  Or glibc
decides to do a clone2() call behind your back instead of execve(),
except on ARM where it does either a clone_nommu47() or clone_backflip() :)

> "I know execve()d programs need not to call ...()"

Unless you've done a code review of the exec'ed program, you don't know.

The big problem is that it's *not* sufficient to just run an strace or two
of normal runs and proclaim "this is the set of syscalls I need" - you need
to check all the error paths in all the shared libraries too.  It's no fun
when a program errors out, tries to do a syslog() of the fact - and then
*that* errors out too, causing the program to go into an infinite loop trying
to report that the previous syslog() call just failed...
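
A toy version of that failure mode, in case it isn't obvious how it happens.
toy_syslog() and log_syscall_denied are hypothetical stand-ins for a syslog()
whose underlying syscall got filtered out, and the budget argument exists only
so the demonstration terminates. A real program with this bug has no such
limit.

```c
#include <stdbool.h>

/* Hypothetical switch: pretend the filter denies whatever syslog() needs. */
bool log_syscall_denied = true;

/* Stand-in for syslog(): fails whenever the underlying syscall is denied. */
int toy_syslog(const char *msg)
{
    (void)msg;
    return log_syscall_denied ? -1 : 0;
}

/*
 * Naive error path: every failure gets reported, including the failure to
 * report. Returns the number of logging attempts made before succeeding or
 * exhausting the budget.
 */
int report_naive(const char *msg, int budget)
{
    int attempts = 0;

    while (attempts < budget) {
        attempts++;
        if (toy_syslog(msg) == 0)
            break;
        msg = "previous syslog() call failed"; /* ...and around we go */
    }
    return attempts;
}

/* Guarded error path: try once and give up if logging itself fails. */
int report_guarded(const char *msg)
{
    (void)toy_syslog(msg);
    return 1; /* exactly one attempt, whether or not it worked */
}
```

With the log syscall denied, report_naive() burns through whatever budget you
hand it, while report_guarded() makes one attempt and moves on. The catch is
that you only find out which flavor a given library uses by reading its error
paths.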

> "I want execve()d programs not to call ...()", 

Congrats - you just re-invented the Sendmail capabilities bug. ;)

This stuff is harder than it looks, especially when you realize that
syscall-granularity is almost certainly not the right security model.

> Application writers know better what syscalls the application will call than
> application users.

But the application user will know better than the writer what *actual*
security constraints need to be applied.  "I don't care *what* syscalls the
program uses, it's not allowed to access resource XYZ".