* [kernel-hardening] Re: procfs mount options [not found] <20110603191153.GB514@openwall.com> @ 2011-06-04 5:47 ` Vasiliy Kulikov 2011-06-04 13:20 ` [kernel-hardening] " Solar Designer 2011-06-05 18:36 ` [kernel-hardening] Re: [owl-dev] " Vasiliy Kulikov 1 sibling, 1 reply; 29+ messages in thread From: Vasiliy Kulikov @ 2011-06-04 5:47 UTC (permalink / raw) To: kernel-hardening; +Cc: Eugene Teo [-- Attachment #1: Type: text/plain, Size: 2243 bytes --] Solar, On Fri, Jun 03, 2011 at 23:11 +0400, Solar Designer wrote: > I welcome suggestions on how to achieve the desired functionality for > procfs in a non-confusing and generic way. It should support the > following reasonable configuration: > > /proc/PID directories restricted to group proc (except for owners and > root, indeed). However, /proc/cpuinfo and the like unrestricted. > Here's what this looks like on Linux 2.4.x-ow: > > dr-xr-x--- 3 root proc 0 Jun 3 22:59 1 > ... > dr-xr-x--- 3 syslogd proc 0 Jun 3 22:59 205 > dr-xr-x--- 3 klogd proc 0 Jun 3 22:59 211 > ... > -r--r--r-- 1 root proc 0 Jun 3 23:00 cpuinfo > ... > -r-------- 1 root proc 536743936 Jun 3 23:00 kcore > -r-------- 1 root proc 0 May 5 20:36 kmsg > ... > dr-xr-x--- 5 root proc 0 Jun 3 23:00 net > ... > -r--r--r-- 1 root proc 0 Jun 3 23:00 uptime > -r--r--r-- 1 root proc 0 Jun 3 23:00 version > > Perhaps gid=proc,umask=007 should result in the above for /proc/PID, but > how do we justify it not affecting /proc/cpuinfo, uptime, version (and > many others)? How do we justify it nevertheless affecting /proc/net (or > should another option do that)? I think it should be done with separate mount options for /proc/self/net (/proc/net is a symlink to /proc/self/net since net namespaces introduction) and for /proc/PID. All other files should be e.g. chmod'ed go= and then some white list should be chmod'ed to the relaxed perms. > Indeed, we could set some of these perms with chmod post-mount, but as > discussed this has drawbacks. Where its drawbacks were discussed? I cannot find anything on owl-dev. Do you mean some possible diffirences between procfs files among different kernel versions? If so, white list instead of black list should partly solve it. > So ideally our preferred configuration > (which will be the default on Owl) should be achievable with mount > options alone. At least for sysfs it is unreachable if we go in the current direction - umask doesn't change perms of already created files, and additional "chmod -R" is needed anyway. Thanks, -- Vasiliy [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options 2011-06-04 5:47 ` [kernel-hardening] Re: procfs mount options Vasiliy Kulikov @ 2011-06-04 13:20 ` Solar Designer 2011-06-04 20:09 ` Vasiliy Kulikov 0 siblings, 1 reply; 29+ messages in thread From: Solar Designer @ 2011-06-04 13:20 UTC (permalink / raw) To: kernel-hardening Vasiliy, On Sat, Jun 04, 2011 at 09:47:58AM +0400, Vasiliy Kulikov wrote: > I think it should be done with separate mount options for /proc/self/net > (/proc/net is a symlink to /proc/self/net since net namespaces > introduction) and for /proc/PID. OK. What do we call these? pmask (for "p"rocesses) and nmask (for "n"etwork)? Doesn't this deviation from umask reduce the chances of the patch getting accepted?.. And is there really much reason to let a user see others' processes, but not network connections, or vice versa? (Maybe there is.) > All other files should be e.g. > chmod'ed go= and then some white list should be chmod'ed to the relaxed > perms. Where would this logic be implemented - hard-coded in the kernel (enabled with some mount option?) or done by a script in the userspace post-mount? > > Indeed, we could set some of these perms with chmod post-mount, but as > > discussed this has drawbacks. > > Where its drawbacks were discussed? IIRC, you, I, and some others discussed them via Jabber. > I cannot find anything on > owl-dev. Do you mean some possible diffirences between procfs files > among different kernel versions? If so, white list instead of black > list should partly solve it. No, I meant race conditions, which you had to deal with by mounting under a parent directory with restricted perms, then mount --move'ing. Part of the intent of us patching the kernel is to avoid having to do things like that. Also, an admin might just "mount /proc", not knowing that special userspace magic was implemented by the distro. A possible approach is to implement mount options gid=... and restricted (name suggested by spender), where the latter would enable a hard-coded set of permissions. This is not generic, but at least it's simple, not confusing, and it allows for future changes (we may change what exactly restricted means). > > So ideally our preferred configuration > > (which will be the default on Owl) should be achievable with mount > > options alone. > > At least for sysfs it is unreachable if we go in the current direction - > umask doesn't change perms of already created files, and additional > "chmod -R" is needed anyway. Hmm. I guess I still don't understand your umask vs. mode stuff. Ideally, I'd have umask apply to each instance of sysfs (and other special filesystems) individually, affecting all files under that instance (both those already existing in the kernel at the time of mount and those appearing after mount). But perhaps that's not how the kernel keeps track of permissions for those filesystems currently? (Sorry, I haven't played with this since Linux 2.4, which didn't even have sysfs.) Thanks, Alexander ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options 2011-06-04 13:20 ` [kernel-hardening] " Solar Designer @ 2011-06-04 20:09 ` Vasiliy Kulikov 2011-06-04 20:59 ` Solar Designer 0 siblings, 1 reply; 29+ messages in thread From: Vasiliy Kulikov @ 2011-06-04 20:09 UTC (permalink / raw) To: kernel-hardening Solar, On Sat, Jun 04, 2011 at 17:20 +0400, Solar Designer wrote: > On Sat, Jun 04, 2011 at 09:47:58AM +0400, Vasiliy Kulikov wrote: > > I think it should be done with separate mount options for /proc/self/net > > (/proc/net is a symlink to /proc/self/net since net namespaces > > introduction) and for /proc/PID. > > OK. What do we call these? pmask (for "p"rocesses) and nmask (for > "n"etwork)? Doesn't this deviation from umask reduce the chances of the > patch getting accepted?.. And is there really much reason to let a user > see others' processes, but not network connections, or vice versa? > (Maybe there is.) While implementing this PID permissions restricting mount opt (don't know how to call it better), I've noticed that only 2 modes make sense - full view (current behaviour) and restricted view (similar to umask=07). So, I propose to implement & call the options as follows: (no)safepid - instead of umask/mode; if it is on, only owner/root/special group may read /proc/PID. (no)hidepid - hide restricted /proc/PID from *stat*(2), getdents*() and similar. Needs safepid. (no)hidenet - hide /proc/PID/net contents (restricting /proc/PID/net itself is a bit more difficult to implement). gid - group to evade safepid,hidepid,hidenet. Also as hidepid needs safepid, maybe it makes sense to implement one option instead of safepid and hidepid: hidepid=X where: X = 0 means nohidepid,nosafepid above X = 1 means safepid,nohidepid above X = 2 means safepid,hidepid above > > All other files should be e.g. > > chmod'ed go= and then some white list should be chmod'ed to the relaxed > > perms. > > Where would this logic be implemented - hard-coded in the kernel > (enabled with some mount option?) or done by a script in the userspace > post-mount? I'm sure if we try to implement it in kernel, we'd surely get NAK because it may be implemented much simplier in userspace. Compare: a single chmod -R vs. some in-kernel filtering of core procfs files and driver-specific files (yes, there are rare driver files in /proc, it is blamed, but anyway). > > > Indeed, we could set some of these perms with chmod post-mount, but as > > > discussed this has drawbacks. > > > > Where its drawbacks were discussed? > > IIRC, you, I, and some others discussed them via Jabber. Ah, OK ;) > > I cannot find anything on > > owl-dev. Do you mean some possible diffirences between procfs files > > among different kernel versions? If so, white list instead of black > > list should partly solve it. > > No, I meant race conditions, which you had to deal with by mounting > under a parent directory with restricted perms, then mount --move'ing. > Part of the intent of us patching the kernel is to avoid having to do > things like that. For distros with sysv init there is no race if it is implemented in a boot script before any login/cron/etc. It's a rare situation now, though :) > Also, an admin might just "mount /proc", not knowing > that special userspace magic was implemented by the distro. Agreed, but I still don't think it is a good argument for upstream... > A possible approach is to implement mount options gid=... and restricted > (name suggested by spender), where the latter would enable a hard-coded > set of permissions. To what files? All /proc/* files or recursively? Is there a white list of o=r/o=rw files or all of them would be chmod'ed "o="? > This is not generic, but at least it's simple, not > confusing, and it allows for future changes (we may change what exactly > restricted means). Changing what some option means after implementing it is a ABI breakage, which is terrible for upstream. However, it might be acceptable for Owl. > > > So ideally our preferred configuration > > > (which will be the default on Owl) should be achievable with mount > > > options alone. > > > > At least for sysfs it is unreachable if we go in the current direction - > > umask doesn't change perms of already created files, and additional > > "chmod -R" is needed anyway. > > Hmm. I guess I still don't understand your umask vs. mode stuff. > > Ideally, I'd have umask apply to each instance of sysfs (and other > special filesystems) individually, affecting all files under that > instance What instance? All files (=kobjects) are global, there is almost no division among mount points. The only division is among net namespaces, but it is mostly the exception than the rule and there will be always files global to all namespaces (e.g. modules list, devices list, etc.) > (both those already existing in the kernel at the time of mount > and those appearing after mount). But perhaps that's not how the kernel > keeps track of permissions for those filesystems currently? I'd not change permissions of all currently existing files. What would you do with already created and chmod'ed files? And what you suggest is strictly speaking not "umask". "umask" is for newly created files. "mode" is almost everything judging by mount(8) :) Currently I have only one TODO in my list for procfs - restricted /proc/PID/net returns -EINVAL instead of -ENOENT as it is done in grsecurity :) Need to make it say true. Thanks, Vasiliy. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options 2011-06-04 20:09 ` Vasiliy Kulikov @ 2011-06-04 20:59 ` Solar Designer 2011-06-05 18:24 ` [kernel-hardening] [RFC v1] " Vasiliy Kulikov 2011-06-05 19:17 ` [kernel-hardening] " Vasiliy Kulikov 0 siblings, 2 replies; 29+ messages in thread From: Solar Designer @ 2011-06-04 20:59 UTC (permalink / raw) To: kernel-hardening On Sun, Jun 05, 2011 at 12:09:51AM +0400, Vasiliy Kulikov wrote: > While implementing this PID permissions restricting mount opt (don't > know how to call it better), I've noticed that only 2 modes make sense - > full view (current behaviour) and restricted view (similar to umask=07). > So, I propose to implement & call the options as follows: > > (no)safepid - instead of umask/mode; if it is on, only > owner/root/special group may read /proc/PID. > > (no)hidepid - hide restricted /proc/PID from *stat*(2), getdents*() > and similar. Needs safepid. > > (no)hidenet - hide /proc/PID/net contents (restricting /proc/PID/net > itself is a bit more difficult to implement). > > gid - group to evade safepid,hidepid,hidenet. > > Also as hidepid needs safepid, maybe it makes sense to implement one > option instead of safepid and hidepid: hidepid=X where: > > X = 0 means nohidepid,nosafepid above > X = 1 means safepid,nohidepid above > X = 2 means safepid,hidepid above This makes sense to me, and I like the tri-state option better. > On Sat, Jun 04, 2011 at 17:20 +0400, Solar Designer wrote: > > A possible approach is to implement mount options gid=... and restricted > > (name suggested by spender), where the latter would enable a hard-coded > > set of permissions. > > To what files? All /proc/* files or recursively? Is there a white list > of o=r/o=rw files or all of them would be chmod'ed "o="? Oh, that's not what I was referring to. The restricted option would essentially be the same as your hidepid=2 - enabling whatever grsecurity normally does. But having a tri-state option as you propose makes more sense to me. > > This is not generic, but at least it's simple, not > > confusing, and it allows for future changes (we may change what exactly > > restricted means). > > Changing what some option means after implementing it is a ABI breakage, > which is terrible for upstream. This depends on how the option is defined initially. It is possible to define the option as "enabling unspecified version-dependent restrictions, which are subject to change across kernel versions and builds". ;-) OK, the word "unspecified" may be dropped. > > Ideally, I'd have umask apply to each instance of sysfs (and other > > special filesystems) individually, affecting all files under that > > instance > > What instance? All files (=kobjects) are global, there is almost no > division among mount points. The only division is among net namespaces, > but it is mostly the exception than the rule and there will be always > files global to all namespaces (e.g. modules list, devices list, etc.) > > > (both those already existing in the kernel at the time of mount > > and those appearing after mount). But perhaps that's not how the kernel > > keeps track of permissions for those filesystems currently? > > I'd not change permissions of all currently existing files. What would > you do with already created and chmod'ed files? The reality (kobjects being global) doesn't quite match my desired behavior (being able to mount more than one instance of procfs, etc. and set their mount options affecting perms differently). However, this is not important enough to warrant changing the reality. ;-) Here's a related thought: if these mount options happen to affect all instances of the filesystem (in the same container), maybe they should be sysctl's instead? Here's another thought: if the gid is not specified but hidepid is enabled, maybe the perms should be 700 rather than 750 (don't grant any privs to group root). > And what you suggest is strictly speaking not "umask". "umask" is for newly > created files. "mode" is almost everything judging by mount(8) :) mount(8) mentions both "umask" and "mode" for different filesystems, with slightly different semantics. > Currently I have only one TODO in my list for procfs - restricted > /proc/PID/net returns -EINVAL instead of -ENOENT as it is done in > grsecurity :) Need to make it say true. Do you mean you have already implemented the mount options as you have proposed above? Thanks, Alexander ^ permalink raw reply [flat|nested] 29+ messages in thread
* [kernel-hardening] [RFC v1] procfs mount options 2011-06-04 20:59 ` Solar Designer @ 2011-06-05 18:24 ` Vasiliy Kulikov 2011-06-05 19:26 ` Solar Designer 2011-06-05 19:17 ` [kernel-hardening] " Vasiliy Kulikov 1 sibling, 1 reply; 29+ messages in thread From: Vasiliy Kulikov @ 2011-06-05 18:24 UTC (permalink / raw) To: kernel-hardening [-- Attachment #1: Type: text/plain, Size: 13695 bytes --] This patch introduces support of procfs mount options and adds mount options to restrict access to /proc/PID/ directories and /proc/PID/net/ contents. The default backward-compatible behaviour is left untouched. The first mount option is called "hidepid" and its value defines how much info about processes we want to be available for non-owners: hidepid=0 (default) means the current behaviour - anybody may read all /proc/PID/* files. hidepid=1 means all files not running as current user and group are unaccessible (directories' permissions are 0550). Sensitive files like cmdline, io, sched*, status, wchan are now protected against other users. hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other users. It doesn't mean that it hides a fact whether a process exists (it can be learned by other means, e.g. by sending signals), but it hides process' uid and gid. It greatly compicates intruder's task of gathering info about running processes, whether some daemon runs with elevated privileges, whether other user runs some sensitive program, whether other users run any program at all, etc. hidenet means /proc/PID/net will be accessible to processes with CAP_NET_ADMIN capability or to members of a special group. gid=XXX defines a group that will be able to gather all processes' info and network connections info. TODO/thoughs: - /proc/pid/net/ currently doesn't show ANYTHING, even "." and "..". This is confusing :) - copy options description to Documenataion/filesystems/procfs.txt - need to alter "(inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO))" checks to honour non-pid directories. - what if one keeps open /proc/PID/ while executing set*id/capable binary? - maybe hidenet belongs to net namespace, not pid? However, it is seen through procfs, which is per-pid_namespace. diff --git a/fs/proc/base.c b/fs/proc/base.c index 9d096e8..dd28832 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1688,6 +1688,7 @@ static struct inode *proc_pid_make_inode(struct super_block * sb, struct task_st struct inode * inode; struct proc_inode *ei; const struct cred *cred; + struct nsproxy *ns; /* We need a new inode */ @@ -1712,7 +1713,12 @@ static struct inode *proc_pid_make_inode(struct super_block * sb, struct task_st rcu_read_lock(); cred = __task_cred(task); inode->i_uid = cred->euid; - inode->i_gid = cred->egid; + ns = task_nsproxy(task); + if (ns && ns->pid_ns->hide_pid) { + inode->i_gid = ns->pid_ns->pid_gid; + } else { + inode->i_gid = cred->egid; + } rcu_read_unlock(); } security_task_to_inode(task, inode); @@ -1725,11 +1731,22 @@ out_unlock: return NULL; } +/* TODO: ugly, need something more generic */ +static bool proc_pid_may_getattr(struct pid_namespace* pid, struct task_struct *task) +{ + if (pid->hide_pid < 2) + return true; + if (ptrace_may_access(task, PTRACE_MODE_READ)) + return true; + return in_group_p(pid->pid_gid); +} + static int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) { struct inode *inode = dentry->d_inode; struct task_struct *task; const struct cred *cred; + struct pid_namespace *pid = dentry->d_sb->s_fs_info; generic_fillattr(inode, stat); @@ -1738,11 +1755,17 @@ static int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat stat->gid = 0; task = pid_task(proc_pid(inode), PIDTYPE_PID); if (task) { - if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) || - task_dumpable(task)) { + /* TODO: if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) */ + if ((((inode->i_mode & S_IFDIR) == S_IFDIR) || + task_dumpable(task))) { cred = __task_cred(task); - stat->uid = cred->euid; - stat->gid = cred->egid; + if (proc_pid_may_getattr(pid, task)) { + stat->uid = cred->euid; + if (pid->hide_pid) + stat->gid = pid->pid_gid; + else + stat->gid = cred->egid; + } } } rcu_read_unlock(); @@ -1771,6 +1794,7 @@ static int pid_revalidate(struct dentry *dentry, struct nameidata *nd) struct inode *inode; struct task_struct *task; const struct cred *cred; + struct pid_namespace *pid = dentry->d_sb->s_fs_info; if (nd && nd->flags & LOOKUP_RCU) return -ECHILD; @@ -1779,12 +1803,16 @@ static int pid_revalidate(struct dentry *dentry, struct nameidata *nd) task = get_proc_task(inode); if (task) { - if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) || + /* TODO: if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) */ + if (((inode->i_mode & S_IFDIR) == S_IFDIR) || task_dumpable(task)) { rcu_read_lock(); cred = __task_cred(task); inode->i_uid = cred->euid; - inode->i_gid = cred->egid; + if (pid->hide_pid) + inode->i_gid = pid->pid_gid; + else + inode->i_gid = cred->egid; rcu_read_unlock(); } else { inode->i_uid = 0; @@ -2986,12 +3014,21 @@ static struct dentry *proc_pid_instantiate(struct inode *dir, { struct dentry *error = ERR_PTR(-ENOENT); struct inode *inode; + struct nsproxy *ns; + unsigned int mode; inode = proc_pid_make_inode(dir->i_sb, task); if (!inode) goto out; - inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO; + rcu_read_lock(); + ns = task_nsproxy(task); + if (ns && ns->pid_ns->hide_pid) + mode = 07; + else + mode = 0; + rcu_read_unlock(); + inode->i_mode = S_IFDIR | ((S_IRUGO|S_IXUGO) & ~mode); inode->i_op = &proc_tgid_base_inode_operations; inode->i_fop = &proc_tgid_base_operations; inode->i_flags|=S_IMMUTABLE; @@ -3093,6 +3130,11 @@ static int proc_pid_fill_cache(struct file *filp, void *dirent, filldir_t filldi proc_pid_instantiate, iter.task, NULL); } +static int fake_filldir(void *buf, const char *name, int namelen, loff_t offset, u64 ino, unsigned d_type) +{ + return 0; +} + /* for the /proc/ directory itself, after non-process stuff has been done */ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir) { @@ -3100,6 +3142,7 @@ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir) struct task_struct *reaper = get_proc_task(filp->f_path.dentry->d_inode); struct tgid_iter iter; struct pid_namespace *ns; + filldir_t __filldir; if (!reaper) goto out_no_task; @@ -3116,8 +3159,13 @@ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir) for (iter = next_tgid(ns, iter); iter.task; iter.tgid += 1, iter = next_tgid(ns, iter)) { + if (proc_pid_may_getattr(ns, iter.task)) + __filldir = filldir; + else + __filldir = fake_filldir; + filp->f_pos = iter.tgid + TGID_OFFSET; - if (proc_pid_fill_cache(filp, dirent, filldir, iter) < 0) { + if (proc_pid_fill_cache(filp, dirent, __filldir, iter) < 0) { put_task_struct(iter.task); goto out; } diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 176ce4c..698fb41 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -7,6 +7,7 @@ #include <linux/time.h> #include <linux/proc_fs.h> #include <linux/kernel.h> +#include <linux/pid_namespace.h> #include <linux/mm.h> #include <linux/string.h> #include <linux/stat.h> @@ -17,7 +18,9 @@ #include <linux/init.h> #include <linux/module.h> #include <linux/sysctl.h> +#include <linux/seq_file.h> #include <linux/slab.h> +#include <linux/mount.h> #include <asm/system.h> #include <asm/uaccess.h> @@ -93,12 +96,29 @@ void __init proc_init_inodecache(void) init_once); } +static int proc_show_options(struct seq_file *seq, struct vfsmount *vfs) +{ + struct super_block *sb = vfs->mnt_sb; + struct pid_namespace *pid = sb->s_fs_info; + + if (pid->pid_gid) + seq_printf(seq, ",gid=%lu", (unsigned long)pid->pid_gid); + if (pid->hide_pid != 0) + seq_printf(seq, ",hidepid=%u", pid->hide_pid); + if (pid->hide_net) + seq_printf(seq, ",hidenet"); + + return 0; +} + static const struct super_operations proc_sops = { .alloc_inode = proc_alloc_inode, .destroy_inode = proc_destroy_inode, .drop_inode = generic_delete_inode, .evict_inode = proc_evict_inode, .statfs = simple_statfs, + .remount_fs = proc_remount, + .show_options = proc_show_options, }; static void __pde_users_dec(struct proc_dir_entry *pde) @@ -468,7 +488,7 @@ int proc_fill_super(struct super_block *s) s->s_magic = PROC_SUPER_MAGIC; s->s_op = &proc_sops; s->s_time_gran = 1; - + pde_get(&proc_root); root_inode = proc_get_inode(s, &proc_root); if (!root_inode) diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 9ad561d..1cacb6a 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -110,6 +110,7 @@ void pde_put(struct proc_dir_entry *pde); extern struct vfsmount *proc_mnt; int proc_fill_super(struct super_block *); struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *); +int proc_remount(struct super_block *sb, int *flags, char *data); /* * These are generic /proc routines that use the internal diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c index 9020ac1..7dc0b96 100644 --- a/fs/proc/proc_net.c +++ b/fs/proc/proc_net.c @@ -22,6 +22,7 @@ #include <linux/mount.h> #include <linux/nsproxy.h> #include <net/net_namespace.h> +#include <linux/pid_namespace.h> #include <linux/seq_file.h> #include "internal.h" @@ -105,6 +106,11 @@ static struct net *get_proc_task_net(struct inode *dir) struct task_struct *task; struct nsproxy *ns; struct net *net = NULL; + struct pid_namespace *pid = dir->i_sb->s_fs_info; + + if (pid->hide_net && + (!capable(CAP_NET_ADMIN) && !in_group_p(pid->pid_gid))) + return net; rcu_read_lock(); task = pid_task(proc_pid(dir), PIDTYPE_PID); @@ -162,7 +168,7 @@ static int proc_tgid_net_readdir(struct file *filp, void *dirent, int ret; struct net *net; - ret = -EINVAL; + ret = -ENOENT; net = get_proc_task_net(filp->f_path.dentry->d_inode); if (net != NULL) { ret = proc_readdir_de(net->proc_net, filp, dirent, filldir); diff --git a/fs/proc/root.c b/fs/proc/root.c index ef9fa8e..269067c 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -18,6 +18,7 @@ #include <linux/bitops.h> #include <linux/mount.h> #include <linux/pid_namespace.h> +#include <linux/parser.h> #include "internal.h" @@ -35,6 +36,75 @@ static int proc_set_super(struct super_block *sb, void *data) return set_anon_super(sb, NULL); } +enum { + Opt_gid, Opt_hidepid, Opt_hidenet, Opt_nohidenet, Opt_err, +}; + +static const match_table_t tokens = { + {Opt_hidepid, "hidepid=%u"}, + {Opt_gid, "gid=%u"}, + {Opt_hidenet, "hidenet"}, + {Opt_nohidenet, "nohidenet"}, + {Opt_err, NULL}, +}; + +static int proc_parse_options(char *options, struct pid_namespace *pid) +{ + char *p; + substring_t args[MAX_OPT_ARGS]; + int option; + + pr_debug("proc: options = %s\n", options); + + if (!options) + return 1; + + while ((p = strsep(&options, ",")) != NULL) { + int token; + if (!*p) + continue; + + args[0].to = args[0].from = 0; + token = match_token(p, tokens, args); + switch (token) { + case Opt_gid: + if (match_int(&args[0], &option)) + return 0; + pid->pid_gid = option; + break; + case Opt_hidepid: + if (match_int(&args[0], &option)) + return 0; + if (option < 0 || option > 2) { + pr_err("proc: hidepid value must be between 0 and 2.\n"); + return 0; + } + pid->hide_pid = option; + break; + case Opt_hidenet: + pid->hide_net = true; + break; + case Opt_nohidenet: + pid->hide_net = false; + break; + default: + pr_err("proc: unrecognized mount option \"%s\" " + "or missing value", p); + return 0; + } + } + + pr_debug("proc: gid = %u, hidepid = %o, hidenet = %d\n", pid->pid_gid, pid->hide_pid, (int)pid->hide_net); + + return 1; +} + +int proc_remount(struct super_block *sb, int *flags, char *data) +{ + struct pid_namespace *pid = sb->s_fs_info; + return !proc_parse_options(data, pid); +} + static struct dentry *proc_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { @@ -42,6 +112,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, struct super_block *sb; struct pid_namespace *ns; struct proc_inode *ei; + char *options; if (proc_mnt) { /* Seed the root directory with a pid so it doesn't need @@ -54,10 +125,13 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, ei->pid = find_get_pid(1); } - if (flags & MS_KERNMOUNT) + if (flags & MS_KERNMOUNT) { ns = (struct pid_namespace *)data; - else + options = NULL; + } else { ns = current->nsproxy->pid_ns; + options = data; + } sb = sget(fs_type, proc_test_super, proc_set_super, ns); if (IS_ERR(sb)) @@ -65,6 +139,10 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, if (!sb->s_root) { sb->s_flags = flags; + if (!proc_parse_options(options, ns)) { + deactivate_locked_super(sb); + return ERR_PTR(-EINVAL); + } err = proc_fill_super(sb); if (err) { deactivate_locked_super(sb); diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 38d1032..1c33094 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -30,6 +30,9 @@ struct pid_namespace { #ifdef CONFIG_BSD_PROCESS_ACCT struct bsd_acct_struct *bacct; #endif + gid_t pid_gid; + int hide_pid; + bool hide_net; }; extern struct pid_namespace init_pid_ns; -- [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v1] procfs mount options 2011-06-05 18:24 ` [kernel-hardening] [RFC v1] " Vasiliy Kulikov @ 2011-06-05 19:26 ` Solar Designer 2011-06-05 19:47 ` Vasiliy Kulikov 0 siblings, 1 reply; 29+ messages in thread From: Solar Designer @ 2011-06-05 19:26 UTC (permalink / raw) To: kernel-hardening Vasiliy, all - To comment on this in detail (such as on specific source code changes), I'd need to dive into it myself, effectively duplicating Vasiliy's work. So I am not doing that, but I'd appreciate it if others in here review and comment (preferably, folks with more up-to-date experience of patching these areas of the kernel - such as after the introduction of namespaces; my own experience is too rusty). I think it's going to be similar for other tasks Vasiliy will be working on under this project - I am happy to provide overall guidance, but I am too rusty on many of the details involved and I am not going to have time to (re-)learn them this summer. So any help is appreciated. On Sun, Jun 05, 2011 at 10:24:31PM +0400, Vasiliy Kulikov wrote: > This patch introduces support of procfs mount options and adds mount > options to restrict access to /proc/PID/ directories and /proc/PID/net/ > contents. The default backward-compatible behaviour is left untouched. > > The first mount option is called "hidepid" and its value defines how much > info about processes we want to be available for non-owners: > > hidepid=0 (default) means the current behaviour - anybody may read all > /proc/PID/* files. Aren't some /prod/PID/* files restricted by default, in stock kernels? I think several are (auxv, fd/, mem). So perhaps re-word this. > hidepid=1 means all files not running as current user and group are "... files not running ..." needs to be re-worded. > TODO/thoughs: > - /proc/pid/net/ currently doesn't show ANYTHING, even "." and "..". > This is confusing :) Ouch. Can't you simply restrict its perms such that this directory can't be listed unless you have privs? It should act as a normal mode 550 directory on a regular filesystem. > - copy options description to Documenataion/filesystems/procfs.txt Yes. > - need to alter "(inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO))" checks > to honour non-pid directories. I see this in the patch, but can't really comment without reviewing the code in full context myself. > - what if one keeps open /proc/PID/ while executing set*id/capable > binary? Then they deliberately grant this privilege to this process (and maybe to its heirs). I see no problem with that. However, if a set*id/capable binary can be tricked into opening /proc/PID/ or a /proc/PID/* entry on its own, and would leave the fd open for whatever other program(s) it invokes, then we have a problem. But that should probably be regarded as an issue of that specific program, unless the underlying cause enabling the specific attack is in the kernel or in a library. I find it unlikely that an issue like that would be specific to just /proc/PID; it'd probably be usable on other restricted-perms directories and/or files as well. > - maybe hidenet belongs to net namespace, not pid? However, it is > seen through procfs, which is per-pid_namespace. I am not the right person to comment on that. > + if (pid->hide_net && > + (!capable(CAP_NET_ADMIN) && !in_group_p(pid->pid_gid))) capable() sets a flag when it makes use of the capability (or at least it used to), visible via pacct. So, unless anything has changed in this area, it is best to check capable() last, such that it's only reached when it actually makes a difference. Thus, I'd write: if (pid->hide_net && !in_group_p(pid->pid_gid) && !capable(CAP_NET_ADMIN)) Also, what did you mean by the extra braces? Just separating the setting check from the permissions check for readability? I don't think this helps. Thanks, Alexander ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v1] procfs mount options 2011-06-05 19:26 ` Solar Designer @ 2011-06-05 19:47 ` Vasiliy Kulikov 2011-06-05 20:10 ` Solar Designer 0 siblings, 1 reply; 29+ messages in thread From: Vasiliy Kulikov @ 2011-06-05 19:47 UTC (permalink / raw) To: kernel-hardening [-- Attachment #1: Type: text/plain, Size: 3611 bytes --] Solar, On Sun, Jun 05, 2011 at 23:26 +0400, Solar Designer wrote: > On Sun, Jun 05, 2011 at 10:24:31PM +0400, Vasiliy Kulikov wrote: > > This patch introduces support of procfs mount options and adds mount > > options to restrict access to /proc/PID/ directories and /proc/PID/net/ > > contents. The default backward-compatible behaviour is left untouched. > > > > The first mount option is called "hidepid" and its value defines how much > > info about processes we want to be available for non-owners: > > > > hidepid=0 (default) means the current behaviour - anybody may read all > > /proc/PID/* files. > > Aren't some /prod/PID/* files restricted by default, in stock kernels? > I think several are (auxv, fd/, mem). So perhaps re-word this. Yes, I've mentioned still accessible files in hidepid=1. Will do. > > hidepid=1 means all files not running as current user and group are > > "... files not running ..." needs to be re-worded. Oops, made a mistake while rewording. > > TODO/thoughs: > > - /proc/pid/net/ currently doesn't show ANYTHING, even "." and "..". > > This is confusing :) > > Ouch. Can't you simply restrict its perms such that this directory > can't be listed unless you have privs? Well, yes, but it would touch too many code - currently it is handled as an entry in a static table. Changing this would touch many high level loops of the table handling. Hiding its contents is just simplier. Another solution - create a fake net namespace and process this namespace if not enough permissions :) It also removes weird netstat errors like "seems like networking was disabled for this kernel". > It should act as a normal mode > 550 directory on a regular filesystem. What 550 perms would give? /proc/pid/net/ contains all network information about _all_ network connections in current net namespace. So, /proc/1/net and /proc/2/net are logically the same directory. However, changing the mode to 550 _and_ changing uid and gid will help. > > - need to alter "(inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO))" checks > > to honour non-pid directories. > > I see this in the patch, but can't really comment without reviewing the > code in full context myself. As Spender commented, this is a weird thing to recognize pids from other files, which is needed due to some weird caching :) > > - what if one keeps open /proc/PID/ while executing set*id/capable > > binary? > > Then they deliberately grant this privilege to this process (and maybe > to its heirs). I see no problem with that. Ehh... I mean another thing: Process A with UID=1000 opens /proc/123/, while 123 has UID=1000. 123 exec's setuid binary, /proc/123/ becomes unaccessible to A. However, A still keeps the directory opened and may read its contents. > > + if (pid->hide_net && > > + (!capable(CAP_NET_ADMIN) && !in_group_p(pid->pid_gid))) > > capable() sets a flag when it makes use of the capability (or at least > it used to), visible via pacct. So, unless anything has changed in this > area, it is best to check capable() last, such that it's only reached > when it actually makes a difference. Thus, I'd write: > > if (pid->hide_net && !in_group_p(pid->pid_gid) && > !capable(CAP_NET_ADMIN)) OK. > Also, what did you mean by the extra braces? Just separating the > setting check from the permissions check for readability? Initially I wrote some ||, so the braces were needed. Will remove. Thanks you for the comments, will fix and repost, -- Vasiliy [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v1] procfs mount options 2011-06-05 19:47 ` Vasiliy Kulikov @ 2011-06-05 20:10 ` Solar Designer 2011-06-06 18:08 ` Vasiliy Kulikov 2011-06-06 19:20 ` [kernel-hardening] [RFC v1] procfs " Vasiliy Kulikov 0 siblings, 2 replies; 29+ messages in thread From: Solar Designer @ 2011-06-05 20:10 UTC (permalink / raw) To: kernel-hardening On Sun, Jun 05, 2011 at 11:47:46PM +0400, Vasiliy Kulikov wrote: > On Sun, Jun 05, 2011 at 23:26 +0400, Solar Designer wrote: > > On Sun, Jun 05, 2011 at 10:24:31PM +0400, Vasiliy Kulikov wrote: > > > TODO/thoughs: > > > - /proc/pid/net/ currently doesn't show ANYTHING, even "." and "..". > > > This is confusing :) > > > > Ouch. Can't you simply restrict its perms such that this directory > > can't be listed unless you have privs? > > Well, yes, but it would touch too many code - currently it is handled as > an entry in a static table. Changing this would touch many high level > loops of the table handling. Hiding its contents is just simplier. > > Another solution - create a fake net namespace and process this > namespace if not enough permissions :) It also removes weird netstat > errors like "seems like networking was disabled for this kernel". This sounds good to me, although I am not familiar with namespaces. > > It should act as a normal mode 550 directory on a regular filesystem. > > What 550 perms would give? /proc/pid/net/ contains all network > information about _all_ network connections in current net namespace. > So, /proc/1/net and /proc/2/net are logically the same directory. > > However, changing the mode to 550 _and_ changing uid and gid will help. I implied that the gid would be the one set by our gid option. > > > - what if one keeps open /proc/PID/ while executing set*id/capable > > > binary? > > > > Then they deliberately grant this privilege to this process (and maybe > > to its heirs). I see no problem with that. > > Ehh... I mean another thing: > > Process A with UID=1000 opens /proc/123/, while 123 has UID=1000. > > 123 exec's setuid binary, /proc/123/ becomes unaccessible to A. > > However, A still keeps the directory opened and may read its contents. Oh, this is a valid concern. Please research this. Perhaps there should be a may-ptrace check (or maybe more than one). As it relates to privacy, there's little need for this, because the user started those processes on their own (in fact, this is a reason why the user may want to be able to see those processes, despite of procfs perms). However, if our concern is that the processes may leak sensitive data the user is not supposed to see, then we need to restrict this. Overall, it makes sense either to relax things such that /proc/PID have real UID rather than effective UID as their owner (so we make such access to almost-own processes legitimate) or to have the proper may-ptrace checks in place (such that permissions are actually enforced). Maybe this should be another mount option. But this feels like too much complexity for an obscure option that few people will understand and use. Thanks, Alexander ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v1] procfs mount options 2011-06-05 20:10 ` Solar Designer @ 2011-06-06 18:08 ` Vasiliy Kulikov 2011-06-06 18:33 ` Solar Designer 2011-06-06 19:20 ` [kernel-hardening] [RFC v1] procfs " Vasiliy Kulikov 1 sibling, 1 reply; 29+ messages in thread From: Vasiliy Kulikov @ 2011-06-06 18:08 UTC (permalink / raw) To: kernel-hardening [-- Attachment #1: Type: text/plain, Size: 797 bytes --] On Mon, Jun 06, 2011 at 00:10 +0400, Solar Designer wrote: > > Process A with UID=1000 opens /proc/123/, while 123 has UID=1000. > > > > 123 exec's setuid binary, /proc/123/ becomes unaccessible to A. > > > > However, A still keeps the directory opened and may read its contents. > > Oh, this is a valid concern. Please research this. Perhaps there > should be a may-ptrace check (or maybe more than one). This is similar to CVE-2011-1020: https://lkml.org/lkml/2011/2/7/368 http://seclists.org/fulldisclosure/2011/Jan/421 The proposed solution for separate procfs files is implementing additional runtime checks (besides POSIX perms), however, it probably doesn't scale for the whole PID directory. Will try to invent some simple way to deal with it. -- Vasiliy [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v1] procfs mount options 2011-06-06 18:08 ` Vasiliy Kulikov @ 2011-06-06 18:33 ` Solar Designer 2011-06-08 17:23 ` [kernel-hardening] [RFC v2] " Vasiliy Kulikov 0 siblings, 1 reply; 29+ messages in thread From: Solar Designer @ 2011-06-06 18:33 UTC (permalink / raw) To: kernel-hardening On Mon, Jun 06, 2011 at 10:08:06PM +0400, Vasiliy Kulikov wrote: > On Mon, Jun 06, 2011 at 00:10 +0400, Solar Designer wrote: > > > Process A with UID=1000 opens /proc/123/, while 123 has UID=1000. > > > > > > 123 exec's setuid binary, /proc/123/ becomes unaccessible to A. > > > > > > However, A still keeps the directory opened and may read its contents. > > > > Oh, this is a valid concern. Please research this. Perhaps there > > should be a may-ptrace check (or maybe more than one). > > This is similar to CVE-2011-1020: > > https://lkml.org/lkml/2011/2/7/368 > http://seclists.org/fulldisclosure/2011/Jan/421 > > The proposed solution for separate procfs files is implementing > additional runtime checks (besides POSIX perms), Right. IIRC, the approach with selectively adding may-ptrace checks to sensitive proc files dates back to Linux 2.0. Apparently, some newer entries such as auxv were not covered by it (there's no /proc/PID/auxv on 2.4). > however, it probably doesn't scale for the whole PID directory. > > Will try to invent some simple way to deal with it. Maybe have a may-ptrace check automatically applied to all entries that are not world-readable? ...Here's a trickier attack: open a sensitive /proc/PID/something file on fd 0, invoke a SUID/SGID/fscap'ed program. Depending on that program's functionality, it might read its stdin and it might report what it read from there (e.g., quote it in an error message complaining about invalid input data). I don't recall if this has been dealt with in any way or not. While these are good to research and fix, I am concerned about the scope creep. We might prefer to get a basic procfs mount options patch in first, then propose these enhancements. Thanks, Alexander ^ permalink raw reply [flat|nested] 29+ messages in thread
* [kernel-hardening] [RFC v2] procfs mount options 2011-06-06 18:33 ` Solar Designer @ 2011-06-08 17:23 ` Vasiliy Kulikov 2011-06-08 17:43 ` Vasiliy Kulikov 2011-06-12 2:39 ` Solar Designer 0 siblings, 2 replies; 29+ messages in thread From: Vasiliy Kulikov @ 2011-06-08 17:23 UTC (permalink / raw) To: kernel-hardening [-- Attachment #1: Type: text/plain, Size: 16085 bytes --] In this version I've completely discarded grsecurity way to chech permissions. Implementing customized inode_operations->permission() is MUCH simplier. The problem with old way, changing gid, is not backward-compatible - procfs files have tricky owners depending on their permissions, and some programs depend on perms AND uid/gid. So, in this version I don't actually change gid, but match with this gid in runtime (proc_pid_permission()). Additionally it solves the problem with suid binaries, but only for operations that use permissions checks like readdir(), open(), but not read(), write(), etc. *stat*() are denied too. The generic problem is not solved, though. This patch is as much compatible with the current behaviour as possible. It leaves permissions and changes runtime behaviour only. Nothing should be broken. The only problem with the patch is using net namespaces. If CONFIG_NET_NS=n then fake_net is created, but it is blank. So, in this case /proc/PID/net doesn't contain common files and netstat and other programs might spit with warnings. I think this version of the patch is ready for LKML review. diff --git a/fs/proc/base.c b/fs/proc/base.c index 9d096e8..78fbcdc 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -568,8 +568,38 @@ static int proc_setattr(struct dentry *dentry, struct iattr *attr) return 0; } +static int proc_pid_permission(struct inode *inode, int mask, unsigned int flags) +{ + struct pid_namespace *pid = inode->i_sb->s_fs_info; + struct task_struct *task = get_proc_task(inode); + + if (pid->hide_pid && + !ptrace_may_access(task, PTRACE_MODE_READ) && + !in_group_p(pid->pid_gid)) { + if (pid->hide_pid == 2) + return -ENOENT; + else + return -EPERM; + } + return generic_permission(inode, mask, flags, NULL); +} + +/* + * May current process learn task's euid/egid? + */ +static bool proc_pid_may_getattr(struct pid_namespace* pid, struct task_struct *task) +{ + if (pid->hide_pid < 2) + return true; + if (ptrace_may_access(task, PTRACE_MODE_READ)) + return true; + return in_group_p(pid->pid_gid); +} + + static const struct inode_operations proc_def_inode_operations = { .setattr = proc_setattr, + .permission = proc_pid_permission, }; static int mounts_open_common(struct inode *inode, struct file *file, @@ -1662,6 +1692,7 @@ static const struct inode_operations proc_pid_link_inode_operations = { .readlink = proc_pid_readlink, .follow_link = proc_pid_follow_link, .setattr = proc_setattr, + .permission = proc_pid_permission, }; @@ -1730,6 +1761,7 @@ static int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat struct inode *inode = dentry->d_inode; struct task_struct *task; const struct cred *cred; + struct pid_namespace *pid = dentry->d_sb->s_fs_info; generic_fillattr(inode, stat); @@ -1738,6 +1770,14 @@ static int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat stat->gid = 0; task = pid_task(proc_pid(inode), PIDTYPE_PID); if (task) { + if (!proc_pid_may_getattr(pid, task)) { + rcu_read_unlock(); + /* + * This doesn't prevent learning whether PID exists, + * it only makes getattr() consistent with readdir(). + */ + return -ENOENT; + } if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) || task_dumpable(task)) { cred = __task_cred(task); @@ -2184,6 +2224,7 @@ static const struct inode_operations proc_fd_inode_operations = { .lookup = proc_lookupfd, .permission = proc_fd_permission, .setattr = proc_setattr, + .permission = proc_pid_permission, }; static struct dentry *proc_fdinfo_instantiate(struct inode *dir, @@ -2236,6 +2277,7 @@ static const struct file_operations proc_fdinfo_operations = { static const struct inode_operations proc_fdinfo_inode_operations = { .lookup = proc_lookupfdinfo, .setattr = proc_setattr, + .permission = proc_pid_permission, }; @@ -2473,6 +2515,7 @@ static const struct inode_operations proc_attr_dir_inode_operations = { .lookup = proc_attr_dir_lookup, .getattr = pid_getattr, .setattr = proc_setattr, + .permission = proc_pid_permission, }; #endif @@ -2890,6 +2933,7 @@ static const struct inode_operations proc_tgid_base_inode_operations = { .lookup = proc_tgid_base_lookup, .getattr = pid_getattr, .setattr = proc_setattr, + .permission = proc_pid_permission, }; static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid) @@ -3093,6 +3137,11 @@ static int proc_pid_fill_cache(struct file *filp, void *dirent, filldir_t filldi proc_pid_instantiate, iter.task, NULL); } +static int fake_filldir(void *buf, const char *name, int namelen, loff_t offset, u64 ino, unsigned d_type) +{ + return 0; +} + /* for the /proc/ directory itself, after non-process stuff has been done */ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir) { @@ -3100,6 +3149,7 @@ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir) struct task_struct *reaper = get_proc_task(filp->f_path.dentry->d_inode); struct tgid_iter iter; struct pid_namespace *ns; + filldir_t __filldir; if (!reaper) goto out_no_task; @@ -3116,8 +3166,13 @@ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir) for (iter = next_tgid(ns, iter); iter.task; iter.tgid += 1, iter = next_tgid(ns, iter)) { + if (proc_pid_may_getattr(ns, iter.task)) + __filldir = filldir; + else + __filldir = fake_filldir; + filp->f_pos = iter.tgid + TGID_OFFSET; - if (proc_pid_fill_cache(filp, dirent, filldir, iter) < 0) { + if (proc_pid_fill_cache(filp, dirent, __filldir, iter) < 0) { put_task_struct(iter.task); goto out; } @@ -3223,6 +3278,7 @@ static const struct inode_operations proc_tid_base_inode_operations = { .lookup = proc_tid_base_lookup, .getattr = pid_getattr, .setattr = proc_setattr, + .permission = proc_pid_permission, }; static struct dentry *proc_task_instantiate(struct inode *dir, @@ -3448,6 +3504,7 @@ static const struct inode_operations proc_task_inode_operations = { .lookup = proc_task_lookup, .getattr = proc_task_getattr, .setattr = proc_setattr, + .permission = proc_pid_permission, }; static const struct file_operations proc_task_operations = { diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 176ce4c..e4452bf 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -7,6 +7,7 @@ #include <linux/time.h> #include <linux/proc_fs.h> #include <linux/kernel.h> +#include <linux/pid_namespace.h> #include <linux/mm.h> #include <linux/string.h> #include <linux/stat.h> @@ -17,7 +18,9 @@ #include <linux/init.h> #include <linux/module.h> #include <linux/sysctl.h> +#include <linux/seq_file.h> #include <linux/slab.h> +#include <linux/mount.h> #include <asm/system.h> #include <asm/uaccess.h> @@ -93,12 +96,29 @@ void __init proc_init_inodecache(void) init_once); } +static int proc_show_options(struct seq_file *seq, struct vfsmount *vfs) +{ + struct super_block *sb = vfs->mnt_sb; + struct pid_namespace *pid = sb->s_fs_info; + + if (pid->pid_gid) + seq_printf(seq, ",gid=%lu", (unsigned long)pid->pid_gid); + if (pid->hide_pid != 0) + seq_printf(seq, ",hidepid=%u", pid->hide_pid); + if (pid->hide_net) + seq_printf(seq, ",hidenet"); + + return 0; +} + static const struct super_operations proc_sops = { .alloc_inode = proc_alloc_inode, .destroy_inode = proc_destroy_inode, .drop_inode = generic_delete_inode, .evict_inode = proc_evict_inode, .statfs = simple_statfs, + .remount_fs = proc_remount, + .show_options = proc_show_options, }; static void __pde_users_dec(struct proc_dir_entry *pde) @@ -423,6 +443,7 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de) inode = iget_locked(sb, de->low_ino); if (!inode) return NULL; + if (inode->i_state & I_NEW) { inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME; PROC_I(inode)->fd = 0; @@ -468,7 +489,7 @@ int proc_fill_super(struct super_block *s) s->s_magic = PROC_SUPER_MAGIC; s->s_op = &proc_sops; s->s_time_gran = 1; - + pde_get(&proc_root); root_inode = proc_get_inode(s, &proc_root); if (!root_inode) diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 9ad561d..1cacb6a 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -110,6 +110,7 @@ void pde_put(struct proc_dir_entry *pde); extern struct vfsmount *proc_mnt; int proc_fill_super(struct super_block *); struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *); +int proc_remount(struct super_block *sb, int *flags, char *data); /* * These are generic /proc routines that use the internal diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c index 9020ac1..3b4e89e 100644 --- a/fs/proc/proc_net.c +++ b/fs/proc/proc_net.c @@ -22,10 +22,13 @@ #include <linux/mount.h> #include <linux/nsproxy.h> #include <net/net_namespace.h> +#include <linux/pid_namespace.h> #include <linux/seq_file.h> #include "internal.h" +static struct net *fake_net; + static struct net *get_proc_net(const struct inode *inode) { @@ -105,6 +108,13 @@ static struct net *get_proc_task_net(struct inode *dir) struct task_struct *task; struct nsproxy *ns; struct net *net = NULL; + struct pid_namespace *pid = dir->i_sb->s_fs_info; + + if (pid->hide_net && + (!capable(CAP_NET_ADMIN) && !in_group_p(pid->pid_gid))) { + pr_err("return fake_net = %p\n", fake_net); + return get_net(fake_net); + } rcu_read_lock(); task = pid_task(proc_pid(dir), PIDTYPE_PID); @@ -236,6 +246,18 @@ static struct pernet_operations __net_initdata proc_net_ns_ops = { int __init proc_net_init(void) { proc_symlink("net", NULL, "self/net"); - return register_pernet_subsys(&proc_net_ns_ops); } + +int __init proc_net_initcall(void) +{ + fake_net = net_create(); + pr_err("fake_net = %p\n", fake_net); + if (fake_net == NULL) + return -ENOMEM; + + get_net(fake_net); + return 0; +} + +late_initcall(proc_net_initcall); diff --git a/fs/proc/root.c b/fs/proc/root.c index ef9fa8e..269067c 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -18,6 +18,7 @@ #include <linux/bitops.h> #include <linux/mount.h> #include <linux/pid_namespace.h> +#include <linux/parser.h> #include "internal.h" @@ -35,6 +36,75 @@ static int proc_set_super(struct super_block *sb, void *data) return set_anon_super(sb, NULL); } +enum { + Opt_gid, Opt_hidepid, Opt_hidenet, Opt_nohidenet, Opt_err, +}; + +static const match_table_t tokens = { + {Opt_hidepid, "hidepid=%u"}, + {Opt_gid, "gid=%u"}, + {Opt_hidenet, "hidenet"}, + {Opt_nohidenet, "nohidenet"}, + {Opt_err, NULL}, +}; + +static int proc_parse_options(char *options, struct pid_namespace *pid) +{ + char *p; + substring_t args[MAX_OPT_ARGS]; + int option; + + pr_debug("proc: options = %s\n", options); + + if (!options) + return 1; + + while ((p = strsep(&options, ",")) != NULL) { + int token; + if (!*p) + continue; + + args[0].to = args[0].from = 0; + token = match_token(p, tokens, args); + switch (token) { + case Opt_gid: + if (match_int(&args[0], &option)) + return 0; + pid->pid_gid = option; + break; + case Opt_hidepid: + if (match_int(&args[0], &option)) + return 0; + if (option < 0 || option > 2) { + pr_err("proc: hidepid value must be between 0 and 2.\n"); + return 0; + } + pid->hide_pid = option; + break; + case Opt_hidenet: + pid->hide_net = true; + break; + case Opt_nohidenet: + pid->hide_net = false; + break; + default: + pr_err("proc: unrecognized mount option \"%s\" " + "or missing value", p); + return 0; + } + } + + pr_debug("proc: gid = %u, hidepid = %o, hidenet = %d\n", pid->pid_gid, pid->hide_pid, (int)pid->hide_net); + + return 1; +} + +int proc_remount(struct super_block *sb, int *flags, char *data) +{ + struct pid_namespace *pid = sb->s_fs_info; + return !proc_parse_options(data, pid); +} + static struct dentry *proc_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { @@ -42,6 +112,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, struct super_block *sb; struct pid_namespace *ns; struct proc_inode *ei; + char *options; if (proc_mnt) { /* Seed the root directory with a pid so it doesn't need @@ -54,10 +125,13 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, ei->pid = find_get_pid(1); } - if (flags & MS_KERNMOUNT) + if (flags & MS_KERNMOUNT) { ns = (struct pid_namespace *)data; - else + options = NULL; + } else { ns = current->nsproxy->pid_ns; + options = data; + } sb = sget(fs_type, proc_test_super, proc_set_super, ns); if (IS_ERR(sb)) @@ -65,6 +139,10 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, if (!sb->s_root) { sb->s_flags = flags; + if (!proc_parse_options(options, ns)) { + deactivate_locked_super(sb); + return ERR_PTR(-EINVAL); + } err = proc_fill_super(sb); if (err) { deactivate_locked_super(sb); diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 38d1032..1c33094 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -30,6 +30,9 @@ struct pid_namespace { #ifdef CONFIG_BSD_PROCESS_ACCT struct bsd_acct_struct *bacct; #endif + gid_t pid_gid; + int hide_pid; + bool hide_net; }; extern struct pid_namespace init_pid_ns; diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 1bf812b..d40c61c 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -113,6 +113,8 @@ static inline struct net *copy_net_ns(unsigned long flags, struct net *net_ns) } #endif /* CONFIG_NET */ +extern struct net *net_create(void); + extern struct list_head net_namespace_list; diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 3f86026..51c3442 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -177,6 +177,7 @@ static struct net_generic *net_alloc_generic(void) return ng; } + #ifdef CONFIG_NET_NS static struct kmem_cache *net_cachep; static struct workqueue_struct *netns_wq; @@ -216,7 +217,7 @@ static void net_free(struct net *net) kmem_cache_free(net_cachep, net); } -static struct net *net_create(void) +struct net *net_create(void) { struct net *net; int rv; @@ -315,6 +316,64 @@ void __put_net(struct net *net) EXPORT_SYMBOL_GPL(__put_net); #else +static struct net *net_alloc(void) +{ + struct net *net = NULL; + struct net_generic *ng; + + ng = net_alloc_generic(); + if (!ng) + goto out; + + net = kzalloc(sizeof(*net), GFP_KERNEL); + if (!net) + goto out_free; + + rcu_assign_pointer(net->gen, ng); +out: + return net; + +out_free: + kfree(ng); + goto out; +} + +static void net_free(struct net *net) +{ +#ifdef NETNS_REFCNT_DEBUG + if (unlikely(atomic_read(&net->use_count) != 0)) { + printk(KERN_EMERG "network namespace not free! Usage: %d\n", + atomic_read(&net->use_count)); + return; + } +#endif + kfree(net->gen); + kfree(net); +} + +struct net *net_create(void) +{ + struct net *net; + int rv; + + net = net_alloc(); + if (!net) + return ERR_PTR(-ENOMEM); + mutex_lock(&net_mutex); + rv = setup_net(net); + if (rv == 0) { + rtnl_lock(); + list_add_tail_rcu(&net->list, &net_namespace_list); + rtnl_unlock(); + } + mutex_unlock(&net_mutex); + if (rv < 0) { + net_free(net); + return ERR_PTR(rv); + } + return net; +} + struct net *copy_net_ns(unsigned long flags, struct net *old_net) { if (flags & CLONE_NEWNET) -- [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v2] procfs mount options 2011-06-08 17:23 ` [kernel-hardening] [RFC v2] " Vasiliy Kulikov @ 2011-06-08 17:43 ` Vasiliy Kulikov 2011-06-12 2:39 ` Solar Designer 1 sibling, 0 replies; 29+ messages in thread From: Vasiliy Kulikov @ 2011-06-08 17:43 UTC (permalink / raw) To: kernel-hardening [-- Attachment #1: Type: text/plain, Size: 302 bytes --] On Wed, Jun 08, 2011 at 21:23 +0400, Vasiliy Kulikov wrote: > I think this version of the patch is ready for LKML review. I mean the patch is OK except net part, that's why LKML is needed now :) -- Vasiliy Kulikov http://www.openwall.com - bringing security into open computing environments [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v2] procfs mount options 2011-06-08 17:23 ` [kernel-hardening] [RFC v2] " Vasiliy Kulikov 2011-06-08 17:43 ` Vasiliy Kulikov @ 2011-06-12 2:39 ` Solar Designer 2011-07-24 18:55 ` Vasiliy Kulikov [not found] ` <20110724185036.GC3510@albatros> 1 sibling, 2 replies; 29+ messages in thread From: Solar Designer @ 2011-06-12 2:39 UTC (permalink / raw) To: kernel-hardening Vasiliy, On Wed, Jun 08, 2011 at 09:23:08PM +0400, Vasiliy Kulikov wrote: > I think this version of the patch is ready for LKML review. OK, post it - and CC kernel-hardening on your posting, as planned. (We'll see if we can afford the traffic, or if we need to split this list in two...) Unfortunately, I don't have time to review this more closely (in context), and no one else in here provided any comments. ;-( > + if (pid->hide_net && > + (!capable(CAP_NET_ADMIN) && !in_group_p(pid->pid_gid))) { As discussed, capable() should be the very last check. Thanks, Alexander ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v2] procfs mount options 2011-06-12 2:39 ` Solar Designer @ 2011-07-24 18:55 ` Vasiliy Kulikov [not found] ` <20110724185036.GC3510@albatros> 1 sibling, 0 replies; 29+ messages in thread From: Vasiliy Kulikov @ 2011-07-24 18:55 UTC (permalink / raw) To: kernel-hardening Hi, A small follow-up of my current work. I'm implementing the scheme I've proposed on LKML. Instead of on/off semantics a mask of allowed files is passed as a mount option. Like this: mount -o tgid_allowed=.;cmdline;mounts;smaps /proc All files not enumerated in tgid_allowed= will be restricted to processes being able to ptrace the task. Files enumerated in tgid_allowed= are not additionally restricted and classic POSIX permission checking rules are applied here. So, it's impossible to _relax_ procfs permissions with my scheme. ";" is used instead of "," as a comma is already used as options separator. Besides tgid_allowed= there will be attr_allowed= and tid_allowed= to enumerated files in appropriate subdirectories. As a "side effect" it will effectively fix all holes of poor permission checking of setuid binaries (including check and execve() race) without massive code refactoring. (All future procfs files are already protected too ;-)) Currently I have a working implementation of per-files restriction, but without mount option parsing. I'll post the code when I have a working implementation of at least parsing and handling tgid_allowed=XXX. Also there is a (minor, I hope) issue with /proc/PID/maps handling, where the ptrace check is too restrictive. TODO: net restriction, privileged monitoring gid. Thanks, -- Vasiliy Kulikov http://www.openwall.com - bringing security into open computing environments ^ permalink raw reply [flat|nested] 29+ messages in thread
[parent not found: <20110724185036.GC3510@albatros>]
* Re: [kernel-hardening] [RFC v2] procfs mount options [not found] ` <20110724185036.GC3510@albatros> @ 2011-07-26 14:50 ` Vasiliy Kulikov 2011-07-29 17:47 ` [kernel-hardening] procfs {tid,tgid,attr}_allowed " Vasiliy Kulikov 0 siblings, 1 reply; 29+ messages in thread From: Vasiliy Kulikov @ 2011-07-26 14:50 UTC (permalink / raw) To: kernel-hardening Hi, This is a demonstration of procfs feature I'm writing now. It is not yet fully ready (need more clean code and need to divide the patch into series), however, it's ready for some demonstration: # mount -o remount,tgid_allowed=none,tid_allowed=none,attr_allowed=none /proc root@albatros:~# ls -ld /proc/1 dr-x------ 8 root root 0 2011-07-26 17:57 /proc/1 # mount -o remount,tgid_allowed=stat /proc # ls -ld /proc/1 dr-xr-xr-x 8 root root 0 2011-07-26 18:03 /proc/1 # ls -ld /proc/1/{stat,status,sched} -rw------- 1 root root 0 2011-07-26 18:03 /proc/1/sched -r--r--r-- 1 root root 0 2011-07-26 18:03 /proc/1/stat -r-------- 1 root root 0 2011-07-26 18:03 /proc/1/status # mount -o remount,tgid_allowed=all /proc # ls -ld /proc/1/{stat,status,sched} -rw-r--r-- 1 root root 0 2011-07-26 18:03 /proc/1/sched -r--r--r-- 1 root root 0 2011-07-26 18:03 /proc/1/stat -r--r--r-- 1 root root 0 2011-07-26 18:03 /proc/1/status # ls -ld /proc/1/task/1/ dr-xr-xr-x 6 root root 0 2011-07-26 18:03 /proc/1/task/1/ # ls -ld /proc/1/task/1/{stat,status,sched} -rw------- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/sched -r-------- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/stat -r-------- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/status # mount -o remount,tid_allowed=sched /proc # ls -ld /proc/1/task/1/{stat,status,sched} -rw-r--r-- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/sched -r-------- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/stat -r-------- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/status # mount -o remount,tid_allowed=all /proc # ls -ld /proc/1/task/1/{stat,status,sched} -rw-r--r-- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/sched -r--r--r-- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/stat -r--r--r-- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/status # ls -l /proc/1/attr/ итого 0 -rw------- 1 root root 0 2011-07-26 18:04 current -rw------- 1 root root 0 2011-07-26 18:04 exec -rw------- 1 root root 0 2011-07-26 18:04 fscreate -rw------- 1 root root 0 2011-07-26 18:04 keycreate -r-------- 1 root root 0 2011-07-26 18:04 prev -rw------- 1 root root 0 2011-07-26 18:04 sockcreate # mount -o remount,attr_allowed=exec\;sockcreate /proc # ls -l /proc/1/attr/ итого 0 -rw------- 1 root root 0 2011-07-26 18:05 current -rw-rw-rw- 1 root root 0 2011-07-26 18:05 exec -rw------- 1 root root 0 2011-07-26 18:05 fscreate -rw------- 1 root root 0 2011-07-26 18:05 keycreate -r-------- 1 root root 0 2011-07-26 18:05 prev -rw-rw-rw- 1 root root 0 2011-07-26 18:05 sockcreate # mount -o remount,attr_allowed=all /proc # ls -l /proc/1/attr/ итого 0 -rw-rw-rw- 1 root root 0 2011-07-26 18:05 current -rw-rw-rw- 1 root root 0 2011-07-26 18:05 exec -rw-rw-rw- 1 root root 0 2011-07-26 18:05 fscreate -rw-rw-rw- 1 root root 0 2011-07-26 18:05 keycreate -r--r--r-- 1 root root 0 2011-07-26 18:05 prev -rw-rw-rw- 1 root root 0 2011-07-26 18:05 sockcreate # mount -o remount,attr_allowed=none /proc # ls -l /proc/1/attr/ итого 0 -rw------- 1 root root 0 2011-07-26 18:08 current -rw------- 1 root root 0 2011-07-26 18:08 exec -rw------- 1 root root 0 2011-07-26 18:08 fscreate -rw------- 1 root root 0 2011-07-26 18:08 keycreate -r-------- 1 root root 0 2011-07-26 18:08 prev -rw------- 1 root root 0 2011-07-26 18:08 sockcreate As you see, there are different sets for /proc/PID, /proc/PID/task/TID/, /proc/PID/attr. I've slightly changed the semantics and removed ".", it is implicitly enabled iff something inside is enabled. To deny "." XXX=none should be passed. Questions/comments: 1) it seems to me "\;" is hard to pass, maybe ":" is a better separator? 2) almost all tid files are copies of tgid files. coredump_filter, mountstats, net are missing in task/. So, maybe fully remove tid_allowed= and copy the tgid files' permissions to the appropriate tid files? 3) currently the implementation is somewhat slow as every file is wrapped into additional file operations handlers. It cannot be removed for "fast" cases as all permissions of all files should be dynamically recalculated for every access. Probably I could make it faster a bit. It is the most significant drawback of the patch :-( Thanks, -- Vasiliy Kulikov ^ permalink raw reply [flat|nested] 29+ messages in thread
* [kernel-hardening] procfs {tid,tgid,attr}_allowed mount options 2011-07-26 14:50 ` Vasiliy Kulikov @ 2011-07-29 17:47 ` Vasiliy Kulikov 2011-08-04 11:23 ` [kernel-hardening] " Vasiliy Kulikov 0 siblings, 1 reply; 29+ messages in thread From: Vasiliy Kulikov @ 2011-07-29 17:47 UTC (permalink / raw) To: kernel-hardening Hi, On Tue, Jul 26, 2011 at 18:50 +0400, Vasiliy Kulikov wrote: > This is a demonstration of procfs feature I'm writing now. It is not > yet fully ready (need more clean code and need to divide the patch into > series), however, it's ready for some demonstration: Now the patch seems rather stable, more or less clean. It lacks some testing, after that I'll send it as RFC to LKML. BTW, it fully solves the problem of procfs files kept across execve() (CVE-2011-1020). --- fs/proc/Makefile | 2 +- fs/proc/base.c | 253 ++++++++++++++++----------------- fs/proc/base_perms.c | 314 +++++++++++++++++++++++++++++++++++++++++ fs/proc/inode.c | 18 +++ fs/proc/internal.h | 24 +++ fs/proc/root.c | 121 ++++++++++++++++- fs/proc/task_nommu.c | 2 +- include/linux/pid_namespace.h | 13 ++ include/linux/proc_fs.h | 12 ++ kernel/pid.c | 2 + kernel/pid_namespace.c | 10 ++- kernel/sysctl.c | 1 + 12 files changed, 635 insertions(+), 137 deletions(-) -- diff --git a/fs/proc/Makefile b/fs/proc/Makefile index c1c7293..81020f9 100644 --- a/fs/proc/Makefile +++ b/fs/proc/Makefile @@ -8,7 +8,7 @@ proc-y := nommu.o task_nommu.o proc-$(CONFIG_MMU) := mmu.o task_mmu.o proc-y += inode.o root.o base.o generic.o array.o \ - proc_tty.o + proc_tty.o base_perms.o proc-y += cmdline.o proc-y += consoles.o proc-y += cpuinfo.o diff --git a/fs/proc/base.c b/fs/proc/base.c index fc5bc27..0af542e 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -98,40 +98,43 @@ * in /proc for a task before it execs a suid executable. */ -struct pid_entry { - char *name; - int len; - mode_t mode; - const struct inode_operations *iop; - const struct file_operations *fop; - union proc_op op; -}; - -#define NOD(NAME, MODE, IOP, FOP, OP) { \ +#define NOD(NAME, MODE, IOP, FOP, OP, PERMS) { \ .name = (NAME), \ .len = sizeof(NAME) - 1, \ .mode = MODE, \ .iop = IOP, \ .fop = FOP, \ .op = OP, \ + .need_perms_check = PERMS, \ } +/* + * XX_PERMS() are files without any ptrace() check. + * However, they have updated uid/gid and permissions on each file operation. + */ #define DIR(NAME, MODE, iops, fops) \ - NOD(NAME, (S_IFDIR|(MODE)), &iops, &fops, {} ) + NOD(NAME, (S_IFDIR|(MODE)), &iops, &fops, {}, true) #define LNK(NAME, get_link) \ NOD(NAME, (S_IFLNK|S_IRWXUGO), \ &proc_pid_link_inode_operations, NULL, \ - { .proc_get_link = get_link } ) + { .proc_get_link = get_link }, false) #define REG(NAME, MODE, fops) \ - NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {}) + NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {}, true) +#define REG_PERMS(NAME, MODE, fops) \ + NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {}, false) #define INF(NAME, MODE, read) \ NOD(NAME, (S_IFREG|(MODE)), \ NULL, &proc_info_file_operations, \ - { .proc_read = read } ) + { .proc_read = read }, true ) +#define INF_PERMS(NAME, MODE, read) \ + NOD(NAME, (S_IFREG|(MODE)), \ + NULL, &proc_info_file_operations, \ + { .proc_read = read }, false ) #define ONE(NAME, MODE, show) \ NOD(NAME, (S_IFREG|(MODE)), \ NULL, &proc_single_file_operations, \ - { .proc_show = show } ) + { .proc_show = show }, true ) + /* * Count the number of hardlinks for the pid_entry table, excluding the . @@ -229,35 +232,12 @@ static struct mm_struct *__check_mem_permission(struct task_struct *task) return ERR_PTR(-EPERM); } -/* - * If current may access user memory in @task return a reference to the - * corresponding mm, otherwise ERR_PTR. - */ -static struct mm_struct *check_mem_permission(struct task_struct *task) -{ - struct mm_struct *mm; - int err; - - /* - * Avoid racing if task exec's as we might get a new mm but validate - * against old credentials. - */ - err = mutex_lock_killable(&task->signal->cred_guard_mutex); - if (err) - return ERR_PTR(err); - - mm = __check_mem_permission(task); - mutex_unlock(&task->signal->cred_guard_mutex); - - return mm; -} - struct mm_struct *mm_for_maps(struct task_struct *task) { struct mm_struct *mm; int err; - err = mutex_lock_killable(&task->signal->cred_guard_mutex); + err = mutex_lock_killable(&task->signal->cred_guard_mutex); if (err) return ERR_PTR(err); @@ -327,7 +307,6 @@ static int proc_pid_auxv(struct task_struct *task, char *buffer) return res; } - #ifdef CONFIG_KALLSYMS /* * Provides a wchan file via kallsyms in a proper one-value-per-file format. @@ -350,23 +329,6 @@ static int proc_pid_wchan(struct task_struct *task, char *buffer) } #endif /* CONFIG_KALLSYMS */ -static int lock_trace(struct task_struct *task) -{ - int err = mutex_lock_killable(&task->signal->cred_guard_mutex); - if (err) - return err; - if (!ptrace_may_access(task, PTRACE_MODE_ATTACH)) { - mutex_unlock(&task->signal->cred_guard_mutex); - return -EPERM; - } - return 0; -} - -static void unlock_trace(struct task_struct *task) -{ - mutex_unlock(&task->signal->cred_guard_mutex); -} - #ifdef CONFIG_STACKTRACE #define MAX_STACK_TRACE_DEPTH 64 @@ -388,16 +350,13 @@ static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns, trace.entries = entries; trace.skip = 0; - err = lock_trace(task); - if (!err) { - save_stack_trace_tsk(task, &trace); + save_stack_trace_tsk(task, &trace); - for (i = 0; i < trace.nr_entries; i++) { - seq_printf(m, "[<%pK>] %pS\n", - (void *)entries[i], (void *)entries[i]); - } - unlock_trace(task); + for (i = 0; i < trace.nr_entries; i++) { + seq_printf(m, "[<%pK>] %pS\n", + (void *)entries[i], (void *)entries[i]); } + kfree(entries); return err; @@ -563,9 +522,7 @@ static int proc_pid_syscall(struct task_struct *task, char *buffer) { long nr; unsigned long args[6], sp, pc; - int res = lock_trace(task); - if (res) - return res; + int res; if (task_current_syscall(task, &nr, args, 6, &sp, &pc)) res = sprintf(buffer, "running\n"); @@ -577,7 +534,7 @@ static int proc_pid_syscall(struct task_struct *task, char *buffer) nr, args[0], args[1], args[2], args[3], args[4], args[5], sp, pc); - unlock_trace(task); + return res; } #endif /* CONFIG_HAVE_ARCH_TRACEHOOK */ @@ -586,23 +543,6 @@ static int proc_pid_syscall(struct task_struct *task, char *buffer) /* Here the fs part begins */ /************************************************************************/ -/* permission checks */ -static int proc_fd_access_allowed(struct inode *inode) -{ - struct task_struct *task; - int allowed = 0; - /* Allow access to a task's file descriptors if it is us or we - * may use ptrace attach to the process and find out that - * information. - */ - task = get_proc_task(inode); - if (task) { - allowed = ptrace_may_access(task, PTRACE_MODE_READ); - put_task_struct(task); - } - return allowed; -} - int proc_setattr(struct dentry *dentry, struct iattr *attr) { int error; @@ -839,7 +779,7 @@ static ssize_t mem_read(struct file * file, char __user * buf, if (!page) goto out; - mm = check_mem_permission(task); + mm = __check_mem_permission(task); ret = PTR_ERR(mm); if (IS_ERR(mm)) goto out_free; @@ -902,7 +842,7 @@ static ssize_t mem_write(struct file * file, const char __user *buf, if (!page) goto out_task; - mm = check_mem_permission(task); + mm = __check_mem_permission(task); copied = PTR_ERR(mm); if (IS_ERR(mm)) goto out_free; @@ -1613,12 +1553,7 @@ static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd) /* We don't need a base pointer in the /proc filesystem */ path_put(&nd->path); - /* Are we allowed to snoop on the tasks file descriptors? */ - if (!proc_fd_access_allowed(inode)) - goto out; - error = PROC_I(inode)->op.proc_get_link(inode, &nd->path); -out: return ERR_PTR(error); } @@ -1652,9 +1587,6 @@ static int proc_pid_readlink(struct dentry * dentry, char __user * buffer, int b struct inode *inode = dentry->d_inode; struct path path; - /* Are we allowed to snoop on the tasks file descriptors? */ - if (!proc_fd_access_allowed(inode)) - goto out; error = PROC_I(inode)->op.proc_get_link(inode, &path); if (error) @@ -1758,6 +1690,8 @@ int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) /* dentry stuff */ +static void pid_revalidate_perms(struct dentry *dentry); + /* * Exceptional case: normally we are not allowed to unhash a busy * directory. In this case, however, we can do it - no aliasing problems @@ -1785,6 +1719,8 @@ int pid_revalidate(struct dentry *dentry, struct nameidata *nd) inode = dentry->d_inode; task = get_proc_task(inode); + pid_revalidate_perms(dentry); + if (task) { if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) || task_dumpable(task)) { @@ -2200,7 +2136,9 @@ static struct dentry *proc_fdinfo_instantiate(struct inode *dir, ei = PROC_I(inode); ei->fd = fd; inode->i_mode = S_IFREG | S_IRUSR; - inode->i_fop = &proc_fdinfo_file_operations; + ei->real_fops = &proc_fdinfo_file_operations; + inode->i_fop = &proc_pid_perms_fops; + d_set_d_op(dentry, &tid_fd_dentry_operations); d_add(dentry, inode); /* Close the race of the process dying before we return the dentry */ @@ -2238,7 +2176,6 @@ static const struct inode_operations proc_fdinfo_inode_operations = { .setattr = proc_setattr, }; - static struct dentry *proc_pident_instantiate(struct inode *dir, struct dentry *dentry, struct task_struct *task, const void *ptr) { @@ -2255,11 +2192,16 @@ static struct dentry *proc_pident_instantiate(struct inode *dir, inode->i_mode = p->mode; if (S_ISDIR(inode->i_mode)) inode->i_nlink = 2; /* Use getattr to fix if necessary */ + if (p->iop) inode->i_op = p->iop; if (p->fop) - inode->i_fop = p->fop; + ei->real_fops = p->fop; ei->op = p->op; + inode->i_fop = &proc_pid_perms_fops; + + ei->dirent = p; + d_set_d_op(dentry, &pid_dentry_operations); d_add(dentry, inode); /* Close the race of the process dying before we return the dentry */ @@ -2417,15 +2359,9 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf, if (copy_from_user(page, buf, count)) goto out_free; - /* Guard against adverse ptrace interaction */ - length = mutex_lock_interruptible(&task->signal->cred_guard_mutex); - if (length < 0) - goto out_free; - length = security_setprocattr(task, (char*)file->f_path.dentry->d_name.name, (void*)page, count); - mutex_unlock(&task->signal->cred_guard_mutex); out_free: free_page((unsigned long) page); out: @@ -2469,7 +2405,7 @@ static struct dentry *proc_attr_dir_lookup(struct inode *dir, attr_dir_stuff, ARRAY_SIZE(attr_dir_stuff)); } -static const struct inode_operations proc_attr_dir_inode_operations = { +const struct inode_operations proc_attr_dir_inode_operations = { .lookup = proc_attr_dir_lookup, .getattr = pid_getattr, .setattr = proc_setattr, @@ -2617,7 +2553,7 @@ static const struct inode_operations proc_self_inode_operations = { */ static const struct pid_entry proc_base_stuff[] = { NOD("self", S_IFLNK|S_IRWXUGO, - &proc_self_inode_operations, NULL, {}), + &proc_self_inode_operations, NULL, {}, false), }; static struct dentry *proc_base_instantiate(struct inode *dir, @@ -2653,9 +2589,12 @@ static struct dentry *proc_base_instantiate(struct inode *dir, inode->i_size = 64; if (p->iop) inode->i_op = p->iop; + if (p->fop) - inode->i_fop = p->fop; + ei->real_fops = p->fop; ei->op = p->op; + inode->i_fop = &proc_pid_perms_fops; + d_add(dentry, inode); error = NULL; out: @@ -2708,9 +2647,6 @@ static int do_io_accounting(struct task_struct *task, char *buffer, int whole) struct task_io_accounting acct = task->ioac; unsigned long flags; - if (!ptrace_may_access(task, PTRACE_MODE_READ)) - return -EACCES; - if (whole && lock_task_sighand(task, &flags)) { struct task_struct *t = task; @@ -2751,12 +2687,8 @@ static int proc_tgid_io_accounting(struct task_struct *task, char *buffer) static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { - int err = lock_trace(task); - if (!err) { - seq_printf(m, "%08x\n", task->personality); - unlock_trace(task); - } - return err; + seq_printf(m, "%08x\n", task->personality); + return 0; } /* @@ -2773,8 +2705,8 @@ static const struct pid_entry tgid_base_stuff[] = { #ifdef CONFIG_NET DIR("net", S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations), #endif - REG("environ", S_IRUSR, proc_environ_operations), - INF("auxv", S_IRUSR, proc_pid_auxv), + REG_PERMS("environ", S_IRUSR, proc_environ_operations), + INF_PERMS("auxv", S_IRUSR, proc_pid_auxv), ONE("status", S_IRUGO, proc_pid_status), ONE("personality", S_IRUGO, proc_pid_personality), INF("limits", S_IRUGO, proc_pid_limits), @@ -2791,9 +2723,9 @@ static const struct pid_entry tgid_base_stuff[] = { INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tgid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG_PERMS("maps", S_IRUGO, proc_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG_PERMS("numa_maps", S_IRUGO, proc_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -2804,7 +2736,7 @@ static const struct pid_entry tgid_base_stuff[] = { REG("mountstats", S_IRUSR, proc_mountstats_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG_PERMS("smaps", S_IRUGO, proc_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY @@ -2867,7 +2799,7 @@ static struct dentry *proc_tgid_base_lookup(struct inode *dir, struct dentry *de tgid_base_stuff, ARRAY_SIZE(tgid_base_stuff)); } -static const struct inode_operations proc_tgid_base_inode_operations = { +const struct inode_operations proc_tgid_base_inode_operations = { .lookup = proc_tgid_base_lookup, .getattr = pid_getattr, .setattr = proc_setattr, @@ -2972,9 +2904,10 @@ static struct dentry *proc_pid_instantiate(struct inode *dir, if (!inode) goto out; - inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO; + inode->i_mode = S_IFDIR | S_IRUGO | S_IXUGO; inode->i_op = &proc_tgid_base_inode_operations; - inode->i_fop = &proc_tgid_base_operations; + inode->i_fop = &proc_pid_perms_fops; + PROC_I(inode)->real_fops = &proc_tgid_base_operations; inode->i_flags|=S_IMMUTABLE; inode->i_nlink = 2 + pid_entry_count_dirs(tgid_base_stuff, @@ -3122,8 +3055,8 @@ static const struct pid_entry tid_base_stuff[] = { DIR("fd", S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations), DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations), DIR("ns", S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations), - REG("environ", S_IRUSR, proc_environ_operations), - INF("auxv", S_IRUSR, proc_pid_auxv), + REG_PERMS("environ", S_IRUSR, proc_environ_operations), + INF_PERMS("auxv", S_IRUSR, proc_pid_auxv), ONE("status", S_IRUGO, proc_pid_status), ONE("personality", S_IRUGO, proc_pid_personality), INF("limits", S_IRUGO, proc_pid_limits), @@ -3137,9 +3070,9 @@ static const struct pid_entry tid_base_stuff[] = { INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG_PERMS("maps", S_IRUGO, proc_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG_PERMS("numa_maps", S_IRUGO, proc_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -3149,7 +3082,7 @@ static const struct pid_entry tid_base_stuff[] = { REG("mountinfo", S_IRUGO, proc_mountinfo_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG_PERMS("smaps", S_IRUGO, proc_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY @@ -3209,7 +3142,7 @@ static const struct file_operations proc_tid_base_operations = { .llseek = default_llseek, }; -static const struct inode_operations proc_tid_base_inode_operations = { +const struct inode_operations proc_tid_base_inode_operations = { .lookup = proc_tid_base_lookup, .getattr = pid_getattr, .setattr = proc_setattr, @@ -3226,7 +3159,8 @@ static struct dentry *proc_task_instantiate(struct inode *dir, goto out; inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO; inode->i_op = &proc_tid_base_inode_operations; - inode->i_fop = &proc_tid_base_operations; + inode->i_fop = &proc_pid_perms_fops; + PROC_I(inode)->real_fops = &proc_tid_base_operations; inode->i_flags|=S_IMMUTABLE; inode->i_nlink = 2 + pid_entry_count_dirs(tid_base_stuff, @@ -3445,3 +3379,58 @@ static const struct file_operations proc_task_operations = { .readdir = proc_task_readdir, .llseek = default_llseek, }; + +const int perms_size[] = { + [PROC_PERMS_NTID] = ARRAY_SIZE(tid_base_stuff), + [PROC_PERMS_NTGID] = ARRAY_SIZE(tgid_base_stuff), + [PROC_PERMS_NATTR] = ARRAY_SIZE(attr_dir_stuff), +}; + +const struct pid_entry *dir_ents[] = { + [PROC_PERMS_NTID] = tid_base_stuff, + [PROC_PERMS_NTGID] = tgid_base_stuff, + [PROC_PERMS_NATTR] = attr_dir_stuff, +}; + +static void pid_revalidate_perms(struct dentry *dentry) +{ + struct inode *inode; + struct pid_namespace *pid_ns; + struct proc_perms *perms; + const struct inode_operations *parent_op; + struct proc_inode *ei; + + pid_ns = dentry->d_sb->s_fs_info; + perms = &pid_ns->proc_perms; + inode = dentry->d_inode; + parent_op = dentry->d_parent->d_inode->i_op; + ei = PROC_I(inode); + + /* We never ever chmod symlinks (XXX: symlinks must be protected too) */ + if (S_ISLNK(inode->i_mode)) + return; + + if (inode->i_op == &proc_tgid_base_inode_operations || + inode->i_op == &proc_tid_base_inode_operations) { + mode_t mode = S_IFDIR | S_IRUGO | S_IXUGO; + mode_t mask = perms->proc_pid_allowed ? ~0 : ~077; + inode->i_mode = mode & mask; + } else if (parent_op == &proc_tgid_base_inode_operations) { + inode->i_mode = get_ent_perms(ei->dirent, + perms, + PROC_PERMS_NTGID, + (const struct pid_entry *)ei->dirent - &tgid_base_stuff[0]); + } else if (parent_op == &proc_tid_base_inode_operations) { + inode->i_mode = get_ent_perms(ei->dirent, + perms, + PROC_PERMS_NTID, + (const struct pid_entry *)ei->dirent - &tid_base_stuff[0]); + } else if (parent_op == &proc_attr_dir_inode_operations) { + inode->i_mode = get_ent_perms(ei->dirent, + perms, + PROC_PERMS_NATTR, + (struct pid_entry *)ei->dirent - &attr_dir_stuff[0]); + } else { + pr_err("other not found (%s)\n", dentry->d_name.name); + } +} diff --git a/fs/proc/base_perms.c b/fs/proc/base_perms.c new file mode 100644 index 0000000..9d05e7a --- /dev/null +++ b/fs/proc/base_perms.c @@ -0,0 +1,314 @@ +/* + * linux/fs/proc/base_perms.c + * + * Copyright (C) 2011 Vasiliy Kulikov + * + * proc base directory permissions handling functions + */ + +#include <asm/uaccess.h> +#include <linux/proc_fs.h> +#include <linux/string.h> +#include <linux/ptrace.h> +#include <linux/poll.h> +#include <linux/pid_namespace.h> +#include "internal.h" + + +/* Similar to acl_permission_check(), but with procfs specific changes */ +static int proc_permission_check(struct task_struct *task, struct inode *inode, int mask) +{ + mode_t mode = inode->i_mode; + uid_t uid = inode->i_uid; + gid_t gid = inode->i_gid; + + if (mode & mask) + return 0; + + /* XXX: This should be updated conformably to user ns code changes. + * Or maybe just remove it as ns users may not have any access to procfs? */ + if (current_user_ns() != inode_userns(inode)) + return -EACCES; + + if (in_group_p(gid) && ((mode >> 3) & mask)) + return 0; + + /* + * The major changes compared to acl_permission_check(): + * 1) we check euid, not fsuid + * 2) we additionally check ptrace ability + */ + if (current_euid() == uid) { + int pmode = (mask == MAY_WRITE) ? PTRACE_MODE_ATTACH : PTRACE_MODE_READ; + if (ptrace_may_access(task, pmode)) + return 0; + } + + return -EACCES; +} + +static void unlock_pid_trace(struct task_struct *task); + +/* + * All operations with /proc/PID/ directory MUST be processed + * under lock_pid_trace() to avoid execve() races. + * + * It checks permissions similar to acl_permission_check() and probably holds + * (*task)->signal->cred_guard_mutex. If so, *task would point to the target + * task, NULL otherwise. To unlock the mutex call unlock_pid_trace() with the + * same *task argument. + */ +static int lock_pid_trace(struct dentry *dentry, + int mask, + struct task_struct **task) +{ + int err; + struct inode *inode = dentry->d_inode; + const struct pid_entry *pe = PROC_I(inode)->dirent; + + *task = get_proc_task(inode); + if (*task == NULL) + return -ESRCH; + + err = mutex_lock_killable(&(*task)->signal->cred_guard_mutex); + if (err) + goto put_task; + + /* + * We have to revalidate both uid/gid and permissions. + * uid/gid revalidation is racy against execve(), so we do it under + * ->cred_guard_mutex. + * + * We cannot just update uid/gid because of LSM. + */ + /* XXX: locking? */ +#if 1 + err = -ESRCH; + if (pid_revalidate(dentry, NULL) == 0) + goto free_mutex; +#endif + + if (!pe->need_perms_check) { + /* OK, the file checks permissions on his own */ + unlock_pid_trace(*task); + *task = NULL; + return 0; + } + + err = -EACCES; + if (proc_permission_check(*task, inode, mask)) + goto free_mutex; + + err = 0; + goto exit; + +free_mutex: + mutex_unlock(&(*task)->signal->cred_guard_mutex); +put_task: + put_task_struct(*task); + *task = NULL; +exit: + return err; +} + +static void unlock_pid_trace(struct task_struct *task) +{ + if (task) { + mutex_unlock(&task->signal->cred_guard_mutex); + put_task_struct(task); + } +} + +static int proc_pid_perms_open(struct inode *inode, struct file *file) +{ + struct proc_inode *pi = PROC_I(inode); + + /* No ptrace check here. All checks should be located + * in read/write/readdir/etc. */ + + if (pi->real_fops->open) + return pi->real_fops->open(inode, file); + return 0; +} + +static int proc_pid_perms_release(struct inode *inode, struct file *file) +{ + struct proc_inode *pi = PROC_I(inode); + + if (pi->real_fops->release) + return pi->real_fops->release(inode, file); + return 0; +} + +static ssize_t proc_pid_perms_read(struct file * file, char __user * buf, + size_t count, loff_t *ppos) +{ + struct dentry *dentry = file->f_dentry; + struct proc_inode *pi = PROC_I(dentry->d_inode); + int rc; + struct task_struct *task; + + if (pi->real_fops->read) { + rc = lock_pid_trace(dentry, MAY_READ, &task); + if (!rc) + rc = pi->real_fops->read(file, buf, count, ppos); + unlock_pid_trace(task); + return rc; + } + + /* Should never happen, but better safe than sorry */ + return -EPERM; +} + +static ssize_t proc_pid_perms_write(struct file *file, const char __user *buf, + size_t count, loff_t *offs) +{ + struct dentry *dentry = file->f_dentry; + struct proc_inode *pi = PROC_I(dentry->d_inode); + int rc; + struct task_struct *task; + + if (pi->real_fops->write) { + rc = lock_pid_trace(dentry, MAY_WRITE, &task); + if (!rc) + rc = pi->real_fops->write(file, buf, count, offs); + unlock_pid_trace(task); + return rc; + } + + /* Should never happen, but better safe than sorry */ + return -EPERM; +} + +static int proc_pid_perms_readdir(struct file *file, void *dirent, filldir_t filldir) +{ + struct dentry *dentry = file->f_dentry; + struct proc_inode *pi = PROC_I(dentry->d_inode); + int rc; + struct task_struct *task; + + if (pi->real_fops->readdir) { + rc = lock_pid_trace(dentry, MAY_READ, &task); + if (!rc) + rc = pi->real_fops->readdir(file, dirent, filldir); + unlock_pid_trace(task); + return rc; + } + + /* Should never happen, but better safe than sorry */ + return -ENOTDIR; +} + +static unsigned proc_pid_perms_poll(struct file *file, poll_table *wait) +{ + struct dentry *dentry = file->f_dentry; + struct proc_inode *pi = PROC_I(dentry->d_inode); + struct task_struct *task; + int rc; + + /* Check perms before the poll. There is a race against execve(), + * but hopefuly the infoleak is very minor */ + rc = lock_pid_trace(dentry, MAY_READ, &task); + unlock_pid_trace(task); + if (rc) + return rc; + + if (pi->real_fops->poll) + return pi->real_fops->poll(file, wait); + + /* The fallback for files not implementing poll */ + return DEFAULT_POLLMASK; +} + +static loff_t proc_pid_perms_llseek(struct file *file, loff_t offset, int orig) +{ + struct proc_inode *pi = PROC_I(file->f_dentry->d_inode); + + /* Looks like llseek() doesn't need any ptrace protection */ + + if (pi->real_fops->llseek) + return pi->real_fops->llseek(file, offset, orig); + + return no_llseek(file, offset, orig); +} + +/* + * Global wrapper fops for _ALL_ files inside /proc/PID/. + * + * It checks the permissions (including ptrace() check) of the current + * task on all file ops. The runtime checks protect from holding any fd + * across execve() of set*id binaries. task->signal->cred_guard_mutex + * is held during read(), readdir(), and write(). open(), release(), + * llseek() are not protected as they are harmless anyway. It makes + * effort to check permissions on poll(), but it doesn't hold the + * mutex during actual fops->poll() as poll() might sleep for much time. + * + * It makes sense to use proc_pid_perms_fops even for files doing their + * own checks (e.g. REG_PERMS()). + */ +const struct file_operations proc_pid_perms_fops = { + .open = proc_pid_perms_open, + .read = proc_pid_perms_read, + .readdir = proc_pid_perms_readdir, + .write = proc_pid_perms_write, + .llseek = proc_pid_perms_llseek, + .poll = proc_pid_perms_poll, + .release = proc_pid_perms_release, +}; + +mode_t get_ent_perms(const struct pid_entry *p, + struct proc_perms *perms, + int ndir, + int nent) +{ + mode_t umask = perms->proc_ent_allowed[ndir][nent] ? 0 : 077; + return p->mode & ~umask; +} + +int init_proc_perms(struct proc_perms *p) +{ + int i; + + p->proc_pid_allowed = true; + for (i = 0; i < PROC_PERMS_NMAX; i++) { + size_t size = perms_size[i]*sizeof(bool); + p->proc_ent_allowed[i] = kzalloc(size, GFP_KERNEL); + if (p->proc_ent_allowed[i] == NULL) + goto err; + } + + return 0; +err: + for (i-- ; i >= 0; i--) + kfree(p->proc_ent_allowed[i]); + return -ENOMEM; +} + +void free_proc_perms(struct proc_perms *p) +{ + int i; + + for (i = 0; i < PROC_PERMS_NMAX; i++) + kfree(p->proc_ent_allowed[i]); +} + +void fill_proc_perms_ent(struct proc_perms *p, int ndir, bool val) +{ + int i; + for (i = 0; i < perms_size[ndir]; i++) + p->proc_ent_allowed[ndir][i] = val; +} + +bool set_proc_perms_ent(struct proc_perms *p, int ndir, const char *fname, bool val) +{ + int i; + for (i = 0; i < perms_size[ndir]; i++) + if (!strcmp(dir_ents[ndir][i].name, fname)) { + pr_err("found %s (%d)\n", fname, i); + p->proc_ent_allowed[ndir][i] = val; + return true; + } + + pr_err("not found %s (%d)\n", fname, i); + return false; +} diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 74b48cf..0a02c79 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -18,6 +18,10 @@ #include <linux/module.h> #include <linux/sysctl.h> #include <linux/slab.h> +#include <linux/pid_namespace.h> +#include <linux/seq_file.h> +#include <linux/mount.h> + #include <asm/system.h> #include <asm/uaccess.h> @@ -69,6 +73,7 @@ static struct inode *proc_alloc_inode(struct super_block *sb) ei->sysctl_entry = NULL; ei->ns = NULL; ei->ns_ops = NULL; + ei->real_fops = NULL; inode = &ei->vfs_inode; inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME; return inode; @@ -102,12 +107,25 @@ void __init proc_init_inodecache(void) init_once); } +static int proc_show_options(struct seq_file *seq, struct vfsmount *vfs) +{ + struct super_block *sb = vfs->mnt_sb; + struct pid_namespace *pid = sb->s_fs_info; + + if (pid->proc_perms.proc_pid_allowed) + seq_printf(seq, ",proc_pid_allowed=."); + + return 0; +} + static const struct super_operations proc_sops = { .alloc_inode = proc_alloc_inode, .destroy_inode = proc_destroy_inode, .drop_inode = generic_delete_inode, .evict_inode = proc_evict_inode, .statfs = simple_statfs, + .remount_fs = proc_remount, + .show_options = proc_show_options, }; static void __pde_users_dec(struct proc_dir_entry *pde) diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 7838e5c..35c760a 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -141,7 +141,31 @@ struct inode *proc_pid_make_inode(struct super_block * sb, struct task_struct *t extern const struct dentry_operations pid_dentry_operations; int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat); int proc_setattr(struct dentry *dentry, struct iattr *attr); +int proc_remount(struct super_block *sb, int *flags, char *data); extern const struct inode_operations proc_ns_dir_inode_operations; extern const struct file_operations proc_ns_dir_operations; +extern const struct inode_operations proc_tgid_base_inode_operations; +extern const struct inode_operations proc_tid_base_inode_operations; +extern const struct inode_operations proc_attr_dir_inode_operations; + +extern const struct file_operations proc_pid_perms_fops; +extern const int perms_size[]; +extern const struct pid_entry *dir_ents[]; + +extern mode_t get_ent_perms(const struct pid_entry *p, + struct proc_perms *perms, + int ndir, + int nent); + +struct pid_entry { + char *name; + int len; + mode_t mode; + const struct inode_operations *iop; + const struct file_operations *fop; + union proc_op op; + bool need_perms_check; +}; + diff --git a/fs/proc/root.c b/fs/proc/root.c index d6c3b41..9193fa1 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -18,6 +18,7 @@ #include <linux/bitops.h> #include <linux/mount.h> #include <linux/pid_namespace.h> +#include <linux/parser.h> #include "internal.h" @@ -36,6 +37,112 @@ static int proc_set_super(struct super_block *sb, void *data) return err; } +enum { + Opt_err, Opt_tid_allowed, Opt_tgid_allowed, Opt_attr_allowed, +}; + +static const match_table_t tokens = { + {Opt_tgid_allowed, "tgid_allowed=%s"}, + {Opt_tid_allowed, "tid_allowed=%s"}, + {Opt_attr_allowed, "attr_allowed=%s"}, + {Opt_err, NULL}, +}; + +static bool proc_parse_xallowed(struct proc_perms *p, int ndir, char *val, bool set_pid_allowed) +{ + char *field; + + fill_proc_perms_ent(p, ndir, false); + + if (strcmp(val, "none") == 0) { + if (set_pid_allowed) + p->proc_pid_allowed = false; + return true; + } + if (strcmp(val, "all") == 0) { + fill_proc_perms_ent(p, ndir, true); + if (set_pid_allowed) + p->proc_pid_allowed = true; + return true; + } + + while ((field = strsep(&val, ";")) != NULL) { + if (!set_proc_perms_ent(p, ndir, field, true)) + return false; + } + + if (set_pid_allowed) + p->proc_pid_allowed = true; + return true; +} + +static int proc_parse_options(char *options, struct pid_namespace *pid) +{ + char *p, *val; + substring_t args[MAX_OPT_ARGS]; + + pr_debug("proc: options = %s\n", options); + + if (!options) + return 1; + + while ((p = strsep(&options, ",")) != NULL) { + int token; + if (!*p) + continue; + + args[0].to = args[0].from = 0; + token = match_token(p, tokens, args); + switch (token) { + case Opt_tgid_allowed: + val = match_strdup(&args[0]); + if (!val) + return 0; + if (!proc_parse_xallowed(&pid->proc_perms, PROC_PERMS_NTGID, val, true)) { + kfree(val); + return 0; + } + kfree(val); + break; + case Opt_tid_allowed: + val = match_strdup(&args[0]); + if (!val) + return 0; + if (!proc_parse_xallowed(&pid->proc_perms, PROC_PERMS_NTID, val, false)) { + kfree(val); + return 0; + } + kfree(val); + break; + case Opt_attr_allowed: + val = match_strdup(&args[0]); + if (!val) + return 0; + if (!proc_parse_xallowed(&pid->proc_perms, PROC_PERMS_NATTR, val, false)) { + kfree(val); + return 0; + } + kfree(val); + break; + default: + pr_err("proc: unrecognized mount option \"%s\" " + "or missing value", p); + return 0; + } + } + + //pr_debug("proc: gid = %u, hidepid = %o, hidenet = %d\n", pid->pid_gid, pid->hide_pid, (int)pid->hide_net); + + return 1; +} + +int proc_remount(struct super_block *sb, int *flags, char *data) +{ + struct pid_namespace *pid = sb->s_fs_info; + return !proc_parse_options(data, pid); +} + + static struct dentry *proc_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { @@ -43,11 +150,16 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, struct super_block *sb; struct pid_namespace *ns; struct proc_inode *ei; + char *options; - if (flags & MS_KERNMOUNT) + + if (flags & MS_KERNMOUNT) { ns = (struct pid_namespace *)data; - else + options = NULL; + } else { ns = current->nsproxy->pid_ns; + options = data; + } sb = sget(fs_type, proc_test_super, proc_set_super, ns); if (IS_ERR(sb)) @@ -55,6 +167,10 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, if (!sb->s_root) { sb->s_flags = flags; + if (!proc_parse_options(options, ns)) { + deactivate_locked_super(sb); + return ERR_PTR(-EINVAL); + } err = proc_fill_super(sb); if (err) { deactivate_locked_super(sb); @@ -74,6 +190,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, return dget(sb->s_root); } + static void proc_kill_sb(struct super_block *sb) { struct pid_namespace *ns; diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c index 980de54..da98ba2 100644 --- a/fs/proc/task_nommu.c +++ b/fs/proc/task_nommu.c @@ -201,7 +201,7 @@ static void *m_start(struct seq_file *m, loff_t *pos) if (!priv->task) return ERR_PTR(-ESRCH); - mm = mm_for_maps(priv->task); + mm = get_task_mm(priv->task); if (!mm || IS_ERR(mm)) { put_task_struct(priv->task); priv->task = NULL; diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 38d1032..ed10329 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -16,6 +16,18 @@ struct pidmap { struct bsd_acct_struct; +enum { + PROC_PERMS_NTID, + PROC_PERMS_NTGID, + PROC_PERMS_NATTR, + PROC_PERMS_NMAX +}; + +struct proc_perms { + bool proc_pid_allowed; + bool *proc_ent_allowed[PROC_PERMS_NMAX]; +}; + struct pid_namespace { struct kref kref; struct pidmap pidmap[PIDMAP_ENTRIES]; @@ -30,6 +42,7 @@ struct pid_namespace { #ifdef CONFIG_BSD_PROCESS_ACCT struct bsd_acct_struct *bacct; #endif + struct proc_perms proc_perms; }; extern struct pid_namespace init_pid_ns; diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index e7576cf..a1525d7 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -115,10 +115,17 @@ struct proc_dir_entry *proc_create_data(const char *name, mode_t mode, extern void remove_proc_entry(const char *name, struct proc_dir_entry *parent); struct pid_namespace; +struct proc_perms; extern int pid_ns_prepare_proc(struct pid_namespace *ns); extern void pid_ns_release_proc(struct pid_namespace *ns); +extern int init_proc_perms(struct proc_perms *p); +extern void free_proc_perms(struct proc_perms *p); + +extern void fill_proc_perms_ent(struct proc_perms *p, int ndir, bool val); +extern bool set_proc_perms_ent(struct proc_perms *p, int ndir, const char *fname, bool val); + /* * proc_tty.c */ @@ -267,6 +274,11 @@ struct proc_inode { struct pid *pid; int fd; union proc_op op; + + /* Used by files inside /proc/PID/ only */ + const struct file_operations *real_fops; + const void *dirent; + struct proc_dir_entry *pde; struct ctl_table_header *sysctl; struct ctl_table *sysctl_entry; diff --git a/kernel/pid.c b/kernel/pid.c index 57a8346..ffa8b37 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -565,6 +565,8 @@ void __init pidmap_init(void) set_bit(0, init_pid_ns.pidmap[0].page); atomic_dec(&init_pid_ns.pidmap[0].nr_free); + BUG_ON(init_proc_perms(&init_pid_ns.proc_perms)); + init_pid_ns.pid_cachep = KMEM_CACHE(pid, SLAB_HWCACHE_ALIGN | SLAB_PANIC); } diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index e9c9adc..ab4261c 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -97,12 +97,18 @@ static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_p for (i = 1; i < PIDMAP_ENTRIES; i++) atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE); - err = pid_ns_prepare_proc(ns); + err = init_proc_perms(&ns->proc_perms); if (err) goto out_put_parent_pid_ns; + err = pid_ns_prepare_proc(ns); + if (err) + goto out_free_proc_perms; + return ns; +out_free_proc_perms: + free_proc_perms(&ns->proc_perms); out_put_parent_pid_ns: put_pid_ns(parent_pid_ns); out_free_map: @@ -119,6 +125,8 @@ static void destroy_pid_namespace(struct pid_namespace *ns) for (i = 0; i < PIDMAP_ENTRIES; i++) kfree(ns->pidmap[i].page); + + free_proc_perms(&ns->proc_perms); kmem_cache_free(pid_ns_cachep, ns); } diff --git a/kernel/sysctl.c b/kernel/sysctl.c index f175d98..745b71e 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -57,6 +57,7 @@ #include <linux/pipe_fs_i.h> #include <linux/oom.h> #include <linux/kmod.h> +#include <linux/pid_namespace.h> #include <asm/uaccess.h> #include <asm/processor.h> -- ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [kernel-hardening] Re: procfs {tid,tgid,attr}_allowed mount options 2011-07-29 17:47 ` [kernel-hardening] procfs {tid,tgid,attr}_allowed " Vasiliy Kulikov @ 2011-08-04 11:23 ` Vasiliy Kulikov 2011-08-10 10:02 ` Vasiliy Kulikov 2011-08-10 11:25 ` Solar Designer 0 siblings, 2 replies; 29+ messages in thread From: Vasiliy Kulikov @ 2011-08-04 11:23 UTC (permalink / raw) To: kernel-hardening Hi, New version. Cleanups/fixes here and there. It lacks net/ restriction, but IMO it is already complicated enough (more than 500 new lines). Such (relatively) simple thing as net_allowed= is an additional care. I'd achieve at least processes restrictions in upstream, after it will come networking. --- fs/proc/Makefile | 2 +- fs/proc/base.c | 286 +++++++++++++++++++++------------------ fs/proc/base_perms.c | 305 +++++++++++++++++++++++++++++++++++++++++ fs/proc/inode.c | 19 +++ fs/proc/internal.h | 25 ++++ fs/proc/root.c | 121 ++++++++++++++++- fs/proc/task_nommu.c | 2 +- include/linux/pid_namespace.h | 13 ++ include/linux/proc_fs.h | 12 ++ kernel/pid.c | 2 + kernel/pid_namespace.c | 10 ++- kernel/sysctl.c | 1 + 12 files changed, 660 insertions(+), 138 deletions(-) --- diff --git a/fs/proc/Makefile b/fs/proc/Makefile index c1c7293..81020f9 100644 --- a/fs/proc/Makefile +++ b/fs/proc/Makefile @@ -8,7 +8,7 @@ proc-y := nommu.o task_nommu.o proc-$(CONFIG_MMU) := mmu.o task_mmu.o proc-y += inode.o root.o base.o generic.o array.o \ - proc_tty.o + proc_tty.o base_perms.o proc-y += cmdline.o proc-y += consoles.o proc-y += cpuinfo.o diff --git a/fs/proc/base.c b/fs/proc/base.c index fc5bc27..33684f7 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -98,40 +98,43 @@ * in /proc for a task before it execs a suid executable. */ -struct pid_entry { - char *name; - int len; - mode_t mode; - const struct inode_operations *iop; - const struct file_operations *fop; - union proc_op op; -}; - -#define NOD(NAME, MODE, IOP, FOP, OP) { \ +#define NOD(NAME, MODE, IOP, FOP, OP, PERMS) { \ .name = (NAME), \ .len = sizeof(NAME) - 1, \ .mode = MODE, \ .iop = IOP, \ .fop = FOP, \ .op = OP, \ + .need_perms_check = PERMS, \ } +/* + * XX_PERMS() are files without any ptrace() check. + * However, they have updated uid/gid and permissions on each file operation. + */ #define DIR(NAME, MODE, iops, fops) \ - NOD(NAME, (S_IFDIR|(MODE)), &iops, &fops, {} ) + NOD(NAME, (S_IFDIR|(MODE)), &iops, &fops, {}, true) #define LNK(NAME, get_link) \ NOD(NAME, (S_IFLNK|S_IRWXUGO), \ &proc_pid_link_inode_operations, NULL, \ - { .proc_get_link = get_link } ) + { .proc_get_link = get_link }, false) #define REG(NAME, MODE, fops) \ - NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {}) + NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {}, true) +#define REG_PERMS(NAME, MODE, fops) \ + NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {}, false) #define INF(NAME, MODE, read) \ NOD(NAME, (S_IFREG|(MODE)), \ NULL, &proc_info_file_operations, \ - { .proc_read = read } ) + { .proc_read = read }, true ) +#define INF_PERMS(NAME, MODE, read) \ + NOD(NAME, (S_IFREG|(MODE)), \ + NULL, &proc_info_file_operations, \ + { .proc_read = read }, false ) #define ONE(NAME, MODE, show) \ NOD(NAME, (S_IFREG|(MODE)), \ NULL, &proc_single_file_operations, \ - { .proc_show = show } ) + { .proc_show = show }, true ) + /* * Count the number of hardlinks for the pid_entry table, excluding the . @@ -229,35 +232,12 @@ static struct mm_struct *__check_mem_permission(struct task_struct *task) return ERR_PTR(-EPERM); } -/* - * If current may access user memory in @task return a reference to the - * corresponding mm, otherwise ERR_PTR. - */ -static struct mm_struct *check_mem_permission(struct task_struct *task) -{ - struct mm_struct *mm; - int err; - - /* - * Avoid racing if task exec's as we might get a new mm but validate - * against old credentials. - */ - err = mutex_lock_killable(&task->signal->cred_guard_mutex); - if (err) - return ERR_PTR(err); - - mm = __check_mem_permission(task); - mutex_unlock(&task->signal->cred_guard_mutex); - - return mm; -} - struct mm_struct *mm_for_maps(struct task_struct *task) { struct mm_struct *mm; int err; - err = mutex_lock_killable(&task->signal->cred_guard_mutex); + err = mutex_lock_killable(&task->signal->cred_guard_mutex); if (err) return ERR_PTR(err); @@ -327,7 +307,6 @@ static int proc_pid_auxv(struct task_struct *task, char *buffer) return res; } - #ifdef CONFIG_KALLSYMS /* * Provides a wchan file via kallsyms in a proper one-value-per-file format. @@ -350,23 +329,6 @@ static int proc_pid_wchan(struct task_struct *task, char *buffer) } #endif /* CONFIG_KALLSYMS */ -static int lock_trace(struct task_struct *task) -{ - int err = mutex_lock_killable(&task->signal->cred_guard_mutex); - if (err) - return err; - if (!ptrace_may_access(task, PTRACE_MODE_ATTACH)) { - mutex_unlock(&task->signal->cred_guard_mutex); - return -EPERM; - } - return 0; -} - -static void unlock_trace(struct task_struct *task) -{ - mutex_unlock(&task->signal->cred_guard_mutex); -} - #ifdef CONFIG_STACKTRACE #define MAX_STACK_TRACE_DEPTH 64 @@ -388,16 +350,13 @@ static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns, trace.entries = entries; trace.skip = 0; - err = lock_trace(task); - if (!err) { - save_stack_trace_tsk(task, &trace); + save_stack_trace_tsk(task, &trace); - for (i = 0; i < trace.nr_entries; i++) { - seq_printf(m, "[<%pK>] %pS\n", - (void *)entries[i], (void *)entries[i]); - } - unlock_trace(task); + for (i = 0; i < trace.nr_entries; i++) { + seq_printf(m, "[<%pK>] %pS\n", + (void *)entries[i], (void *)entries[i]); } + kfree(entries); return err; @@ -563,9 +522,7 @@ static int proc_pid_syscall(struct task_struct *task, char *buffer) { long nr; unsigned long args[6], sp, pc; - int res = lock_trace(task); - if (res) - return res; + int res; if (task_current_syscall(task, &nr, args, 6, &sp, &pc)) res = sprintf(buffer, "running\n"); @@ -577,7 +534,7 @@ static int proc_pid_syscall(struct task_struct *task, char *buffer) nr, args[0], args[1], args[2], args[3], args[4], args[5], sp, pc); - unlock_trace(task); + return res; } #endif /* CONFIG_HAVE_ARCH_TRACEHOOK */ @@ -589,18 +546,18 @@ static int proc_pid_syscall(struct task_struct *task, char *buffer) /* permission checks */ static int proc_fd_access_allowed(struct inode *inode) { - struct task_struct *task; - int allowed = 0; - /* Allow access to a task's file descriptors if it is us or we - * may use ptrace attach to the process and find out that - * information. - */ - task = get_proc_task(inode); - if (task) { - allowed = ptrace_may_access(task, PTRACE_MODE_READ); - put_task_struct(task); - } - return allowed; + struct task_struct *task; + int allowed = 0; + /* Allow access to a task's file descriptors if it is us or we + * may use ptrace attach to the process and find out that + * information. + */ + task = get_proc_task(inode); + if (task) { + allowed = ptrace_may_access(task, PTRACE_MODE_READ); + put_task_struct(task); + } + return allowed; } int proc_setattr(struct dentry *dentry, struct iattr *attr) @@ -839,7 +796,7 @@ static ssize_t mem_read(struct file * file, char __user * buf, if (!page) goto out; - mm = check_mem_permission(task); + mm = __check_mem_permission(task); ret = PTR_ERR(mm); if (IS_ERR(mm)) goto out_free; @@ -902,7 +859,7 @@ static ssize_t mem_write(struct file * file, const char __user *buf, if (!page) goto out_task; - mm = check_mem_permission(task); + mm = __check_mem_permission(task); copied = PTR_ERR(mm); if (IS_ERR(mm)) goto out_free; @@ -1758,6 +1715,27 @@ int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) /* dentry stuff */ +static void pid_revalidate_perms(struct dentry *dentry); + +void __pid_revalidate(struct inode *inode, struct task_struct *task) +{ + const struct cred *cred; + + if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) || + task_dumpable(task)) { + rcu_read_lock(); + cred = __task_cred(task); + inode->i_uid = cred->euid; + inode->i_gid = cred->egid; + rcu_read_unlock(); + } else { + inode->i_uid = 0; + inode->i_gid = 0; + } + inode->i_mode &= ~(S_ISUID | S_ISGID); + security_task_to_inode(task, inode); +} + /* * Exceptional case: normally we are not allowed to unhash a busy * directory. In this case, however, we can do it - no aliasing problems @@ -1777,7 +1755,6 @@ int pid_revalidate(struct dentry *dentry, struct nameidata *nd) { struct inode *inode; struct task_struct *task; - const struct cred *cred; if (nd && nd->flags & LOOKUP_RCU) return -ECHILD; @@ -1785,20 +1762,10 @@ int pid_revalidate(struct dentry *dentry, struct nameidata *nd) inode = dentry->d_inode; task = get_proc_task(inode); + pid_revalidate_perms(dentry); + if (task) { - if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) || - task_dumpable(task)) { - rcu_read_lock(); - cred = __task_cred(task); - inode->i_uid = cred->euid; - inode->i_gid = cred->egid; - rcu_read_unlock(); - } else { - inode->i_uid = 0; - inode->i_gid = 0; - } - inode->i_mode &= ~(S_ISUID | S_ISGID); - security_task_to_inode(task, inode); + __pid_revalidate(inode, task); put_task_struct(task); return 1; } @@ -2200,7 +2167,9 @@ static struct dentry *proc_fdinfo_instantiate(struct inode *dir, ei = PROC_I(inode); ei->fd = fd; inode->i_mode = S_IFREG | S_IRUSR; - inode->i_fop = &proc_fdinfo_file_operations; + ei->real_fops = &proc_fdinfo_file_operations; + inode->i_fop = &proc_pid_perms_fops; + d_set_d_op(dentry, &tid_fd_dentry_operations); d_add(dentry, inode); /* Close the race of the process dying before we return the dentry */ @@ -2238,7 +2207,6 @@ static const struct inode_operations proc_fdinfo_inode_operations = { .setattr = proc_setattr, }; - static struct dentry *proc_pident_instantiate(struct inode *dir, struct dentry *dentry, struct task_struct *task, const void *ptr) { @@ -2255,11 +2223,16 @@ static struct dentry *proc_pident_instantiate(struct inode *dir, inode->i_mode = p->mode; if (S_ISDIR(inode->i_mode)) inode->i_nlink = 2; /* Use getattr to fix if necessary */ + if (p->iop) inode->i_op = p->iop; if (p->fop) - inode->i_fop = p->fop; + ei->real_fops = p->fop; ei->op = p->op; + inode->i_fop = &proc_pid_perms_fops; + + ei->dirent = p; + d_set_d_op(dentry, &pid_dentry_operations); d_add(dentry, inode); /* Close the race of the process dying before we return the dentry */ @@ -2417,15 +2390,9 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf, if (copy_from_user(page, buf, count)) goto out_free; - /* Guard against adverse ptrace interaction */ - length = mutex_lock_interruptible(&task->signal->cred_guard_mutex); - if (length < 0) - goto out_free; - length = security_setprocattr(task, (char*)file->f_path.dentry->d_name.name, (void*)page, count); - mutex_unlock(&task->signal->cred_guard_mutex); out_free: free_page((unsigned long) page); out: @@ -2469,7 +2436,7 @@ static struct dentry *proc_attr_dir_lookup(struct inode *dir, attr_dir_stuff, ARRAY_SIZE(attr_dir_stuff)); } -static const struct inode_operations proc_attr_dir_inode_operations = { +const struct inode_operations proc_attr_dir_inode_operations = { .lookup = proc_attr_dir_lookup, .getattr = pid_getattr, .setattr = proc_setattr, @@ -2617,7 +2584,7 @@ static const struct inode_operations proc_self_inode_operations = { */ static const struct pid_entry proc_base_stuff[] = { NOD("self", S_IFLNK|S_IRWXUGO, - &proc_self_inode_operations, NULL, {}), + &proc_self_inode_operations, NULL, {}, false), }; static struct dentry *proc_base_instantiate(struct inode *dir, @@ -2653,9 +2620,12 @@ static struct dentry *proc_base_instantiate(struct inode *dir, inode->i_size = 64; if (p->iop) inode->i_op = p->iop; + if (p->fop) - inode->i_fop = p->fop; + ei->real_fops = p->fop; ei->op = p->op; + inode->i_fop = &proc_pid_perms_fops; + d_add(dentry, inode); error = NULL; out: @@ -2708,9 +2678,6 @@ static int do_io_accounting(struct task_struct *task, char *buffer, int whole) struct task_io_accounting acct = task->ioac; unsigned long flags; - if (!ptrace_may_access(task, PTRACE_MODE_READ)) - return -EACCES; - if (whole && lock_task_sighand(task, &flags)) { struct task_struct *t = task; @@ -2751,12 +2718,8 @@ static int proc_tgid_io_accounting(struct task_struct *task, char *buffer) static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { - int err = lock_trace(task); - if (!err) { - seq_printf(m, "%08x\n", task->personality); - unlock_trace(task); - } - return err; + seq_printf(m, "%08x\n", task->personality); + return 0; } /* @@ -2773,8 +2736,8 @@ static const struct pid_entry tgid_base_stuff[] = { #ifdef CONFIG_NET DIR("net", S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations), #endif - REG("environ", S_IRUSR, proc_environ_operations), - INF("auxv", S_IRUSR, proc_pid_auxv), + REG_PERMS("environ", S_IRUSR, proc_environ_operations), + INF_PERMS("auxv", S_IRUSR, proc_pid_auxv), ONE("status", S_IRUGO, proc_pid_status), ONE("personality", S_IRUGO, proc_pid_personality), INF("limits", S_IRUGO, proc_pid_limits), @@ -2791,9 +2754,9 @@ static const struct pid_entry tgid_base_stuff[] = { INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tgid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG_PERMS("maps", S_IRUGO, proc_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG_PERMS("numa_maps", S_IRUGO, proc_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -2804,7 +2767,7 @@ static const struct pid_entry tgid_base_stuff[] = { REG("mountstats", S_IRUSR, proc_mountstats_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG_PERMS("smaps", S_IRUGO, proc_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY @@ -2867,7 +2830,7 @@ static struct dentry *proc_tgid_base_lookup(struct inode *dir, struct dentry *de tgid_base_stuff, ARRAY_SIZE(tgid_base_stuff)); } -static const struct inode_operations proc_tgid_base_inode_operations = { +const struct inode_operations proc_tgid_base_inode_operations = { .lookup = proc_tgid_base_lookup, .getattr = pid_getattr, .setattr = proc_setattr, @@ -2972,9 +2935,10 @@ static struct dentry *proc_pid_instantiate(struct inode *dir, if (!inode) goto out; - inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO; + inode->i_mode = S_IFDIR | S_IRUGO | S_IXUGO; inode->i_op = &proc_tgid_base_inode_operations; - inode->i_fop = &proc_tgid_base_operations; + inode->i_fop = &proc_pid_perms_fops; + PROC_I(inode)->real_fops = &proc_tgid_base_operations; inode->i_flags|=S_IMMUTABLE; inode->i_nlink = 2 + pid_entry_count_dirs(tgid_base_stuff, @@ -3122,8 +3086,8 @@ static const struct pid_entry tid_base_stuff[] = { DIR("fd", S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations), DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations), DIR("ns", S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations), - REG("environ", S_IRUSR, proc_environ_operations), - INF("auxv", S_IRUSR, proc_pid_auxv), + REG_PERMS("environ", S_IRUSR, proc_environ_operations), + INF_PERMS("auxv", S_IRUSR, proc_pid_auxv), ONE("status", S_IRUGO, proc_pid_status), ONE("personality", S_IRUGO, proc_pid_personality), INF("limits", S_IRUGO, proc_pid_limits), @@ -3137,9 +3101,9 @@ static const struct pid_entry tid_base_stuff[] = { INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG_PERMS("maps", S_IRUGO, proc_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG_PERMS("numa_maps", S_IRUGO, proc_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -3149,7 +3113,7 @@ static const struct pid_entry tid_base_stuff[] = { REG("mountinfo", S_IRUGO, proc_mountinfo_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG_PERMS("smaps", S_IRUGO, proc_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY @@ -3209,7 +3173,7 @@ static const struct file_operations proc_tid_base_operations = { .llseek = default_llseek, }; -static const struct inode_operations proc_tid_base_inode_operations = { +const struct inode_operations proc_tid_base_inode_operations = { .lookup = proc_tid_base_lookup, .getattr = pid_getattr, .setattr = proc_setattr, @@ -3226,7 +3190,8 @@ static struct dentry *proc_task_instantiate(struct inode *dir, goto out; inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO; inode->i_op = &proc_tid_base_inode_operations; - inode->i_fop = &proc_tid_base_operations; + inode->i_fop = &proc_pid_perms_fops; + PROC_I(inode)->real_fops = &proc_tid_base_operations; inode->i_flags|=S_IMMUTABLE; inode->i_nlink = 2 + pid_entry_count_dirs(tid_base_stuff, @@ -3445,3 +3410,58 @@ static const struct file_operations proc_task_operations = { .readdir = proc_task_readdir, .llseek = default_llseek, }; + +const int perms_size[] = { + [PROC_PERMS_NTID] = ARRAY_SIZE(tid_base_stuff), + [PROC_PERMS_NTGID] = ARRAY_SIZE(tgid_base_stuff), + [PROC_PERMS_NATTR] = ARRAY_SIZE(attr_dir_stuff), +}; + +const struct pid_entry *dir_ents[] = { + [PROC_PERMS_NTID] = tid_base_stuff, + [PROC_PERMS_NTGID] = tgid_base_stuff, + [PROC_PERMS_NATTR] = attr_dir_stuff, +}; + +static void pid_revalidate_perms(struct dentry *dentry) +{ + struct inode *inode; + struct pid_namespace *pid_ns; + struct proc_perms *perms; + const struct inode_operations *parent_op; + struct proc_inode *ei; + + pid_ns = dentry->d_sb->s_fs_info; + perms = &pid_ns->proc_perms; + inode = dentry->d_inode; + parent_op = dentry->d_parent->d_inode->i_op; + ei = PROC_I(inode); + + /* We never ever chmod symlinks (XXX: symlinks must be protected too) */ + if (S_ISLNK(inode->i_mode)) + return; + + if (inode->i_op == &proc_tgid_base_inode_operations || + inode->i_op == &proc_tid_base_inode_operations) { + mode_t mode = S_IFDIR | S_IRUGO | S_IXUGO; + mode_t mask = perms->proc_pid_allowed ? ~0 : ~077; + inode->i_mode = mode & mask; + } else if (parent_op == &proc_tgid_base_inode_operations) { + inode->i_mode = get_ent_perms(ei->dirent, + perms, + PROC_PERMS_NTGID, + (const struct pid_entry *)ei->dirent - &tgid_base_stuff[0]); + } else if (parent_op == &proc_tid_base_inode_operations) { + inode->i_mode = get_ent_perms(ei->dirent, + perms, + PROC_PERMS_NTID, + (const struct pid_entry *)ei->dirent - &tid_base_stuff[0]); + } else if (parent_op == &proc_attr_dir_inode_operations) { + inode->i_mode = get_ent_perms(ei->dirent, + perms, + PROC_PERMS_NATTR, + (struct pid_entry *)ei->dirent - &attr_dir_stuff[0]); + } else { + pr_err("other not found (%s)\n", dentry->d_name.name); + } +} diff --git a/fs/proc/base_perms.c b/fs/proc/base_perms.c new file mode 100644 index 0000000..26e157a --- /dev/null +++ b/fs/proc/base_perms.c @@ -0,0 +1,305 @@ +/* + * linux/fs/proc/base_perms.c + * + * Copyright (C) 2011 Vasiliy Kulikov + * + * proc base directory permissions handling functions + */ + +#include <asm/uaccess.h> +#include <linux/proc_fs.h> +#include <linux/string.h> +#include <linux/ptrace.h> +#include <linux/poll.h> +#include <linux/pid_namespace.h> +#include "internal.h" + + +/* Similar to acl_permission_check(), but with procfs specific changes */ +static int proc_permission_check(struct task_struct *task, struct inode *inode, int mask) +{ + mode_t mode = inode->i_mode; + //uid_t uid = inode->i_uid; + gid_t gid = inode->i_gid; + int pmode = (mask == MAY_WRITE) ? PTRACE_MODE_ATTACH : PTRACE_MODE_READ; + + if (mode & mask) + return 0; + + /* XXX: This should be updated conformably to user ns code changes. + * Or maybe just remove it as ns users may not have any access to procfs? */ + if (current_user_ns() != inode_userns(inode)) + return -EACCES; + + if (in_group_p(gid) && ((mode >> 3) & mask)) + return 0; + + /* + * The major changes compared to acl_permission_check(): + * We check ptrace ability instead of uid comparison. + */ + if (ptrace_may_access(task, pmode)) + return 0; + + return -EACCES; +} + +static void unlock_pid_trace(struct task_struct *task); + +/* + * All operations with /proc/PID/ directory MUST be processed + * under lock_pid_trace() to avoid execve() races. + * + * It checks permissions similar to acl_permission_check() and probably holds + * (*task)->signal->cred_guard_mutex. If so, *task would point to the target + * task, NULL otherwise. To unlock the mutex call unlock_pid_trace() with the + * same *task argument. + */ +static int lock_pid_trace(struct dentry *dentry, + int mask, + struct task_struct **task) +{ + int err; + struct inode *inode = dentry->d_inode; + const struct pid_entry *pe = PROC_I(inode)->dirent; + + if (pe && !pe->need_perms_check) { + /* OK, the file checks permissions on his own */ + *task = NULL; + return 0; + } + + *task = get_proc_task(inode); + if (*task == NULL) + return -ESRCH; + + err = mutex_lock_killable(&(*task)->signal->cred_guard_mutex); + if (err) + goto put_task; + + /* + * We have to revalidate both uid/gid and permissions. + * uid/gid revalidation is racy against execve(), so we do it under + * ->cred_guard_mutex. + * + * We cannot just update uid/gid because of LSM. + */ + __pid_revalidate(inode, *task); + + err = -EACCES; + if (proc_permission_check(*task, inode, mask)) + goto free_mutex; + + err = 0; + goto exit; + +free_mutex: + mutex_unlock(&(*task)->signal->cred_guard_mutex); +put_task: + put_task_struct(*task); + *task = NULL; +exit: + return err; +} + +static void unlock_pid_trace(struct task_struct *task) +{ + if (task) { + mutex_unlock(&task->signal->cred_guard_mutex); + put_task_struct(task); + } +} + +static int proc_pid_perms_open(struct inode *inode, struct file *file) +{ + struct proc_inode *pi = PROC_I(inode); + + /* No ptrace check here. All checks should be located + * in read/write/readdir/etc. */ + + if (pi->real_fops->open) + return pi->real_fops->open(inode, file); + return 0; +} + +static int proc_pid_perms_release(struct inode *inode, struct file *file) +{ + struct proc_inode *pi = PROC_I(inode); + + if (pi->real_fops->release) + return pi->real_fops->release(inode, file); + return 0; +} + +static ssize_t proc_pid_perms_read(struct file * file, char __user * buf, + size_t count, loff_t *ppos) +{ + struct dentry *dentry = file->f_dentry; + struct proc_inode *pi = PROC_I(dentry->d_inode); + int rc; + struct task_struct *task; + + if (pi->real_fops->read) { + rc = lock_pid_trace(dentry, MAY_READ, &task); + if (!rc) + rc = pi->real_fops->read(file, buf, count, ppos); + unlock_pid_trace(task); + return rc; + } + + /* Should never happen, but better safe than sorry */ + return -EPERM; +} + +static ssize_t proc_pid_perms_write(struct file *file, const char __user *buf, + size_t count, loff_t *offs) +{ + struct dentry *dentry = file->f_dentry; + struct proc_inode *pi = PROC_I(dentry->d_inode); + int rc; + struct task_struct *task; + + if (pi->real_fops->write) { + rc = lock_pid_trace(dentry, MAY_WRITE, &task); + if (!rc) + rc = pi->real_fops->write(file, buf, count, offs); + unlock_pid_trace(task); + return rc; + } + + /* Should never happen, but better safe than sorry */ + return -EPERM; +} + +static int proc_pid_perms_readdir(struct file *file, void *dirent, filldir_t filldir) +{ + struct dentry *dentry = file->f_dentry; + struct proc_inode *pi = PROC_I(dentry->d_inode); + int rc; + struct task_struct *task; + + if (pi->real_fops->readdir) { + rc = lock_pid_trace(dentry, MAY_READ, &task); + if (!rc) + rc = pi->real_fops->readdir(file, dirent, filldir); + unlock_pid_trace(task); + return rc; + } + + /* Should never happen, but better safe than sorry */ + return -ENOTDIR; +} + +static unsigned proc_pid_perms_poll(struct file *file, poll_table *wait) +{ + struct dentry *dentry = file->f_dentry; + struct proc_inode *pi = PROC_I(dentry->d_inode); + struct task_struct *task; + int rc; + + /* Check perms before the poll. There is a race against execve(), + * but hopefuly the infoleak is very minor */ + rc = lock_pid_trace(dentry, MAY_READ, &task); + unlock_pid_trace(task); + if (rc) + return rc; + + if (pi->real_fops->poll) + return pi->real_fops->poll(file, wait); + + /* The fallback for files not implementing poll */ + return DEFAULT_POLLMASK; +} + +static loff_t proc_pid_perms_llseek(struct file *file, loff_t offset, int orig) +{ + struct proc_inode *pi = PROC_I(file->f_dentry->d_inode); + + /* Looks like llseek() doesn't need any ptrace protection */ + + if (pi->real_fops->llseek) + return pi->real_fops->llseek(file, offset, orig); + + return no_llseek(file, offset, orig); +} + +/* + * Global wrapper fops for _ALL_ files inside /proc/PID/. + * + * It checks the permissions (including ptrace() check) of the current + * task on all file ops. The runtime checks protect from holding any fd + * across execve() of set*id binaries. task->signal->cred_guard_mutex + * is held during read(), readdir(), and write(). open(), release(), + * llseek() are not protected as they are harmless anyway. It makes + * effort to check permissions on poll(), but it doesn't hold the + * mutex during actual fops->poll() as poll() might sleep for much time. + * + * It makes sense to use proc_pid_perms_fops even for files doing their + * own checks (e.g. REG_PERMS()). + */ +const struct file_operations proc_pid_perms_fops = { + .open = proc_pid_perms_open, + .read = proc_pid_perms_read, + .readdir = proc_pid_perms_readdir, + .write = proc_pid_perms_write, + .llseek = proc_pid_perms_llseek, + .poll = proc_pid_perms_poll, + .release = proc_pid_perms_release, +}; + +mode_t get_ent_perms(const struct pid_entry *p, + struct proc_perms *perms, + int ndir, + int nent) +{ + mode_t umask = perms->proc_ent_allowed[ndir][nent] ? 0 : 077; + return p->mode & ~umask; +} + +int init_proc_perms(struct proc_perms *p) +{ + int i; + + p->proc_pid_allowed = true; + for (i = 0; i < PROC_PERMS_NMAX; i++) { + size_t size = perms_size[i]*sizeof(bool); + p->proc_ent_allowed[i] = kzalloc(size, GFP_KERNEL); + if (p->proc_ent_allowed[i] == NULL) + goto err; + } + + return 0; +err: + for (i-- ; i >= 0; i--) + kfree(p->proc_ent_allowed[i]); + return -ENOMEM; +} + +void free_proc_perms(struct proc_perms *p) +{ + int i; + + for (i = 0; i < PROC_PERMS_NMAX; i++) + kfree(p->proc_ent_allowed[i]); +} + +void fill_proc_perms_ent(struct proc_perms *p, int ndir, bool val) +{ + int i; + for (i = 0; i < perms_size[ndir]; i++) + p->proc_ent_allowed[ndir][i] = val; +} + +bool set_proc_perms_ent(struct proc_perms *p, int ndir, const char *fname, bool val) +{ + int i; + for (i = 0; i < perms_size[ndir]; i++) + if (!strcmp(dir_ents[ndir][i].name, fname)) { + pr_err("found %s (%d)\n", fname, i); + p->proc_ent_allowed[ndir][i] = val; + return true; + } + + pr_err("not found %s (%d)\n", fname, i); + return false; +} diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 74b48cf..022ff46 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -18,6 +18,10 @@ #include <linux/module.h> #include <linux/sysctl.h> #include <linux/slab.h> +#include <linux/pid_namespace.h> +#include <linux/seq_file.h> +#include <linux/mount.h> + #include <asm/system.h> #include <asm/uaccess.h> @@ -69,6 +73,8 @@ static struct inode *proc_alloc_inode(struct super_block *sb) ei->sysctl_entry = NULL; ei->ns = NULL; ei->ns_ops = NULL; + ei->real_fops = NULL; + ei->dirent = NULL; inode = &ei->vfs_inode; inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME; return inode; @@ -102,12 +108,25 @@ void __init proc_init_inodecache(void) init_once); } +static int proc_show_options(struct seq_file *seq, struct vfsmount *vfs) +{ + struct super_block *sb = vfs->mnt_sb; + struct pid_namespace *pid = sb->s_fs_info; + + if (pid->proc_perms.proc_pid_allowed) + seq_printf(seq, ",proc_pid_allowed=."); + + return 0; +} + static const struct super_operations proc_sops = { .alloc_inode = proc_alloc_inode, .destroy_inode = proc_destroy_inode, .drop_inode = generic_delete_inode, .evict_inode = proc_evict_inode, .statfs = simple_statfs, + .remount_fs = proc_remount, + .show_options = proc_show_options, }; static void __pde_users_dec(struct proc_dir_entry *pde) diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 7838e5c..fe9ed27 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -137,11 +137,36 @@ int proc_fill_cache(struct file *filp, void *dirent, filldir_t filldir, const char *name, int len, instantiate_t instantiate, struct task_struct *task, const void *ptr); int pid_revalidate(struct dentry *dentry, struct nameidata *nd); +extern void __pid_revalidate(struct inode *inode, struct task_struct *task); struct inode *proc_pid_make_inode(struct super_block * sb, struct task_struct *task); extern const struct dentry_operations pid_dentry_operations; int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat); int proc_setattr(struct dentry *dentry, struct iattr *attr); +int proc_remount(struct super_block *sb, int *flags, char *data); extern const struct inode_operations proc_ns_dir_inode_operations; extern const struct file_operations proc_ns_dir_operations; +extern const struct inode_operations proc_tgid_base_inode_operations; +extern const struct inode_operations proc_tid_base_inode_operations; +extern const struct inode_operations proc_attr_dir_inode_operations; + +extern const struct file_operations proc_pid_perms_fops; +extern const int perms_size[]; +extern const struct pid_entry *dir_ents[]; + +extern mode_t get_ent_perms(const struct pid_entry *p, + struct proc_perms *perms, + int ndir, + int nent); + +struct pid_entry { + char *name; + int len; + mode_t mode; + const struct inode_operations *iop; + const struct file_operations *fop; + union proc_op op; + bool need_perms_check; +}; + diff --git a/fs/proc/root.c b/fs/proc/root.c index d6c3b41..9193fa1 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -18,6 +18,7 @@ #include <linux/bitops.h> #include <linux/mount.h> #include <linux/pid_namespace.h> +#include <linux/parser.h> #include "internal.h" @@ -36,6 +37,112 @@ static int proc_set_super(struct super_block *sb, void *data) return err; } +enum { + Opt_err, Opt_tid_allowed, Opt_tgid_allowed, Opt_attr_allowed, +}; + +static const match_table_t tokens = { + {Opt_tgid_allowed, "tgid_allowed=%s"}, + {Opt_tid_allowed, "tid_allowed=%s"}, + {Opt_attr_allowed, "attr_allowed=%s"}, + {Opt_err, NULL}, +}; + +static bool proc_parse_xallowed(struct proc_perms *p, int ndir, char *val, bool set_pid_allowed) +{ + char *field; + + fill_proc_perms_ent(p, ndir, false); + + if (strcmp(val, "none") == 0) { + if (set_pid_allowed) + p->proc_pid_allowed = false; + return true; + } + if (strcmp(val, "all") == 0) { + fill_proc_perms_ent(p, ndir, true); + if (set_pid_allowed) + p->proc_pid_allowed = true; + return true; + } + + while ((field = strsep(&val, ";")) != NULL) { + if (!set_proc_perms_ent(p, ndir, field, true)) + return false; + } + + if (set_pid_allowed) + p->proc_pid_allowed = true; + return true; +} + +static int proc_parse_options(char *options, struct pid_namespace *pid) +{ + char *p, *val; + substring_t args[MAX_OPT_ARGS]; + + pr_debug("proc: options = %s\n", options); + + if (!options) + return 1; + + while ((p = strsep(&options, ",")) != NULL) { + int token; + if (!*p) + continue; + + args[0].to = args[0].from = 0; + token = match_token(p, tokens, args); + switch (token) { + case Opt_tgid_allowed: + val = match_strdup(&args[0]); + if (!val) + return 0; + if (!proc_parse_xallowed(&pid->proc_perms, PROC_PERMS_NTGID, val, true)) { + kfree(val); + return 0; + } + kfree(val); + break; + case Opt_tid_allowed: + val = match_strdup(&args[0]); + if (!val) + return 0; + if (!proc_parse_xallowed(&pid->proc_perms, PROC_PERMS_NTID, val, false)) { + kfree(val); + return 0; + } + kfree(val); + break; + case Opt_attr_allowed: + val = match_strdup(&args[0]); + if (!val) + return 0; + if (!proc_parse_xallowed(&pid->proc_perms, PROC_PERMS_NATTR, val, false)) { + kfree(val); + return 0; + } + kfree(val); + break; + default: + pr_err("proc: unrecognized mount option \"%s\" " + "or missing value", p); + return 0; + } + } + + //pr_debug("proc: gid = %u, hidepid = %o, hidenet = %d\n", pid->pid_gid, pid->hide_pid, (int)pid->hide_net); + + return 1; +} + +int proc_remount(struct super_block *sb, int *flags, char *data) +{ + struct pid_namespace *pid = sb->s_fs_info; + return !proc_parse_options(data, pid); +} + + static struct dentry *proc_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { @@ -43,11 +150,16 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, struct super_block *sb; struct pid_namespace *ns; struct proc_inode *ei; + char *options; - if (flags & MS_KERNMOUNT) + + if (flags & MS_KERNMOUNT) { ns = (struct pid_namespace *)data; - else + options = NULL; + } else { ns = current->nsproxy->pid_ns; + options = data; + } sb = sget(fs_type, proc_test_super, proc_set_super, ns); if (IS_ERR(sb)) @@ -55,6 +167,10 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, if (!sb->s_root) { sb->s_flags = flags; + if (!proc_parse_options(options, ns)) { + deactivate_locked_super(sb); + return ERR_PTR(-EINVAL); + } err = proc_fill_super(sb); if (err) { deactivate_locked_super(sb); @@ -74,6 +190,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, return dget(sb->s_root); } + static void proc_kill_sb(struct super_block *sb) { struct pid_namespace *ns; diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c index 980de54..da98ba2 100644 --- a/fs/proc/task_nommu.c +++ b/fs/proc/task_nommu.c @@ -201,7 +201,7 @@ static void *m_start(struct seq_file *m, loff_t *pos) if (!priv->task) return ERR_PTR(-ESRCH); - mm = mm_for_maps(priv->task); + mm = get_task_mm(priv->task); if (!mm || IS_ERR(mm)) { put_task_struct(priv->task); priv->task = NULL; diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 38d1032..ed10329 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -16,6 +16,18 @@ struct pidmap { struct bsd_acct_struct; +enum { + PROC_PERMS_NTID, + PROC_PERMS_NTGID, + PROC_PERMS_NATTR, + PROC_PERMS_NMAX +}; + +struct proc_perms { + bool proc_pid_allowed; + bool *proc_ent_allowed[PROC_PERMS_NMAX]; +}; + struct pid_namespace { struct kref kref; struct pidmap pidmap[PIDMAP_ENTRIES]; @@ -30,6 +42,7 @@ struct pid_namespace { #ifdef CONFIG_BSD_PROCESS_ACCT struct bsd_acct_struct *bacct; #endif + struct proc_perms proc_perms; }; extern struct pid_namespace init_pid_ns; diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index e7576cf..a1525d7 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -115,10 +115,17 @@ struct proc_dir_entry *proc_create_data(const char *name, mode_t mode, extern void remove_proc_entry(const char *name, struct proc_dir_entry *parent); struct pid_namespace; +struct proc_perms; extern int pid_ns_prepare_proc(struct pid_namespace *ns); extern void pid_ns_release_proc(struct pid_namespace *ns); +extern int init_proc_perms(struct proc_perms *p); +extern void free_proc_perms(struct proc_perms *p); + +extern void fill_proc_perms_ent(struct proc_perms *p, int ndir, bool val); +extern bool set_proc_perms_ent(struct proc_perms *p, int ndir, const char *fname, bool val); + /* * proc_tty.c */ @@ -267,6 +274,11 @@ struct proc_inode { struct pid *pid; int fd; union proc_op op; + + /* Used by files inside /proc/PID/ only */ + const struct file_operations *real_fops; + const void *dirent; + struct proc_dir_entry *pde; struct ctl_table_header *sysctl; struct ctl_table *sysctl_entry; diff --git a/kernel/pid.c b/kernel/pid.c index 57a8346..ffa8b37 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -565,6 +565,8 @@ void __init pidmap_init(void) set_bit(0, init_pid_ns.pidmap[0].page); atomic_dec(&init_pid_ns.pidmap[0].nr_free); + BUG_ON(init_proc_perms(&init_pid_ns.proc_perms)); + init_pid_ns.pid_cachep = KMEM_CACHE(pid, SLAB_HWCACHE_ALIGN | SLAB_PANIC); } diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index e9c9adc..ab4261c 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -97,12 +97,18 @@ static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_p for (i = 1; i < PIDMAP_ENTRIES; i++) atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE); - err = pid_ns_prepare_proc(ns); + err = init_proc_perms(&ns->proc_perms); if (err) goto out_put_parent_pid_ns; + err = pid_ns_prepare_proc(ns); + if (err) + goto out_free_proc_perms; + return ns; +out_free_proc_perms: + free_proc_perms(&ns->proc_perms); out_put_parent_pid_ns: put_pid_ns(parent_pid_ns); out_free_map: @@ -119,6 +125,8 @@ static void destroy_pid_namespace(struct pid_namespace *ns) for (i = 0; i < PIDMAP_ENTRIES; i++) kfree(ns->pidmap[i].page); + + free_proc_perms(&ns->proc_perms); kmem_cache_free(pid_ns_cachep, ns); } diff --git a/kernel/sysctl.c b/kernel/sysctl.c index f175d98..745b71e 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -57,6 +57,7 @@ #include <linux/pipe_fs_i.h> #include <linux/oom.h> #include <linux/kmod.h> +#include <linux/pid_namespace.h> #include <asm/uaccess.h> #include <asm/processor.h> -- ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [kernel-hardening] Re: procfs {tid,tgid,attr}_allowed mount options 2011-08-04 11:23 ` [kernel-hardening] " Vasiliy Kulikov @ 2011-08-10 10:02 ` Vasiliy Kulikov 2011-08-10 11:22 ` [kernel-hardening] " Solar Designer 2011-08-10 11:25 ` Solar Designer 1 sibling, 1 reply; 29+ messages in thread From: Vasiliy Kulikov @ 2011-08-10 10:02 UTC (permalink / raw) To: kernel-hardening Solar, On Thu, Aug 04, 2011 at 15:23 +0400, Vasiliy Kulikov wrote: > New version. Cleanups/fixes here and there. One question: do we really need gid= option? The only user I know is identd, but does anybody use it nowadays? With gid= I see 2 drawbacks: 1) Code becomes worse because of additional permission checks. 2) From the upstream's point of view it is very limited and unextendable feature. So, I'd go further without gid=, at least for the beginning. Thanks, -- Vasiliy Kulikov http://www.openwall.com - bringing security into open computing environments ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs {tid,tgid,attr}_allowed mount options 2011-08-10 10:02 ` Vasiliy Kulikov @ 2011-08-10 11:22 ` Solar Designer 0 siblings, 0 replies; 29+ messages in thread From: Solar Designer @ 2011-08-10 11:22 UTC (permalink / raw) To: kernel-hardening On Wed, Aug 10, 2011 at 02:02:27PM +0400, Vasiliy Kulikov wrote: > One question: do we really need gid= option? The only user I know is > identd, but does anybody use it nowadays? identd is mostly obsolete, but I am using gid= with 2.4.x-ow kernels to let a group of sysadmins see all users' processes and network connections without having to use su. In fact, I use it on my very own computers - again, to let my main desktop user account see everything, while not letting my pseudo-user accounts (that I use for things such as a web browser) also see everything (they're not in group proc). > With gid= I see 2 drawbacks: > > 1) Code becomes worse because of additional permission checks. > > 2) From the upstream's point of view it is very limited and unextendable > feature. This feature is precisely what I needed and used for over a decade. I never needed more flexibility. Maybe this says something. > So, I'd go further without gid=, at least for the beginning. I don't know what works best for upstream acceptance in the beginning, but I definitely want this feature to get in, and it is a must for Owl. Thanks, Alexander ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs {tid,tgid,attr}_allowed mount options 2011-08-04 11:23 ` [kernel-hardening] " Vasiliy Kulikov 2011-08-10 10:02 ` Vasiliy Kulikov @ 2011-08-10 11:25 ` Solar Designer 2011-08-10 12:04 ` Vasiliy Kulikov 1 sibling, 1 reply; 29+ messages in thread From: Solar Designer @ 2011-08-10 11:25 UTC (permalink / raw) To: kernel-hardening Vasiliy, On Thu, Aug 04, 2011 at 03:23:31PM +0400, Vasiliy Kulikov wrote: > New version. Cleanups/fixes here and there. > > It lacks net/ restriction, but IMO it is already complicated enough > (more than 500 new lines). Such (relatively) simple thing as > net_allowed= is an additional care. I'd achieve at least processes > restrictions in upstream, after it will come networking. That's a lot of code already. I tried reviewing it, but I felt that I need an English description of its functionality first. Perhaps you're preparing one for posting to LKML anyway - can you please post it in here first? Thanks, Alexander ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs {tid,tgid,attr}_allowed mount options 2011-08-10 11:25 ` Solar Designer @ 2011-08-10 12:04 ` Vasiliy Kulikov 2011-08-10 13:34 ` Solar Designer 0 siblings, 1 reply; 29+ messages in thread From: Vasiliy Kulikov @ 2011-08-10 12:04 UTC (permalink / raw) To: kernel-hardening Solar, On Wed, Aug 10, 2011 at 15:25 +0400, Solar Designer wrote: > On Thu, Aug 04, 2011 at 03:23:31PM +0400, Vasiliy Kulikov wrote: > > New version. Cleanups/fixes here and there. > > > > It lacks net/ restriction, but IMO it is already complicated enough > > (more than 500 new lines). Such (relatively) simple thing as > > net_allowed= is an additional care. I'd achieve at least processes > > restrictions in upstream, after it will come networking. > > That's a lot of code already. Yes, so I'd try to get comments from net-less and gid-less patch first. Probably the whole way of permissions handling would be ridicules for kernel forlk :) > I tried reviewing it, but I felt that I > need an English description of its functionality first. Perhaps you're > preparing one for posting to LKML anyway - can you please post it in > here first? Something like this: This patch adds support of pid_allowed=XX and attr_allowed=YY mount options for procfs. When set, all /proc/PID/ files are restricted to the owner, except filenames passed via pid_allowed= argument. E.g. with pid_allowed=sched sched file would be world readable and other files would be restricted to the task owner. The same for /proc/PID/attr/ files and attr_allowed=YY. The new struct file_operations proc_pid_perms_fops was created to deal with permissions keeping in mind pid_allowed=. It is a wrapper for all /proc/PID/* and /proc/PID/task/TID/* file_operations. It checks current task permissions against the target task on each struct file access to avoid races against execve(). All pid_entry's gain the following permission checking algorithm by default: 1) if the filename is not passed via pid_allowed= then ptrace check is applied. 2) Otherwise, if file POSIX permissions deny access to the world, ptrace check is applied. 3) Otherwise, generic permission checking scheme is applied. For procfs files it additionally includes ptrace check if current euid and the procfs inode uid are equal. For the simplicity meta values are allowed: "none" and "all". "all" means all files are a subject of (2) and (3) checks. "none" means all files are a subject of (1) check. "none" additionally restricts access to the /proc/PID/ directory itself. Old ptrace checks are removed from the handlers as they are now obsoleted by proc_pid_perms_fops checks. Handlers may further restrict the permission model by introducing additional checks. For some files the ptrace check is too strong (environ, auxv, maps, numa_maps, and smaps), so no implicit permission checking is applied for them. These files should use pid_entry macros with "_PERMS" suffix. Only /proc/PID/, /proc/PID/*, /proc/PID/task/TID/, /proc/PID/task/TID/*, /proc/PID/attr/* files' permission checking is changed. fd/ and fdinfo/ are outstanding files and check the permissions on their own. Thanks, --- Vasiliy Kulikov http://www.openwall.com - bringing security into open computing environments ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs {tid,tgid,attr}_allowed mount options 2011-08-10 12:04 ` Vasiliy Kulikov @ 2011-08-10 13:34 ` Solar Designer 2011-08-12 18:14 ` Simon Marechal 0 siblings, 1 reply; 29+ messages in thread From: Solar Designer @ 2011-08-10 13:34 UTC (permalink / raw) To: kernel-hardening On Wed, Aug 10, 2011 at 04:04:39PM +0400, Vasiliy Kulikov wrote: > On Wed, Aug 10, 2011 at 15:25 +0400, Solar Designer wrote: > > That's a lot of code already. > > Yes, so I'd try to get comments from net-less and gid-less patch first. > Probably the whole way of permissions handling would be ridicules for > kernel forlk :) I feel that you may have treated Andrew Morton's suggestion too seriously. I think he was hoping for something not only more general, but also simple. With your invasive changes, even checking for (lack of) potential new vulnerabilities (such as lack of ptrace check where it previously existed and was needed) feels non-trivial. Or maybe I just did not look closely enough. Perhaps run this by LKML as RFC and see what they think? And be willing to revert to your old approach, with more hard-coding, now that you have this arguably overly complicated alternative. Maybe it will convince Andrew Morton that something simpler and less flexible would be better. Thanks, Alexander ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs {tid,tgid,attr}_allowed mount options 2011-08-10 13:34 ` Solar Designer @ 2011-08-12 18:14 ` Simon Marechal 0 siblings, 0 replies; 29+ messages in thread From: Simon Marechal @ 2011-08-12 18:14 UTC (permalink / raw) To: kernel-hardening Le 10/08/2011 15:34, Solar Designer a écrit : > Perhaps run this by LKML as RFC and see what they think? And be willing > to revert to your old approach, with more hard-coding, now that you have > this arguably overly complicated alternative. Maybe it will convince > Andrew Morton that something simpler and less flexible would be better. Just my opinion, but the gid option is simple and to the point. More complex solution will likely : * not be used at all * not be relevant to people with very specific needs anyway * introduce bugs and/or vulnerabilities, either from the code or from misconfigurations Point #2 is important. Very specific needs should not be addressed in this specific patch, it should be configured in something with a global scope, such as a LSM. I believe having effective security systems enabled by default is more important than having generalistic and configurable systems nobody care about. For example, being able to let a process choose the set of system calls it should use is more useful to me than having SELinux loaded. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v1] procfs mount options 2011-06-05 20:10 ` Solar Designer 2011-06-06 18:08 ` Vasiliy Kulikov @ 2011-06-06 19:20 ` Vasiliy Kulikov 1 sibling, 0 replies; 29+ messages in thread From: Vasiliy Kulikov @ 2011-06-06 19:20 UTC (permalink / raw) To: kernel-hardening [-- Attachment #1: Type: text/plain, Size: 1361 bytes --] On Mon, Jun 06, 2011 at 00:10 +0400, Solar Designer wrote: > On Sun, Jun 05, 2011 at 11:47:46PM +0400, Vasiliy Kulikov wrote: > > On Sun, Jun 05, 2011 at 23:26 +0400, Solar Designer wrote: > > > On Sun, Jun 05, 2011 at 10:24:31PM +0400, Vasiliy Kulikov wrote: > > > > TODO/thoughs: > > > > - /proc/pid/net/ currently doesn't show ANYTHING, even "." and "..". > > > > This is confusing :) > > > > > > Ouch. Can't you simply restrict its perms such that this directory > > > can't be listed unless you have privs? ... > > Another solution - create a fake net namespace and process this > > namespace if not enough permissions :) It also removes weird netstat > > errors like "seems like networking was disabled for this kernel". A fake net namespace works perfect: $ LANG=C netstat -nlp4 (Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.) Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name No warning from netstat. I remember brctl didn't properly handle missing sysfs files, so fake files make sense. Will repost the patch after I'm sure that changing hidepid works well with inode caching (I see a bug in my current implementation). Thanks, -- Vasiliy [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options 2011-06-04 20:59 ` Solar Designer 2011-06-05 18:24 ` [kernel-hardening] [RFC v1] " Vasiliy Kulikov @ 2011-06-05 19:17 ` Vasiliy Kulikov 2011-06-05 19:40 ` Solar Designer 1 sibling, 1 reply; 29+ messages in thread From: Vasiliy Kulikov @ 2011-06-05 19:17 UTC (permalink / raw) To: kernel-hardening [-- Attachment #1: Type: text/plain, Size: 1284 bytes --] Solar, On Sun, Jun 05, 2011 at 00:59 +0400, Solar Designer wrote: > > > This is not generic, but at least it's simple, not > > > confusing, and it allows for future changes (we may change what exactly > > > restricted means). > > > > Changing what some option means after implementing it is a ABI breakage, > > which is terrible for upstream. > > This depends on how the option is defined initially. It is possible to > define the option as "enabling unspecified version-dependent restrictions, > which are subject to change across kernel versions and builds". ;-) OK, > the word "unspecified" may be dropped. Everything that is implemented should behave the same way it was initially implemented, regardless of what is explicitly marked as "ABI". This is upsteam's way to define "ABI" term. Often they deliberately break such ABI, but normally this is not encouraged. > Here's a related thought: if these mount options happen to affect all > instances of the filesystem (in the same container), maybe they should > be sysctl's instead? AFAIR, only net namespaces have their own sysctl sets. Other sysctls are global. So, implementing pid_namespace-specific sysctl would be a bit weird (according to current policies). Thanks, -- Vasiliy [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options 2011-06-05 19:17 ` [kernel-hardening] " Vasiliy Kulikov @ 2011-06-05 19:40 ` Solar Designer 2011-06-05 19:53 ` Vasiliy Kulikov 0 siblings, 1 reply; 29+ messages in thread From: Solar Designer @ 2011-06-05 19:40 UTC (permalink / raw) To: kernel-hardening On Sun, Jun 05, 2011 at 11:17:47PM +0400, Vasiliy Kulikov wrote: > On Sun, Jun 05, 2011 at 00:59 +0400, Solar Designer wrote: > > Here's a related thought: if these mount options happen to affect all > > instances of the filesystem (in the same container), maybe they should > > be sysctl's instead? > > AFAIR, only net namespaces have their own sysctl sets. Other sysctls > are global. So, implementing pid_namespace-specific sysctl would be a > bit weird (according to current policies). Here's what we have immediate need for, in practice: We need to be able to mount /proc with different permission settings in different OpenVZ containers (perhaps running different distros, which have their different defaults - e.g., Owl will use the restricted proc options by default, but other distros mostly won't). Since recent versions of OpenVZ build upon the namespaces code that has been upstream'ed, I guess this will rely on upstream's namespaces code (once we move to RHEL6'ish OpenVZ kernels and beyond), correct? Now, leaving sysctl's aside and speaking of mount options only for now, what happens when a container mounts /proc with umask=007, but then another container mounts /proc without that option or with umask=0? Does the first container retain its restricted perms, including for newly appearing entries under its /proc? If so, where is this different setting stored? Is it per mount (preferable)? Is it per pid namespace (OK)? Or per net namespace (weird)? Thanks, Alexander ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options 2011-06-05 19:40 ` Solar Designer @ 2011-06-05 19:53 ` Vasiliy Kulikov 0 siblings, 0 replies; 29+ messages in thread From: Vasiliy Kulikov @ 2011-06-05 19:53 UTC (permalink / raw) To: kernel-hardening [-- Attachment #1: Type: text/plain, Size: 944 bytes --] Solar, On Sun, Jun 05, 2011 at 23:40 +0400, Solar Designer wrote: > Since recent versions of OpenVZ build upon the namespaces code that has > been upstream'ed, I guess this will rely on upstream's namespaces code > (once we move to RHEL6'ish OpenVZ kernels and beyond), correct? Namespaces - yes. But AFAIK, only a pair of sysctls is accessible in containers, IIRC, only net sysctls. > Now, leaving sysctl's aside and speaking of mount options only for now, > what happens when a container mounts /proc with umask=007, but then > another container mounts /proc without that option or with umask=0? > Does the first container retain its restricted perms, including for > newly appearing entries under its /proc? If so, where is this different > setting stored? Is it per mount (preferable)? Is it per pid namespace > (OK)? It is per-pid_namespace, it should be fully consistent with OpenVZ. Thanks, -- Vasiliy [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* [kernel-hardening] Re: [owl-dev] procfs mount options [not found] <20110603191153.GB514@openwall.com> 2011-06-04 5:47 ` [kernel-hardening] Re: procfs mount options Vasiliy Kulikov @ 2011-06-05 18:36 ` Vasiliy Kulikov 2011-06-05 18:47 ` [kernel-hardening] " Solar Designer 1 sibling, 1 reply; 29+ messages in thread From: Vasiliy Kulikov @ 2011-06-05 18:36 UTC (permalink / raw) To: kernel-hardening [-- Attachment #1: Type: text/plain, Size: 661 bytes --] On Fri, Jun 03, 2011 at 23:11 +0400, Solar Designer wrote: > Indeed, we could set some of these perms with chmod post-mount, but as > discussed this has drawbacks. So ideally our preferred configuration > (which will be the default on Owl) should be achievable with mount > options alone. What if implement mode=XXX option to alter root directory permissions only, like tmpfs? Then all non-pid files may be chmod'ed without any race due to distro-specific policy and then "chmod a+rx /proc" to allow nonroot users to see procfs files. Thanks, -- Vasiliy Kulikov http://www.openwall.com - bringing security into open computing environments [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options 2011-06-05 18:36 ` [kernel-hardening] Re: [owl-dev] " Vasiliy Kulikov @ 2011-06-05 18:47 ` Solar Designer 0 siblings, 0 replies; 29+ messages in thread From: Solar Designer @ 2011-06-05 18:47 UTC (permalink / raw) To: kernel-hardening On Sun, Jun 05, 2011 at 10:36:20PM +0400, Vasiliy Kulikov wrote: > On Fri, Jun 03, 2011 at 23:11 +0400, Solar Designer wrote: > > Indeed, we could set some of these perms with chmod post-mount, but as > > discussed this has drawbacks. So ideally our preferred configuration > > (which will be the default on Owl) should be achievable with mount > > options alone. > > What if implement mode=XXX option to alter root directory permissions > only, like tmpfs? Then all non-pid files may be chmod'ed without any > race due to distro-specific policy and then "chmod a+rx /proc" to allow > nonroot users to see procfs files. This makes sense to me, although other mount options that you implemented appear to be sufficient to implement the desired default policy for Owl. Alexander ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2011-08-12 18:14 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20110603191153.GB514@openwall.com>
2011-06-04 5:47 ` [kernel-hardening] Re: procfs mount options Vasiliy Kulikov
2011-06-04 13:20 ` [kernel-hardening] " Solar Designer
2011-06-04 20:09 ` Vasiliy Kulikov
2011-06-04 20:59 ` Solar Designer
2011-06-05 18:24 ` [kernel-hardening] [RFC v1] " Vasiliy Kulikov
2011-06-05 19:26 ` Solar Designer
2011-06-05 19:47 ` Vasiliy Kulikov
2011-06-05 20:10 ` Solar Designer
2011-06-06 18:08 ` Vasiliy Kulikov
2011-06-06 18:33 ` Solar Designer
2011-06-08 17:23 ` [kernel-hardening] [RFC v2] " Vasiliy Kulikov
2011-06-08 17:43 ` Vasiliy Kulikov
2011-06-12 2:39 ` Solar Designer
2011-07-24 18:55 ` Vasiliy Kulikov
[not found] ` <20110724185036.GC3510@albatros>
2011-07-26 14:50 ` Vasiliy Kulikov
2011-07-29 17:47 ` [kernel-hardening] procfs {tid,tgid,attr}_allowed " Vasiliy Kulikov
2011-08-04 11:23 ` [kernel-hardening] " Vasiliy Kulikov
2011-08-10 10:02 ` Vasiliy Kulikov
2011-08-10 11:22 ` [kernel-hardening] " Solar Designer
2011-08-10 11:25 ` Solar Designer
2011-08-10 12:04 ` Vasiliy Kulikov
2011-08-10 13:34 ` Solar Designer
2011-08-12 18:14 ` Simon Marechal
2011-06-06 19:20 ` [kernel-hardening] [RFC v1] procfs " Vasiliy Kulikov
2011-06-05 19:17 ` [kernel-hardening] " Vasiliy Kulikov
2011-06-05 19:40 ` Solar Designer
2011-06-05 19:53 ` Vasiliy Kulikov
2011-06-05 18:36 ` [kernel-hardening] Re: [owl-dev] " Vasiliy Kulikov
2011-06-05 18:47 ` [kernel-hardening] " Solar Designer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox