* [kernel-hardening] Re: procfs mount options
[not found] <20110603191153.GB514@openwall.com>
@ 2011-06-04 5:47 ` Vasiliy Kulikov
2011-06-04 13:20 ` [kernel-hardening] " Solar Designer
2011-06-05 18:36 ` [kernel-hardening] Re: [owl-dev] " Vasiliy Kulikov
1 sibling, 1 reply; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-06-04 5:47 UTC (permalink / raw)
To: kernel-hardening; +Cc: Eugene Teo
[-- Attachment #1: Type: text/plain, Size: 2243 bytes --]
Solar,
On Fri, Jun 03, 2011 at 23:11 +0400, Solar Designer wrote:
> I welcome suggestions on how to achieve the desired functionality for
> procfs in a non-confusing and generic way. It should support the
> following reasonable configuration:
>
> /proc/PID directories restricted to group proc (except for owners and
> root, indeed). However, /proc/cpuinfo and the like unrestricted.
> Here's what this looks like on Linux 2.4.x-ow:
>
> dr-xr-x--- 3 root proc 0 Jun 3 22:59 1
> ...
> dr-xr-x--- 3 syslogd proc 0 Jun 3 22:59 205
> dr-xr-x--- 3 klogd proc 0 Jun 3 22:59 211
> ...
> -r--r--r-- 1 root proc 0 Jun 3 23:00 cpuinfo
> ...
> -r-------- 1 root proc 536743936 Jun 3 23:00 kcore
> -r-------- 1 root proc 0 May 5 20:36 kmsg
> ...
> dr-xr-x--- 5 root proc 0 Jun 3 23:00 net
> ...
> -r--r--r-- 1 root proc 0 Jun 3 23:00 uptime
> -r--r--r-- 1 root proc 0 Jun 3 23:00 version
>
> Perhaps gid=proc,umask=007 should result in the above for /proc/PID, but
> how do we justify it not affecting /proc/cpuinfo, uptime, version (and
> many others)? How do we justify it nevertheless affecting /proc/net (or
> should another option do that)?
I think it should be done with separate mount options for /proc/self/net
(/proc/net is a symlink to /proc/self/net since net namespaces
introduction) and for /proc/PID. All other files should be e.g.
chmod'ed go= and then some white list should be chmod'ed to the relaxed
perms.
> Indeed, we could set some of these perms with chmod post-mount, but as
> discussed this has drawbacks.
Where its drawbacks were discussed? I cannot find anything on
owl-dev. Do you mean some possible diffirences between procfs files
among different kernel versions? If so, white list instead of black
list should partly solve it.
> So ideally our preferred configuration
> (which will be the default on Owl) should be achievable with mount
> options alone.
At least for sysfs it is unreachable if we go in the current direction -
umask doesn't change perms of already created files, and additional
"chmod -R" is needed anyway.
Thanks,
--
Vasiliy
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options
2011-06-04 5:47 ` [kernel-hardening] Re: procfs mount options Vasiliy Kulikov
@ 2011-06-04 13:20 ` Solar Designer
2011-06-04 20:09 ` Vasiliy Kulikov
0 siblings, 1 reply; 29+ messages in thread
From: Solar Designer @ 2011-06-04 13:20 UTC (permalink / raw)
To: kernel-hardening
Vasiliy,
On Sat, Jun 04, 2011 at 09:47:58AM +0400, Vasiliy Kulikov wrote:
> I think it should be done with separate mount options for /proc/self/net
> (/proc/net is a symlink to /proc/self/net since net namespaces
> introduction) and for /proc/PID.
OK. What do we call these? pmask (for "p"rocesses) and nmask (for
"n"etwork)? Doesn't this deviation from umask reduce the chances of the
patch getting accepted?.. And is there really much reason to let a user
see others' processes, but not network connections, or vice versa?
(Maybe there is.)
> All other files should be e.g.
> chmod'ed go= and then some white list should be chmod'ed to the relaxed
> perms.
Where would this logic be implemented - hard-coded in the kernel
(enabled with some mount option?) or done by a script in the userspace
post-mount?
> > Indeed, we could set some of these perms with chmod post-mount, but as
> > discussed this has drawbacks.
>
> Where its drawbacks were discussed?
IIRC, you, I, and some others discussed them via Jabber.
> I cannot find anything on
> owl-dev. Do you mean some possible diffirences between procfs files
> among different kernel versions? If so, white list instead of black
> list should partly solve it.
No, I meant race conditions, which you had to deal with by mounting
under a parent directory with restricted perms, then mount --move'ing.
Part of the intent of us patching the kernel is to avoid having to do
things like that. Also, an admin might just "mount /proc", not knowing
that special userspace magic was implemented by the distro.
A possible approach is to implement mount options gid=... and restricted
(name suggested by spender), where the latter would enable a hard-coded
set of permissions. This is not generic, but at least it's simple, not
confusing, and it allows for future changes (we may change what exactly
restricted means).
> > So ideally our preferred configuration
> > (which will be the default on Owl) should be achievable with mount
> > options alone.
>
> At least for sysfs it is unreachable if we go in the current direction -
> umask doesn't change perms of already created files, and additional
> "chmod -R" is needed anyway.
Hmm. I guess I still don't understand your umask vs. mode stuff.
Ideally, I'd have umask apply to each instance of sysfs (and other
special filesystems) individually, affecting all files under that
instance (both those already existing in the kernel at the time of mount
and those appearing after mount). But perhaps that's not how the kernel
keeps track of permissions for those filesystems currently? (Sorry, I
haven't played with this since Linux 2.4, which didn't even have sysfs.)
Thanks,
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options
2011-06-04 13:20 ` [kernel-hardening] " Solar Designer
@ 2011-06-04 20:09 ` Vasiliy Kulikov
2011-06-04 20:59 ` Solar Designer
0 siblings, 1 reply; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-06-04 20:09 UTC (permalink / raw)
To: kernel-hardening
Solar,
On Sat, Jun 04, 2011 at 17:20 +0400, Solar Designer wrote:
> On Sat, Jun 04, 2011 at 09:47:58AM +0400, Vasiliy Kulikov wrote:
> > I think it should be done with separate mount options for /proc/self/net
> > (/proc/net is a symlink to /proc/self/net since net namespaces
> > introduction) and for /proc/PID.
>
> OK. What do we call these? pmask (for "p"rocesses) and nmask (for
> "n"etwork)? Doesn't this deviation from umask reduce the chances of the
> patch getting accepted?.. And is there really much reason to let a user
> see others' processes, but not network connections, or vice versa?
> (Maybe there is.)
While implementing this PID permissions restricting mount opt (don't
know how to call it better), I've noticed that only 2 modes make sense -
full view (current behaviour) and restricted view (similar to umask=07).
So, I propose to implement & call the options as follows:
(no)safepid - instead of umask/mode; if it is on, only
owner/root/special group may read /proc/PID.
(no)hidepid - hide restricted /proc/PID from *stat*(2), getdents*()
and similar. Needs safepid.
(no)hidenet - hide /proc/PID/net contents (restricting /proc/PID/net
itself is a bit more difficult to implement).
gid - group to evade safepid,hidepid,hidenet.
Also as hidepid needs safepid, maybe it makes sense to implement one
option instead of safepid and hidepid: hidepid=X where:
X = 0 means nohidepid,nosafepid above
X = 1 means safepid,nohidepid above
X = 2 means safepid,hidepid above
> > All other files should be e.g.
> > chmod'ed go= and then some white list should be chmod'ed to the relaxed
> > perms.
>
> Where would this logic be implemented - hard-coded in the kernel
> (enabled with some mount option?) or done by a script in the userspace
> post-mount?
I'm sure if we try to implement it in kernel, we'd surely get NAK
because it may be implemented much simplier in userspace. Compare:
a single chmod -R vs. some in-kernel filtering of core procfs files and
driver-specific files (yes, there are rare driver files in /proc, it is
blamed, but anyway).
> > > Indeed, we could set some of these perms with chmod post-mount, but as
> > > discussed this has drawbacks.
> >
> > Where its drawbacks were discussed?
>
> IIRC, you, I, and some others discussed them via Jabber.
Ah, OK ;)
> > I cannot find anything on
> > owl-dev. Do you mean some possible diffirences between procfs files
> > among different kernel versions? If so, white list instead of black
> > list should partly solve it.
>
> No, I meant race conditions, which you had to deal with by mounting
> under a parent directory with restricted perms, then mount --move'ing.
> Part of the intent of us patching the kernel is to avoid having to do
> things like that.
For distros with sysv init there is no race if it is implemented in a
boot script before any login/cron/etc. It's a rare situation now,
though :)
> Also, an admin might just "mount /proc", not knowing
> that special userspace magic was implemented by the distro.
Agreed, but I still don't think it is a good argument for upstream...
> A possible approach is to implement mount options gid=... and restricted
> (name suggested by spender), where the latter would enable a hard-coded
> set of permissions.
To what files? All /proc/* files or recursively? Is there a white list
of o=r/o=rw files or all of them would be chmod'ed "o="?
> This is not generic, but at least it's simple, not
> confusing, and it allows for future changes (we may change what exactly
> restricted means).
Changing what some option means after implementing it is a ABI breakage,
which is terrible for upstream. However, it might be acceptable for Owl.
> > > So ideally our preferred configuration
> > > (which will be the default on Owl) should be achievable with mount
> > > options alone.
> >
> > At least for sysfs it is unreachable if we go in the current direction -
> > umask doesn't change perms of already created files, and additional
> > "chmod -R" is needed anyway.
>
> Hmm. I guess I still don't understand your umask vs. mode stuff.
>
> Ideally, I'd have umask apply to each instance of sysfs (and other
> special filesystems) individually, affecting all files under that
> instance
What instance? All files (=kobjects) are global, there is almost no
division among mount points. The only division is among net namespaces,
but it is mostly the exception than the rule and there will be always
files global to all namespaces (e.g. modules list, devices list, etc.)
> (both those already existing in the kernel at the time of mount
> and those appearing after mount). But perhaps that's not how the kernel
> keeps track of permissions for those filesystems currently?
I'd not change permissions of all currently existing files. What would
you do with already created and chmod'ed files?
And what you suggest is strictly speaking not "umask". "umask" is for newly
created files. "mode" is almost everything judging by mount(8) :)
Currently I have only one TODO in my list for procfs - restricted
/proc/PID/net returns -EINVAL instead of -ENOENT as it is done in
grsecurity :) Need to make it say true.
Thanks,
Vasiliy.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options
2011-06-04 20:09 ` Vasiliy Kulikov
@ 2011-06-04 20:59 ` Solar Designer
2011-06-05 18:24 ` [kernel-hardening] [RFC v1] " Vasiliy Kulikov
2011-06-05 19:17 ` [kernel-hardening] " Vasiliy Kulikov
0 siblings, 2 replies; 29+ messages in thread
From: Solar Designer @ 2011-06-04 20:59 UTC (permalink / raw)
To: kernel-hardening
On Sun, Jun 05, 2011 at 12:09:51AM +0400, Vasiliy Kulikov wrote:
> While implementing this PID permissions restricting mount opt (don't
> know how to call it better), I've noticed that only 2 modes make sense -
> full view (current behaviour) and restricted view (similar to umask=07).
> So, I propose to implement & call the options as follows:
>
> (no)safepid - instead of umask/mode; if it is on, only
> owner/root/special group may read /proc/PID.
>
> (no)hidepid - hide restricted /proc/PID from *stat*(2), getdents*()
> and similar. Needs safepid.
>
> (no)hidenet - hide /proc/PID/net contents (restricting /proc/PID/net
> itself is a bit more difficult to implement).
>
> gid - group to evade safepid,hidepid,hidenet.
>
> Also as hidepid needs safepid, maybe it makes sense to implement one
> option instead of safepid and hidepid: hidepid=X where:
>
> X = 0 means nohidepid,nosafepid above
> X = 1 means safepid,nohidepid above
> X = 2 means safepid,hidepid above
This makes sense to me, and I like the tri-state option better.
> On Sat, Jun 04, 2011 at 17:20 +0400, Solar Designer wrote:
> > A possible approach is to implement mount options gid=... and restricted
> > (name suggested by spender), where the latter would enable a hard-coded
> > set of permissions.
>
> To what files? All /proc/* files or recursively? Is there a white list
> of o=r/o=rw files or all of them would be chmod'ed "o="?
Oh, that's not what I was referring to. The restricted option would
essentially be the same as your hidepid=2 - enabling whatever grsecurity
normally does. But having a tri-state option as you propose makes more
sense to me.
> > This is not generic, but at least it's simple, not
> > confusing, and it allows for future changes (we may change what exactly
> > restricted means).
>
> Changing what some option means after implementing it is a ABI breakage,
> which is terrible for upstream.
This depends on how the option is defined initially. It is possible to
define the option as "enabling unspecified version-dependent restrictions,
which are subject to change across kernel versions and builds". ;-) OK,
the word "unspecified" may be dropped.
> > Ideally, I'd have umask apply to each instance of sysfs (and other
> > special filesystems) individually, affecting all files under that
> > instance
>
> What instance? All files (=kobjects) are global, there is almost no
> division among mount points. The only division is among net namespaces,
> but it is mostly the exception than the rule and there will be always
> files global to all namespaces (e.g. modules list, devices list, etc.)
>
> > (both those already existing in the kernel at the time of mount
> > and those appearing after mount). But perhaps that's not how the kernel
> > keeps track of permissions for those filesystems currently?
>
> I'd not change permissions of all currently existing files. What would
> you do with already created and chmod'ed files?
The reality (kobjects being global) doesn't quite match my desired
behavior (being able to mount more than one instance of procfs, etc.
and set their mount options affecting perms differently). However, this
is not important enough to warrant changing the reality. ;-)
Here's a related thought: if these mount options happen to affect all
instances of the filesystem (in the same container), maybe they should
be sysctl's instead?
Here's another thought: if the gid is not specified but hidepid is
enabled, maybe the perms should be 700 rather than 750 (don't grant any
privs to group root).
> And what you suggest is strictly speaking not "umask". "umask" is for newly
> created files. "mode" is almost everything judging by mount(8) :)
mount(8) mentions both "umask" and "mode" for different filesystems,
with slightly different semantics.
> Currently I have only one TODO in my list for procfs - restricted
> /proc/PID/net returns -EINVAL instead of -ENOENT as it is done in
> grsecurity :) Need to make it say true.
Do you mean you have already implemented the mount options as you have
proposed above?
Thanks,
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* [kernel-hardening] [RFC v1] procfs mount options
2011-06-04 20:59 ` Solar Designer
@ 2011-06-05 18:24 ` Vasiliy Kulikov
2011-06-05 19:26 ` Solar Designer
2011-06-05 19:17 ` [kernel-hardening] " Vasiliy Kulikov
1 sibling, 1 reply; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-06-05 18:24 UTC (permalink / raw)
To: kernel-hardening
[-- Attachment #1: Type: text/plain, Size: 13695 bytes --]
This patch introduces support of procfs mount options and adds mount
options to restrict access to /proc/PID/ directories and /proc/PID/net/
contents. The default backward-compatible behaviour is left untouched.
The first mount option is called "hidepid" and its value defines how much
info about processes we want to be available for non-owners:
hidepid=0 (default) means the current behaviour - anybody may read all
/proc/PID/* files.
hidepid=1 means all files not running as current user and group are
unaccessible (directories' permissions are 0550). Sensitive files like
cmdline, io, sched*, status, wchan are now protected against other
users.
hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to
other users. It doesn't mean that it hides a fact whether a process
exists (it can be learned by other means, e.g. by sending signals), but
it hides process' uid and gid. It greatly compicates intruder's task of
gathering info about running processes, whether some daemon runs with
elevated privileges, whether other user runs some sensitive program,
whether other users run any program at all, etc.
hidenet means /proc/PID/net will be accessible to processes with
CAP_NET_ADMIN capability or to members of a special group.
gid=XXX defines a group that will be able to gather all processes' info
and network connections info.
TODO/thoughs:
- /proc/pid/net/ currently doesn't show ANYTHING, even "." and "..".
This is confusing :)
- copy options description to Documenataion/filesystems/procfs.txt
- need to alter "(inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO))" checks
to honour non-pid directories.
- what if one keeps open /proc/PID/ while executing set*id/capable
binary?
- maybe hidenet belongs to net namespace, not pid? However, it is
seen through procfs, which is per-pid_namespace.
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9d096e8..dd28832 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1688,6 +1688,7 @@ static struct inode *proc_pid_make_inode(struct super_block * sb, struct task_st
struct inode * inode;
struct proc_inode *ei;
const struct cred *cred;
+ struct nsproxy *ns;
/* We need a new inode */
@@ -1712,7 +1713,12 @@ static struct inode *proc_pid_make_inode(struct super_block * sb, struct task_st
rcu_read_lock();
cred = __task_cred(task);
inode->i_uid = cred->euid;
- inode->i_gid = cred->egid;
+ ns = task_nsproxy(task);
+ if (ns && ns->pid_ns->hide_pid) {
+ inode->i_gid = ns->pid_ns->pid_gid;
+ } else {
+ inode->i_gid = cred->egid;
+ }
rcu_read_unlock();
}
security_task_to_inode(task, inode);
@@ -1725,11 +1731,22 @@ out_unlock:
return NULL;
}
+/* TODO: ugly, need something more generic */
+static bool proc_pid_may_getattr(struct pid_namespace* pid, struct task_struct *task)
+{
+ if (pid->hide_pid < 2)
+ return true;
+ if (ptrace_may_access(task, PTRACE_MODE_READ))
+ return true;
+ return in_group_p(pid->pid_gid);
+}
+
static int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
{
struct inode *inode = dentry->d_inode;
struct task_struct *task;
const struct cred *cred;
+ struct pid_namespace *pid = dentry->d_sb->s_fs_info;
generic_fillattr(inode, stat);
@@ -1738,11 +1755,17 @@ static int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat
stat->gid = 0;
task = pid_task(proc_pid(inode), PIDTYPE_PID);
if (task) {
- if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) ||
- task_dumpable(task)) {
+ /* TODO: if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) */
+ if ((((inode->i_mode & S_IFDIR) == S_IFDIR) ||
+ task_dumpable(task))) {
cred = __task_cred(task);
- stat->uid = cred->euid;
- stat->gid = cred->egid;
+ if (proc_pid_may_getattr(pid, task)) {
+ stat->uid = cred->euid;
+ if (pid->hide_pid)
+ stat->gid = pid->pid_gid;
+ else
+ stat->gid = cred->egid;
+ }
}
}
rcu_read_unlock();
@@ -1771,6 +1794,7 @@ static int pid_revalidate(struct dentry *dentry, struct nameidata *nd)
struct inode *inode;
struct task_struct *task;
const struct cred *cred;
+ struct pid_namespace *pid = dentry->d_sb->s_fs_info;
if (nd && nd->flags & LOOKUP_RCU)
return -ECHILD;
@@ -1779,12 +1803,16 @@ static int pid_revalidate(struct dentry *dentry, struct nameidata *nd)
task = get_proc_task(inode);
if (task) {
- if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) ||
+ /* TODO: if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) */
+ if (((inode->i_mode & S_IFDIR) == S_IFDIR) ||
task_dumpable(task)) {
rcu_read_lock();
cred = __task_cred(task);
inode->i_uid = cred->euid;
- inode->i_gid = cred->egid;
+ if (pid->hide_pid)
+ inode->i_gid = pid->pid_gid;
+ else
+ inode->i_gid = cred->egid;
rcu_read_unlock();
} else {
inode->i_uid = 0;
@@ -2986,12 +3014,21 @@ static struct dentry *proc_pid_instantiate(struct inode *dir,
{
struct dentry *error = ERR_PTR(-ENOENT);
struct inode *inode;
+ struct nsproxy *ns;
+ unsigned int mode;
inode = proc_pid_make_inode(dir->i_sb, task);
if (!inode)
goto out;
- inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO;
+ rcu_read_lock();
+ ns = task_nsproxy(task);
+ if (ns && ns->pid_ns->hide_pid)
+ mode = 07;
+ else
+ mode = 0;
+ rcu_read_unlock();
+ inode->i_mode = S_IFDIR | ((S_IRUGO|S_IXUGO) & ~mode);
inode->i_op = &proc_tgid_base_inode_operations;
inode->i_fop = &proc_tgid_base_operations;
inode->i_flags|=S_IMMUTABLE;
@@ -3093,6 +3130,11 @@ static int proc_pid_fill_cache(struct file *filp, void *dirent, filldir_t filldi
proc_pid_instantiate, iter.task, NULL);
}
+static int fake_filldir(void *buf, const char *name, int namelen, loff_t offset, u64 ino, unsigned d_type)
+{
+ return 0;
+}
+
/* for the /proc/ directory itself, after non-process stuff has been done */
int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
{
@@ -3100,6 +3142,7 @@ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
struct task_struct *reaper = get_proc_task(filp->f_path.dentry->d_inode);
struct tgid_iter iter;
struct pid_namespace *ns;
+ filldir_t __filldir;
if (!reaper)
goto out_no_task;
@@ -3116,8 +3159,13 @@ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
for (iter = next_tgid(ns, iter);
iter.task;
iter.tgid += 1, iter = next_tgid(ns, iter)) {
+ if (proc_pid_may_getattr(ns, iter.task))
+ __filldir = filldir;
+ else
+ __filldir = fake_filldir;
+
filp->f_pos = iter.tgid + TGID_OFFSET;
- if (proc_pid_fill_cache(filp, dirent, filldir, iter) < 0) {
+ if (proc_pid_fill_cache(filp, dirent, __filldir, iter) < 0) {
put_task_struct(iter.task);
goto out;
}
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 176ce4c..698fb41 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -7,6 +7,7 @@
#include <linux/time.h>
#include <linux/proc_fs.h>
#include <linux/kernel.h>
+#include <linux/pid_namespace.h>
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/stat.h>
@@ -17,7 +18,9 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/sysctl.h>
+#include <linux/seq_file.h>
#include <linux/slab.h>
+#include <linux/mount.h>
#include <asm/system.h>
#include <asm/uaccess.h>
@@ -93,12 +96,29 @@ void __init proc_init_inodecache(void)
init_once);
}
+static int proc_show_options(struct seq_file *seq, struct vfsmount *vfs)
+{
+ struct super_block *sb = vfs->mnt_sb;
+ struct pid_namespace *pid = sb->s_fs_info;
+
+ if (pid->pid_gid)
+ seq_printf(seq, ",gid=%lu", (unsigned long)pid->pid_gid);
+ if (pid->hide_pid != 0)
+ seq_printf(seq, ",hidepid=%u", pid->hide_pid);
+ if (pid->hide_net)
+ seq_printf(seq, ",hidenet");
+
+ return 0;
+}
+
static const struct super_operations proc_sops = {
.alloc_inode = proc_alloc_inode,
.destroy_inode = proc_destroy_inode,
.drop_inode = generic_delete_inode,
.evict_inode = proc_evict_inode,
.statfs = simple_statfs,
+ .remount_fs = proc_remount,
+ .show_options = proc_show_options,
};
static void __pde_users_dec(struct proc_dir_entry *pde)
@@ -468,7 +488,7 @@ int proc_fill_super(struct super_block *s)
s->s_magic = PROC_SUPER_MAGIC;
s->s_op = &proc_sops;
s->s_time_gran = 1;
-
+
pde_get(&proc_root);
root_inode = proc_get_inode(s, &proc_root);
if (!root_inode)
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 9ad561d..1cacb6a 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -110,6 +110,7 @@ void pde_put(struct proc_dir_entry *pde);
extern struct vfsmount *proc_mnt;
int proc_fill_super(struct super_block *);
struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *);
+int proc_remount(struct super_block *sb, int *flags, char *data);
/*
* These are generic /proc routines that use the internal
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 9020ac1..7dc0b96 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -22,6 +22,7 @@
#include <linux/mount.h>
#include <linux/nsproxy.h>
#include <net/net_namespace.h>
+#include <linux/pid_namespace.h>
#include <linux/seq_file.h>
#include "internal.h"
@@ -105,6 +106,11 @@ static struct net *get_proc_task_net(struct inode *dir)
struct task_struct *task;
struct nsproxy *ns;
struct net *net = NULL;
+ struct pid_namespace *pid = dir->i_sb->s_fs_info;
+
+ if (pid->hide_net &&
+ (!capable(CAP_NET_ADMIN) && !in_group_p(pid->pid_gid)))
+ return net;
rcu_read_lock();
task = pid_task(proc_pid(dir), PIDTYPE_PID);
@@ -162,7 +168,7 @@ static int proc_tgid_net_readdir(struct file *filp, void *dirent,
int ret;
struct net *net;
- ret = -EINVAL;
+ ret = -ENOENT;
net = get_proc_task_net(filp->f_path.dentry->d_inode);
if (net != NULL) {
ret = proc_readdir_de(net->proc_net, filp, dirent, filldir);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index ef9fa8e..269067c 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -18,6 +18,7 @@
#include <linux/bitops.h>
#include <linux/mount.h>
#include <linux/pid_namespace.h>
+#include <linux/parser.h>
#include "internal.h"
@@ -35,6 +36,75 @@ static int proc_set_super(struct super_block *sb, void *data)
return set_anon_super(sb, NULL);
}
+enum {
+ Opt_gid, Opt_hidepid, Opt_hidenet, Opt_nohidenet, Opt_err,
+};
+
+static const match_table_t tokens = {
+ {Opt_hidepid, "hidepid=%u"},
+ {Opt_gid, "gid=%u"},
+ {Opt_hidenet, "hidenet"},
+ {Opt_nohidenet, "nohidenet"},
+ {Opt_err, NULL},
+};
+
+static int proc_parse_options(char *options, struct pid_namespace *pid)
+{
+ char *p;
+ substring_t args[MAX_OPT_ARGS];
+ int option;
+
+ pr_debug("proc: options = %s\n", options);
+
+ if (!options)
+ return 1;
+
+ while ((p = strsep(&options, ",")) != NULL) {
+ int token;
+ if (!*p)
+ continue;
+
+ args[0].to = args[0].from = 0;
+ token = match_token(p, tokens, args);
+ switch (token) {
+ case Opt_gid:
+ if (match_int(&args[0], &option))
+ return 0;
+ pid->pid_gid = option;
+ break;
+ case Opt_hidepid:
+ if (match_int(&args[0], &option))
+ return 0;
+ if (option < 0 || option > 2) {
+ pr_err("proc: hidepid value must be between 0 and 2.\n");
+ return 0;
+ }
+ pid->hide_pid = option;
+ break;
+ case Opt_hidenet:
+ pid->hide_net = true;
+ break;
+ case Opt_nohidenet:
+ pid->hide_net = false;
+ break;
+ default:
+ pr_err("proc: unrecognized mount option \"%s\" "
+ "or missing value", p);
+ return 0;
+ }
+ }
+
+ pr_debug("proc: gid = %u, hidepid = %o, hidenet = %d\n", pid->pid_gid, pid->hide_pid, (int)pid->hide_net);
+
+ return 1;
+}
+
+int proc_remount(struct super_block *sb, int *flags, char *data)
+{
+ struct pid_namespace *pid = sb->s_fs_info;
+ return !proc_parse_options(data, pid);
+}
+
static struct dentry *proc_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
{
@@ -42,6 +112,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
struct super_block *sb;
struct pid_namespace *ns;
struct proc_inode *ei;
+ char *options;
if (proc_mnt) {
/* Seed the root directory with a pid so it doesn't need
@@ -54,10 +125,13 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
ei->pid = find_get_pid(1);
}
- if (flags & MS_KERNMOUNT)
+ if (flags & MS_KERNMOUNT) {
ns = (struct pid_namespace *)data;
- else
+ options = NULL;
+ } else {
ns = current->nsproxy->pid_ns;
+ options = data;
+ }
sb = sget(fs_type, proc_test_super, proc_set_super, ns);
if (IS_ERR(sb))
@@ -65,6 +139,10 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
if (!sb->s_root) {
sb->s_flags = flags;
+ if (!proc_parse_options(options, ns)) {
+ deactivate_locked_super(sb);
+ return ERR_PTR(-EINVAL);
+ }
err = proc_fill_super(sb);
if (err) {
deactivate_locked_super(sb);
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 38d1032..1c33094 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -30,6 +30,9 @@ struct pid_namespace {
#ifdef CONFIG_BSD_PROCESS_ACCT
struct bsd_acct_struct *bacct;
#endif
+ gid_t pid_gid;
+ int hide_pid;
+ bool hide_net;
};
extern struct pid_namespace init_pid_ns;
--
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [kernel-hardening] Re: [owl-dev] procfs mount options
[not found] <20110603191153.GB514@openwall.com>
2011-06-04 5:47 ` [kernel-hardening] Re: procfs mount options Vasiliy Kulikov
@ 2011-06-05 18:36 ` Vasiliy Kulikov
2011-06-05 18:47 ` [kernel-hardening] " Solar Designer
1 sibling, 1 reply; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-06-05 18:36 UTC (permalink / raw)
To: kernel-hardening
[-- Attachment #1: Type: text/plain, Size: 661 bytes --]
On Fri, Jun 03, 2011 at 23:11 +0400, Solar Designer wrote:
> Indeed, we could set some of these perms with chmod post-mount, but as
> discussed this has drawbacks. So ideally our preferred configuration
> (which will be the default on Owl) should be achievable with mount
> options alone.
What if implement mode=XXX option to alter root directory permissions
only, like tmpfs? Then all non-pid files may be chmod'ed without any
race due to distro-specific policy and then "chmod a+rx /proc" to allow
nonroot users to see procfs files.
Thanks,
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options
2011-06-05 18:36 ` [kernel-hardening] Re: [owl-dev] " Vasiliy Kulikov
@ 2011-06-05 18:47 ` Solar Designer
0 siblings, 0 replies; 29+ messages in thread
From: Solar Designer @ 2011-06-05 18:47 UTC (permalink / raw)
To: kernel-hardening
On Sun, Jun 05, 2011 at 10:36:20PM +0400, Vasiliy Kulikov wrote:
> On Fri, Jun 03, 2011 at 23:11 +0400, Solar Designer wrote:
> > Indeed, we could set some of these perms with chmod post-mount, but as
> > discussed this has drawbacks. So ideally our preferred configuration
> > (which will be the default on Owl) should be achievable with mount
> > options alone.
>
> What if implement mode=XXX option to alter root directory permissions
> only, like tmpfs? Then all non-pid files may be chmod'ed without any
> race due to distro-specific policy and then "chmod a+rx /proc" to allow
> nonroot users to see procfs files.
This makes sense to me, although other mount options that you
implemented appear to be sufficient to implement the desired default
policy for Owl.
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options
2011-06-04 20:59 ` Solar Designer
2011-06-05 18:24 ` [kernel-hardening] [RFC v1] " Vasiliy Kulikov
@ 2011-06-05 19:17 ` Vasiliy Kulikov
2011-06-05 19:40 ` Solar Designer
1 sibling, 1 reply; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-06-05 19:17 UTC (permalink / raw)
To: kernel-hardening
[-- Attachment #1: Type: text/plain, Size: 1284 bytes --]
Solar,
On Sun, Jun 05, 2011 at 00:59 +0400, Solar Designer wrote:
> > > This is not generic, but at least it's simple, not
> > > confusing, and it allows for future changes (we may change what exactly
> > > restricted means).
> >
> > Changing what some option means after implementing it is a ABI breakage,
> > which is terrible for upstream.
>
> This depends on how the option is defined initially. It is possible to
> define the option as "enabling unspecified version-dependent restrictions,
> which are subject to change across kernel versions and builds". ;-) OK,
> the word "unspecified" may be dropped.
Everything that is implemented should behave the same way it was
initially implemented, regardless of what is explicitly marked as "ABI".
This is upsteam's way to define "ABI" term. Often they deliberately
break such ABI, but normally this is not encouraged.
> Here's a related thought: if these mount options happen to affect all
> instances of the filesystem (in the same container), maybe they should
> be sysctl's instead?
AFAIR, only net namespaces have their own sysctl sets. Other sysctls
are global. So, implementing pid_namespace-specific sysctl would be a
bit weird (according to current policies).
Thanks,
--
Vasiliy
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v1] procfs mount options
2011-06-05 18:24 ` [kernel-hardening] [RFC v1] " Vasiliy Kulikov
@ 2011-06-05 19:26 ` Solar Designer
2011-06-05 19:47 ` Vasiliy Kulikov
0 siblings, 1 reply; 29+ messages in thread
From: Solar Designer @ 2011-06-05 19:26 UTC (permalink / raw)
To: kernel-hardening
Vasiliy, all -
To comment on this in detail (such as on specific source code changes),
I'd need to dive into it myself, effectively duplicating Vasiliy's work.
So I am not doing that, but I'd appreciate it if others in here review
and comment (preferably, folks with more up-to-date experience of
patching these areas of the kernel - such as after the introduction of
namespaces; my own experience is too rusty). I think it's going to be
similar for other tasks Vasiliy will be working on under this project -
I am happy to provide overall guidance, but I am too rusty on many of
the details involved and I am not going to have time to (re-)learn them
this summer. So any help is appreciated.
On Sun, Jun 05, 2011 at 10:24:31PM +0400, Vasiliy Kulikov wrote:
> This patch introduces support of procfs mount options and adds mount
> options to restrict access to /proc/PID/ directories and /proc/PID/net/
> contents. The default backward-compatible behaviour is left untouched.
>
> The first mount option is called "hidepid" and its value defines how much
> info about processes we want to be available for non-owners:
>
> hidepid=0 (default) means the current behaviour - anybody may read all
> /proc/PID/* files.
Aren't some /prod/PID/* files restricted by default, in stock kernels?
I think several are (auxv, fd/, mem). So perhaps re-word this.
> hidepid=1 means all files not running as current user and group are
"... files not running ..." needs to be re-worded.
> TODO/thoughs:
> - /proc/pid/net/ currently doesn't show ANYTHING, even "." and "..".
> This is confusing :)
Ouch. Can't you simply restrict its perms such that this directory
can't be listed unless you have privs? It should act as a normal mode
550 directory on a regular filesystem.
> - copy options description to Documenataion/filesystems/procfs.txt
Yes.
> - need to alter "(inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO))" checks
> to honour non-pid directories.
I see this in the patch, but can't really comment without reviewing the
code in full context myself.
> - what if one keeps open /proc/PID/ while executing set*id/capable
> binary?
Then they deliberately grant this privilege to this process (and maybe
to its heirs). I see no problem with that.
However, if a set*id/capable binary can be tricked into opening
/proc/PID/ or a /proc/PID/* entry on its own, and would leave the fd
open for whatever other program(s) it invokes, then we have a problem.
But that should probably be regarded as an issue of that specific
program, unless the underlying cause enabling the specific attack is in
the kernel or in a library. I find it unlikely that an issue like that
would be specific to just /proc/PID; it'd probably be usable on other
restricted-perms directories and/or files as well.
> - maybe hidenet belongs to net namespace, not pid? However, it is
> seen through procfs, which is per-pid_namespace.
I am not the right person to comment on that.
> + if (pid->hide_net &&
> + (!capable(CAP_NET_ADMIN) && !in_group_p(pid->pid_gid)))
capable() sets a flag when it makes use of the capability (or at least
it used to), visible via pacct. So, unless anything has changed in this
area, it is best to check capable() last, such that it's only reached
when it actually makes a difference. Thus, I'd write:
if (pid->hide_net && !in_group_p(pid->pid_gid) &&
!capable(CAP_NET_ADMIN))
Also, what did you mean by the extra braces? Just separating the
setting check from the permissions check for readability? I don't think
this helps.
Thanks,
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options
2011-06-05 19:17 ` [kernel-hardening] " Vasiliy Kulikov
@ 2011-06-05 19:40 ` Solar Designer
2011-06-05 19:53 ` Vasiliy Kulikov
0 siblings, 1 reply; 29+ messages in thread
From: Solar Designer @ 2011-06-05 19:40 UTC (permalink / raw)
To: kernel-hardening
On Sun, Jun 05, 2011 at 11:17:47PM +0400, Vasiliy Kulikov wrote:
> On Sun, Jun 05, 2011 at 00:59 +0400, Solar Designer wrote:
> > Here's a related thought: if these mount options happen to affect all
> > instances of the filesystem (in the same container), maybe they should
> > be sysctl's instead?
>
> AFAIR, only net namespaces have their own sysctl sets. Other sysctls
> are global. So, implementing pid_namespace-specific sysctl would be a
> bit weird (according to current policies).
Here's what we have immediate need for, in practice:
We need to be able to mount /proc with different permission settings in
different OpenVZ containers (perhaps running different distros, which
have their different defaults - e.g., Owl will use the restricted proc
options by default, but other distros mostly won't).
Since recent versions of OpenVZ build upon the namespaces code that has
been upstream'ed, I guess this will rely on upstream's namespaces code
(once we move to RHEL6'ish OpenVZ kernels and beyond), correct?
Now, leaving sysctl's aside and speaking of mount options only for now,
what happens when a container mounts /proc with umask=007, but then
another container mounts /proc without that option or with umask=0?
Does the first container retain its restricted perms, including for
newly appearing entries under its /proc? If so, where is this different
setting stored? Is it per mount (preferable)? Is it per pid namespace
(OK)? Or per net namespace (weird)?
Thanks,
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v1] procfs mount options
2011-06-05 19:26 ` Solar Designer
@ 2011-06-05 19:47 ` Vasiliy Kulikov
2011-06-05 20:10 ` Solar Designer
0 siblings, 1 reply; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-06-05 19:47 UTC (permalink / raw)
To: kernel-hardening
[-- Attachment #1: Type: text/plain, Size: 3611 bytes --]
Solar,
On Sun, Jun 05, 2011 at 23:26 +0400, Solar Designer wrote:
> On Sun, Jun 05, 2011 at 10:24:31PM +0400, Vasiliy Kulikov wrote:
> > This patch introduces support of procfs mount options and adds mount
> > options to restrict access to /proc/PID/ directories and /proc/PID/net/
> > contents. The default backward-compatible behaviour is left untouched.
> >
> > The first mount option is called "hidepid" and its value defines how much
> > info about processes we want to be available for non-owners:
> >
> > hidepid=0 (default) means the current behaviour - anybody may read all
> > /proc/PID/* files.
>
> Aren't some /prod/PID/* files restricted by default, in stock kernels?
> I think several are (auxv, fd/, mem). So perhaps re-word this.
Yes, I've mentioned still accessible files in hidepid=1. Will do.
> > hidepid=1 means all files not running as current user and group are
>
> "... files not running ..." needs to be re-worded.
Oops, made a mistake while rewording.
> > TODO/thoughs:
> > - /proc/pid/net/ currently doesn't show ANYTHING, even "." and "..".
> > This is confusing :)
>
> Ouch. Can't you simply restrict its perms such that this directory
> can't be listed unless you have privs?
Well, yes, but it would touch too many code - currently it is handled as
an entry in a static table. Changing this would touch many high level
loops of the table handling. Hiding its contents is just simplier.
Another solution - create a fake net namespace and process this
namespace if not enough permissions :) It also removes weird netstat
errors like "seems like networking was disabled for this kernel".
> It should act as a normal mode
> 550 directory on a regular filesystem.
What 550 perms would give? /proc/pid/net/ contains all network
information about _all_ network connections in current net namespace.
So, /proc/1/net and /proc/2/net are logically the same directory.
However, changing the mode to 550 _and_ changing uid and gid will help.
> > - need to alter "(inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO))" checks
> > to honour non-pid directories.
>
> I see this in the patch, but can't really comment without reviewing the
> code in full context myself.
As Spender commented, this is a weird thing to recognize pids from other
files, which is needed due to some weird caching :)
> > - what if one keeps open /proc/PID/ while executing set*id/capable
> > binary?
>
> Then they deliberately grant this privilege to this process (and maybe
> to its heirs). I see no problem with that.
Ehh... I mean another thing:
Process A with UID=1000 opens /proc/123/, while 123 has UID=1000.
123 exec's setuid binary, /proc/123/ becomes unaccessible to A.
However, A still keeps the directory opened and may read its contents.
> > + if (pid->hide_net &&
> > + (!capable(CAP_NET_ADMIN) && !in_group_p(pid->pid_gid)))
>
> capable() sets a flag when it makes use of the capability (or at least
> it used to), visible via pacct. So, unless anything has changed in this
> area, it is best to check capable() last, such that it's only reached
> when it actually makes a difference. Thus, I'd write:
>
> if (pid->hide_net && !in_group_p(pid->pid_gid) &&
> !capable(CAP_NET_ADMIN))
OK.
> Also, what did you mean by the extra braces? Just separating the
> setting check from the permissions check for readability?
Initially I wrote some ||, so the braces were needed. Will remove.
Thanks you for the comments, will fix and repost,
--
Vasiliy
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs mount options
2011-06-05 19:40 ` Solar Designer
@ 2011-06-05 19:53 ` Vasiliy Kulikov
0 siblings, 0 replies; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-06-05 19:53 UTC (permalink / raw)
To: kernel-hardening
[-- Attachment #1: Type: text/plain, Size: 944 bytes --]
Solar,
On Sun, Jun 05, 2011 at 23:40 +0400, Solar Designer wrote:
> Since recent versions of OpenVZ build upon the namespaces code that has
> been upstream'ed, I guess this will rely on upstream's namespaces code
> (once we move to RHEL6'ish OpenVZ kernels and beyond), correct?
Namespaces - yes. But AFAIK, only a pair of sysctls is accessible in
containers, IIRC, only net sysctls.
> Now, leaving sysctl's aside and speaking of mount options only for now,
> what happens when a container mounts /proc with umask=007, but then
> another container mounts /proc without that option or with umask=0?
> Does the first container retain its restricted perms, including for
> newly appearing entries under its /proc? If so, where is this different
> setting stored? Is it per mount (preferable)? Is it per pid namespace
> (OK)?
It is per-pid_namespace, it should be fully consistent with OpenVZ.
Thanks,
--
Vasiliy
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v1] procfs mount options
2011-06-05 19:47 ` Vasiliy Kulikov
@ 2011-06-05 20:10 ` Solar Designer
2011-06-06 18:08 ` Vasiliy Kulikov
2011-06-06 19:20 ` [kernel-hardening] [RFC v1] procfs " Vasiliy Kulikov
0 siblings, 2 replies; 29+ messages in thread
From: Solar Designer @ 2011-06-05 20:10 UTC (permalink / raw)
To: kernel-hardening
On Sun, Jun 05, 2011 at 11:47:46PM +0400, Vasiliy Kulikov wrote:
> On Sun, Jun 05, 2011 at 23:26 +0400, Solar Designer wrote:
> > On Sun, Jun 05, 2011 at 10:24:31PM +0400, Vasiliy Kulikov wrote:
> > > TODO/thoughs:
> > > - /proc/pid/net/ currently doesn't show ANYTHING, even "." and "..".
> > > This is confusing :)
> >
> > Ouch. Can't you simply restrict its perms such that this directory
> > can't be listed unless you have privs?
>
> Well, yes, but it would touch too many code - currently it is handled as
> an entry in a static table. Changing this would touch many high level
> loops of the table handling. Hiding its contents is just simplier.
>
> Another solution - create a fake net namespace and process this
> namespace if not enough permissions :) It also removes weird netstat
> errors like "seems like networking was disabled for this kernel".
This sounds good to me, although I am not familiar with namespaces.
> > It should act as a normal mode 550 directory on a regular filesystem.
>
> What 550 perms would give? /proc/pid/net/ contains all network
> information about _all_ network connections in current net namespace.
> So, /proc/1/net and /proc/2/net are logically the same directory.
>
> However, changing the mode to 550 _and_ changing uid and gid will help.
I implied that the gid would be the one set by our gid option.
> > > - what if one keeps open /proc/PID/ while executing set*id/capable
> > > binary?
> >
> > Then they deliberately grant this privilege to this process (and maybe
> > to its heirs). I see no problem with that.
>
> Ehh... I mean another thing:
>
> Process A with UID=1000 opens /proc/123/, while 123 has UID=1000.
>
> 123 exec's setuid binary, /proc/123/ becomes unaccessible to A.
>
> However, A still keeps the directory opened and may read its contents.
Oh, this is a valid concern. Please research this. Perhaps there
should be a may-ptrace check (or maybe more than one).
As it relates to privacy, there's little need for this, because the user
started those processes on their own (in fact, this is a reason why the
user may want to be able to see those processes, despite of procfs perms).
However, if our concern is that the processes may leak sensitive data
the user is not supposed to see, then we need to restrict this.
Overall, it makes sense either to relax things such that /proc/PID have
real UID rather than effective UID as their owner (so we make such
access to almost-own processes legitimate) or to have the proper
may-ptrace checks in place (such that permissions are actually enforced).
Maybe this should be another mount option. But this feels like too much
complexity for an obscure option that few people will understand and use.
Thanks,
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v1] procfs mount options
2011-06-05 20:10 ` Solar Designer
@ 2011-06-06 18:08 ` Vasiliy Kulikov
2011-06-06 18:33 ` Solar Designer
2011-06-06 19:20 ` [kernel-hardening] [RFC v1] procfs " Vasiliy Kulikov
1 sibling, 1 reply; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-06-06 18:08 UTC (permalink / raw)
To: kernel-hardening
[-- Attachment #1: Type: text/plain, Size: 797 bytes --]
On Mon, Jun 06, 2011 at 00:10 +0400, Solar Designer wrote:
> > Process A with UID=1000 opens /proc/123/, while 123 has UID=1000.
> >
> > 123 exec's setuid binary, /proc/123/ becomes unaccessible to A.
> >
> > However, A still keeps the directory opened and may read its contents.
>
> Oh, this is a valid concern. Please research this. Perhaps there
> should be a may-ptrace check (or maybe more than one).
This is similar to CVE-2011-1020:
https://lkml.org/lkml/2011/2/7/368
http://seclists.org/fulldisclosure/2011/Jan/421
The proposed solution for separate procfs files is implementing
additional runtime checks (besides POSIX perms), however, it probably
doesn't scale for the whole PID directory.
Will try to invent some simple way to deal with it.
--
Vasiliy
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v1] procfs mount options
2011-06-06 18:08 ` Vasiliy Kulikov
@ 2011-06-06 18:33 ` Solar Designer
2011-06-08 17:23 ` [kernel-hardening] [RFC v2] " Vasiliy Kulikov
0 siblings, 1 reply; 29+ messages in thread
From: Solar Designer @ 2011-06-06 18:33 UTC (permalink / raw)
To: kernel-hardening
On Mon, Jun 06, 2011 at 10:08:06PM +0400, Vasiliy Kulikov wrote:
> On Mon, Jun 06, 2011 at 00:10 +0400, Solar Designer wrote:
> > > Process A with UID=1000 opens /proc/123/, while 123 has UID=1000.
> > >
> > > 123 exec's setuid binary, /proc/123/ becomes unaccessible to A.
> > >
> > > However, A still keeps the directory opened and may read its contents.
> >
> > Oh, this is a valid concern. Please research this. Perhaps there
> > should be a may-ptrace check (or maybe more than one).
>
> This is similar to CVE-2011-1020:
>
> https://lkml.org/lkml/2011/2/7/368
> http://seclists.org/fulldisclosure/2011/Jan/421
>
> The proposed solution for separate procfs files is implementing
> additional runtime checks (besides POSIX perms),
Right. IIRC, the approach with selectively adding may-ptrace checks to
sensitive proc files dates back to Linux 2.0. Apparently, some newer
entries such as auxv were not covered by it (there's no /proc/PID/auxv
on 2.4).
> however, it probably doesn't scale for the whole PID directory.
>
> Will try to invent some simple way to deal with it.
Maybe have a may-ptrace check automatically applied to all entries that
are not world-readable?
...Here's a trickier attack: open a sensitive /proc/PID/something file
on fd 0, invoke a SUID/SGID/fscap'ed program. Depending on that
program's functionality, it might read its stdin and it might report
what it read from there (e.g., quote it in an error message complaining
about invalid input data). I don't recall if this has been dealt with
in any way or not.
While these are good to research and fix, I am concerned about the scope
creep. We might prefer to get a basic procfs mount options patch in
first, then propose these enhancements.
Thanks,
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v1] procfs mount options
2011-06-05 20:10 ` Solar Designer
2011-06-06 18:08 ` Vasiliy Kulikov
@ 2011-06-06 19:20 ` Vasiliy Kulikov
1 sibling, 0 replies; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-06-06 19:20 UTC (permalink / raw)
To: kernel-hardening
[-- Attachment #1: Type: text/plain, Size: 1361 bytes --]
On Mon, Jun 06, 2011 at 00:10 +0400, Solar Designer wrote:
> On Sun, Jun 05, 2011 at 11:47:46PM +0400, Vasiliy Kulikov wrote:
> > On Sun, Jun 05, 2011 at 23:26 +0400, Solar Designer wrote:
> > > On Sun, Jun 05, 2011 at 10:24:31PM +0400, Vasiliy Kulikov wrote:
> > > > TODO/thoughs:
> > > > - /proc/pid/net/ currently doesn't show ANYTHING, even "." and "..".
> > > > This is confusing :)
> > >
> > > Ouch. Can't you simply restrict its perms such that this directory
> > > can't be listed unless you have privs?
...
> > Another solution - create a fake net namespace and process this
> > namespace if not enough permissions :) It also removes weird netstat
> > errors like "seems like networking was disabled for this kernel".
A fake net namespace works perfect:
$ LANG=C netstat -nlp4
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
No warning from netstat. I remember brctl didn't properly handle
missing sysfs files, so fake files make sense.
Will repost the patch after I'm sure that changing hidepid works well
with inode caching (I see a bug in my current implementation).
Thanks,
--
Vasiliy
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* [kernel-hardening] [RFC v2] procfs mount options
2011-06-06 18:33 ` Solar Designer
@ 2011-06-08 17:23 ` Vasiliy Kulikov
2011-06-08 17:43 ` Vasiliy Kulikov
2011-06-12 2:39 ` Solar Designer
0 siblings, 2 replies; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-06-08 17:23 UTC (permalink / raw)
To: kernel-hardening
[-- Attachment #1: Type: text/plain, Size: 16085 bytes --]
In this version I've completely discarded grsecurity way to chech
permissions. Implementing customized inode_operations->permission() is
MUCH simplier. The problem with old way, changing gid, is not
backward-compatible - procfs files have tricky owners depending on their
permissions, and some programs depend on perms AND uid/gid. So, in this
version I don't actually change gid, but match with this gid in runtime
(proc_pid_permission()).
Additionally it solves the problem with suid binaries, but only for
operations that use permissions checks like readdir(), open(),
but not read(), write(), etc. *stat*() are denied too. The generic
problem is not solved, though.
This patch is as much compatible with the current behaviour as possible.
It leaves permissions and changes runtime behaviour only. Nothing
should be broken.
The only problem with the patch is using net namespaces. If
CONFIG_NET_NS=n then fake_net is created, but it is blank. So, in this
case /proc/PID/net doesn't contain common files and netstat and other
programs might spit with warnings.
I think this version of the patch is ready for LKML review.
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9d096e8..78fbcdc 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -568,8 +568,38 @@ static int proc_setattr(struct dentry *dentry, struct iattr *attr)
return 0;
}
+static int proc_pid_permission(struct inode *inode, int mask, unsigned int flags)
+{
+ struct pid_namespace *pid = inode->i_sb->s_fs_info;
+ struct task_struct *task = get_proc_task(inode);
+
+ if (pid->hide_pid &&
+ !ptrace_may_access(task, PTRACE_MODE_READ) &&
+ !in_group_p(pid->pid_gid)) {
+ if (pid->hide_pid == 2)
+ return -ENOENT;
+ else
+ return -EPERM;
+ }
+ return generic_permission(inode, mask, flags, NULL);
+}
+
+/*
+ * May current process learn task's euid/egid?
+ */
+static bool proc_pid_may_getattr(struct pid_namespace* pid, struct task_struct *task)
+{
+ if (pid->hide_pid < 2)
+ return true;
+ if (ptrace_may_access(task, PTRACE_MODE_READ))
+ return true;
+ return in_group_p(pid->pid_gid);
+}
+
+
static const struct inode_operations proc_def_inode_operations = {
.setattr = proc_setattr,
+ .permission = proc_pid_permission,
};
static int mounts_open_common(struct inode *inode, struct file *file,
@@ -1662,6 +1692,7 @@ static const struct inode_operations proc_pid_link_inode_operations = {
.readlink = proc_pid_readlink,
.follow_link = proc_pid_follow_link,
.setattr = proc_setattr,
+ .permission = proc_pid_permission,
};
@@ -1730,6 +1761,7 @@ static int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat
struct inode *inode = dentry->d_inode;
struct task_struct *task;
const struct cred *cred;
+ struct pid_namespace *pid = dentry->d_sb->s_fs_info;
generic_fillattr(inode, stat);
@@ -1738,6 +1770,14 @@ static int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat
stat->gid = 0;
task = pid_task(proc_pid(inode), PIDTYPE_PID);
if (task) {
+ if (!proc_pid_may_getattr(pid, task)) {
+ rcu_read_unlock();
+ /*
+ * This doesn't prevent learning whether PID exists,
+ * it only makes getattr() consistent with readdir().
+ */
+ return -ENOENT;
+ }
if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) ||
task_dumpable(task)) {
cred = __task_cred(task);
@@ -2184,6 +2224,7 @@ static const struct inode_operations proc_fd_inode_operations = {
.lookup = proc_lookupfd,
.permission = proc_fd_permission,
.setattr = proc_setattr,
+ .permission = proc_pid_permission,
};
static struct dentry *proc_fdinfo_instantiate(struct inode *dir,
@@ -2236,6 +2277,7 @@ static const struct file_operations proc_fdinfo_operations = {
static const struct inode_operations proc_fdinfo_inode_operations = {
.lookup = proc_lookupfdinfo,
.setattr = proc_setattr,
+ .permission = proc_pid_permission,
};
@@ -2473,6 +2515,7 @@ static const struct inode_operations proc_attr_dir_inode_operations = {
.lookup = proc_attr_dir_lookup,
.getattr = pid_getattr,
.setattr = proc_setattr,
+ .permission = proc_pid_permission,
};
#endif
@@ -2890,6 +2933,7 @@ static const struct inode_operations proc_tgid_base_inode_operations = {
.lookup = proc_tgid_base_lookup,
.getattr = pid_getattr,
.setattr = proc_setattr,
+ .permission = proc_pid_permission,
};
static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid)
@@ -3093,6 +3137,11 @@ static int proc_pid_fill_cache(struct file *filp, void *dirent, filldir_t filldi
proc_pid_instantiate, iter.task, NULL);
}
+static int fake_filldir(void *buf, const char *name, int namelen, loff_t offset, u64 ino, unsigned d_type)
+{
+ return 0;
+}
+
/* for the /proc/ directory itself, after non-process stuff has been done */
int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
{
@@ -3100,6 +3149,7 @@ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
struct task_struct *reaper = get_proc_task(filp->f_path.dentry->d_inode);
struct tgid_iter iter;
struct pid_namespace *ns;
+ filldir_t __filldir;
if (!reaper)
goto out_no_task;
@@ -3116,8 +3166,13 @@ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
for (iter = next_tgid(ns, iter);
iter.task;
iter.tgid += 1, iter = next_tgid(ns, iter)) {
+ if (proc_pid_may_getattr(ns, iter.task))
+ __filldir = filldir;
+ else
+ __filldir = fake_filldir;
+
filp->f_pos = iter.tgid + TGID_OFFSET;
- if (proc_pid_fill_cache(filp, dirent, filldir, iter) < 0) {
+ if (proc_pid_fill_cache(filp, dirent, __filldir, iter) < 0) {
put_task_struct(iter.task);
goto out;
}
@@ -3223,6 +3278,7 @@ static const struct inode_operations proc_tid_base_inode_operations = {
.lookup = proc_tid_base_lookup,
.getattr = pid_getattr,
.setattr = proc_setattr,
+ .permission = proc_pid_permission,
};
static struct dentry *proc_task_instantiate(struct inode *dir,
@@ -3448,6 +3504,7 @@ static const struct inode_operations proc_task_inode_operations = {
.lookup = proc_task_lookup,
.getattr = proc_task_getattr,
.setattr = proc_setattr,
+ .permission = proc_pid_permission,
};
static const struct file_operations proc_task_operations = {
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 176ce4c..e4452bf 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -7,6 +7,7 @@
#include <linux/time.h>
#include <linux/proc_fs.h>
#include <linux/kernel.h>
+#include <linux/pid_namespace.h>
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/stat.h>
@@ -17,7 +18,9 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/sysctl.h>
+#include <linux/seq_file.h>
#include <linux/slab.h>
+#include <linux/mount.h>
#include <asm/system.h>
#include <asm/uaccess.h>
@@ -93,12 +96,29 @@ void __init proc_init_inodecache(void)
init_once);
}
+static int proc_show_options(struct seq_file *seq, struct vfsmount *vfs)
+{
+ struct super_block *sb = vfs->mnt_sb;
+ struct pid_namespace *pid = sb->s_fs_info;
+
+ if (pid->pid_gid)
+ seq_printf(seq, ",gid=%lu", (unsigned long)pid->pid_gid);
+ if (pid->hide_pid != 0)
+ seq_printf(seq, ",hidepid=%u", pid->hide_pid);
+ if (pid->hide_net)
+ seq_printf(seq, ",hidenet");
+
+ return 0;
+}
+
static const struct super_operations proc_sops = {
.alloc_inode = proc_alloc_inode,
.destroy_inode = proc_destroy_inode,
.drop_inode = generic_delete_inode,
.evict_inode = proc_evict_inode,
.statfs = simple_statfs,
+ .remount_fs = proc_remount,
+ .show_options = proc_show_options,
};
static void __pde_users_dec(struct proc_dir_entry *pde)
@@ -423,6 +443,7 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
inode = iget_locked(sb, de->low_ino);
if (!inode)
return NULL;
+
if (inode->i_state & I_NEW) {
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
PROC_I(inode)->fd = 0;
@@ -468,7 +489,7 @@ int proc_fill_super(struct super_block *s)
s->s_magic = PROC_SUPER_MAGIC;
s->s_op = &proc_sops;
s->s_time_gran = 1;
-
+
pde_get(&proc_root);
root_inode = proc_get_inode(s, &proc_root);
if (!root_inode)
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 9ad561d..1cacb6a 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -110,6 +110,7 @@ void pde_put(struct proc_dir_entry *pde);
extern struct vfsmount *proc_mnt;
int proc_fill_super(struct super_block *);
struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *);
+int proc_remount(struct super_block *sb, int *flags, char *data);
/*
* These are generic /proc routines that use the internal
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 9020ac1..3b4e89e 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -22,10 +22,13 @@
#include <linux/mount.h>
#include <linux/nsproxy.h>
#include <net/net_namespace.h>
+#include <linux/pid_namespace.h>
#include <linux/seq_file.h>
#include "internal.h"
+static struct net *fake_net;
+
static struct net *get_proc_net(const struct inode *inode)
{
@@ -105,6 +108,13 @@ static struct net *get_proc_task_net(struct inode *dir)
struct task_struct *task;
struct nsproxy *ns;
struct net *net = NULL;
+ struct pid_namespace *pid = dir->i_sb->s_fs_info;
+
+ if (pid->hide_net &&
+ (!capable(CAP_NET_ADMIN) && !in_group_p(pid->pid_gid))) {
+ pr_err("return fake_net = %p\n", fake_net);
+ return get_net(fake_net);
+ }
rcu_read_lock();
task = pid_task(proc_pid(dir), PIDTYPE_PID);
@@ -236,6 +246,18 @@ static struct pernet_operations __net_initdata proc_net_ns_ops = {
int __init proc_net_init(void)
{
proc_symlink("net", NULL, "self/net");
-
return register_pernet_subsys(&proc_net_ns_ops);
}
+
+int __init proc_net_initcall(void)
+{
+ fake_net = net_create();
+ pr_err("fake_net = %p\n", fake_net);
+ if (fake_net == NULL)
+ return -ENOMEM;
+
+ get_net(fake_net);
+ return 0;
+}
+
+late_initcall(proc_net_initcall);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index ef9fa8e..269067c 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -18,6 +18,7 @@
#include <linux/bitops.h>
#include <linux/mount.h>
#include <linux/pid_namespace.h>
+#include <linux/parser.h>
#include "internal.h"
@@ -35,6 +36,75 @@ static int proc_set_super(struct super_block *sb, void *data)
return set_anon_super(sb, NULL);
}
+enum {
+ Opt_gid, Opt_hidepid, Opt_hidenet, Opt_nohidenet, Opt_err,
+};
+
+static const match_table_t tokens = {
+ {Opt_hidepid, "hidepid=%u"},
+ {Opt_gid, "gid=%u"},
+ {Opt_hidenet, "hidenet"},
+ {Opt_nohidenet, "nohidenet"},
+ {Opt_err, NULL},
+};
+
+static int proc_parse_options(char *options, struct pid_namespace *pid)
+{
+ char *p;
+ substring_t args[MAX_OPT_ARGS];
+ int option;
+
+ pr_debug("proc: options = %s\n", options);
+
+ if (!options)
+ return 1;
+
+ while ((p = strsep(&options, ",")) != NULL) {
+ int token;
+ if (!*p)
+ continue;
+
+ args[0].to = args[0].from = 0;
+ token = match_token(p, tokens, args);
+ switch (token) {
+ case Opt_gid:
+ if (match_int(&args[0], &option))
+ return 0;
+ pid->pid_gid = option;
+ break;
+ case Opt_hidepid:
+ if (match_int(&args[0], &option))
+ return 0;
+ if (option < 0 || option > 2) {
+ pr_err("proc: hidepid value must be between 0 and 2.\n");
+ return 0;
+ }
+ pid->hide_pid = option;
+ break;
+ case Opt_hidenet:
+ pid->hide_net = true;
+ break;
+ case Opt_nohidenet:
+ pid->hide_net = false;
+ break;
+ default:
+ pr_err("proc: unrecognized mount option \"%s\" "
+ "or missing value", p);
+ return 0;
+ }
+ }
+
+ pr_debug("proc: gid = %u, hidepid = %o, hidenet = %d\n", pid->pid_gid, pid->hide_pid, (int)pid->hide_net);
+
+ return 1;
+}
+
+int proc_remount(struct super_block *sb, int *flags, char *data)
+{
+ struct pid_namespace *pid = sb->s_fs_info;
+ return !proc_parse_options(data, pid);
+}
+
static struct dentry *proc_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
{
@@ -42,6 +112,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
struct super_block *sb;
struct pid_namespace *ns;
struct proc_inode *ei;
+ char *options;
if (proc_mnt) {
/* Seed the root directory with a pid so it doesn't need
@@ -54,10 +125,13 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
ei->pid = find_get_pid(1);
}
- if (flags & MS_KERNMOUNT)
+ if (flags & MS_KERNMOUNT) {
ns = (struct pid_namespace *)data;
- else
+ options = NULL;
+ } else {
ns = current->nsproxy->pid_ns;
+ options = data;
+ }
sb = sget(fs_type, proc_test_super, proc_set_super, ns);
if (IS_ERR(sb))
@@ -65,6 +139,10 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
if (!sb->s_root) {
sb->s_flags = flags;
+ if (!proc_parse_options(options, ns)) {
+ deactivate_locked_super(sb);
+ return ERR_PTR(-EINVAL);
+ }
err = proc_fill_super(sb);
if (err) {
deactivate_locked_super(sb);
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 38d1032..1c33094 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -30,6 +30,9 @@ struct pid_namespace {
#ifdef CONFIG_BSD_PROCESS_ACCT
struct bsd_acct_struct *bacct;
#endif
+ gid_t pid_gid;
+ int hide_pid;
+ bool hide_net;
};
extern struct pid_namespace init_pid_ns;
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 1bf812b..d40c61c 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -113,6 +113,8 @@ static inline struct net *copy_net_ns(unsigned long flags, struct net *net_ns)
}
#endif /* CONFIG_NET */
+extern struct net *net_create(void);
+
extern struct list_head net_namespace_list;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 3f86026..51c3442 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -177,6 +177,7 @@ static struct net_generic *net_alloc_generic(void)
return ng;
}
+
#ifdef CONFIG_NET_NS
static struct kmem_cache *net_cachep;
static struct workqueue_struct *netns_wq;
@@ -216,7 +217,7 @@ static void net_free(struct net *net)
kmem_cache_free(net_cachep, net);
}
-static struct net *net_create(void)
+struct net *net_create(void)
{
struct net *net;
int rv;
@@ -315,6 +316,64 @@ void __put_net(struct net *net)
EXPORT_SYMBOL_GPL(__put_net);
#else
+static struct net *net_alloc(void)
+{
+ struct net *net = NULL;
+ struct net_generic *ng;
+
+ ng = net_alloc_generic();
+ if (!ng)
+ goto out;
+
+ net = kzalloc(sizeof(*net), GFP_KERNEL);
+ if (!net)
+ goto out_free;
+
+ rcu_assign_pointer(net->gen, ng);
+out:
+ return net;
+
+out_free:
+ kfree(ng);
+ goto out;
+}
+
+static void net_free(struct net *net)
+{
+#ifdef NETNS_REFCNT_DEBUG
+ if (unlikely(atomic_read(&net->use_count) != 0)) {
+ printk(KERN_EMERG "network namespace not free! Usage: %d\n",
+ atomic_read(&net->use_count));
+ return;
+ }
+#endif
+ kfree(net->gen);
+ kfree(net);
+}
+
+struct net *net_create(void)
+{
+ struct net *net;
+ int rv;
+
+ net = net_alloc();
+ if (!net)
+ return ERR_PTR(-ENOMEM);
+ mutex_lock(&net_mutex);
+ rv = setup_net(net);
+ if (rv == 0) {
+ rtnl_lock();
+ list_add_tail_rcu(&net->list, &net_namespace_list);
+ rtnl_unlock();
+ }
+ mutex_unlock(&net_mutex);
+ if (rv < 0) {
+ net_free(net);
+ return ERR_PTR(rv);
+ }
+ return net;
+}
+
struct net *copy_net_ns(unsigned long flags, struct net *old_net)
{
if (flags & CLONE_NEWNET)
--
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v2] procfs mount options
2011-06-08 17:23 ` [kernel-hardening] [RFC v2] " Vasiliy Kulikov
@ 2011-06-08 17:43 ` Vasiliy Kulikov
2011-06-12 2:39 ` Solar Designer
1 sibling, 0 replies; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-06-08 17:43 UTC (permalink / raw)
To: kernel-hardening
[-- Attachment #1: Type: text/plain, Size: 302 bytes --]
On Wed, Jun 08, 2011 at 21:23 +0400, Vasiliy Kulikov wrote:
> I think this version of the patch is ready for LKML review.
I mean the patch is OK except net part, that's why LKML is needed now :)
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v2] procfs mount options
2011-06-08 17:23 ` [kernel-hardening] [RFC v2] " Vasiliy Kulikov
2011-06-08 17:43 ` Vasiliy Kulikov
@ 2011-06-12 2:39 ` Solar Designer
2011-07-24 18:55 ` Vasiliy Kulikov
[not found] ` <20110724185036.GC3510@albatros>
1 sibling, 2 replies; 29+ messages in thread
From: Solar Designer @ 2011-06-12 2:39 UTC (permalink / raw)
To: kernel-hardening
Vasiliy,
On Wed, Jun 08, 2011 at 09:23:08PM +0400, Vasiliy Kulikov wrote:
> I think this version of the patch is ready for LKML review.
OK, post it - and CC kernel-hardening on your posting, as planned.
(We'll see if we can afford the traffic, or if we need to split this
list in two...)
Unfortunately, I don't have time to review this more closely (in
context), and no one else in here provided any comments. ;-(
> + if (pid->hide_net &&
> + (!capable(CAP_NET_ADMIN) && !in_group_p(pid->pid_gid))) {
As discussed, capable() should be the very last check.
Thanks,
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v2] procfs mount options
2011-06-12 2:39 ` Solar Designer
@ 2011-07-24 18:55 ` Vasiliy Kulikov
[not found] ` <20110724185036.GC3510@albatros>
1 sibling, 0 replies; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-07-24 18:55 UTC (permalink / raw)
To: kernel-hardening
Hi,
A small follow-up of my current work.
I'm implementing the scheme I've proposed on LKML. Instead of on/off
semantics a mask of allowed files is passed as a mount option. Like
this:
mount -o tgid_allowed=.;cmdline;mounts;smaps /proc
All files not enumerated in tgid_allowed= will be restricted to
processes being able to ptrace the task. Files enumerated in
tgid_allowed= are not additionally restricted and classic POSIX
permission checking rules are applied here. So, it's impossible to
_relax_ procfs permissions with my scheme.
";" is used instead of "," as a comma is already used as options
separator.
Besides tgid_allowed= there will be attr_allowed= and tid_allowed= to
enumerated files in appropriate subdirectories.
As a "side effect" it will effectively fix all holes of poor permission
checking of setuid binaries (including check and execve() race) without
massive code refactoring. (All future procfs files are already
protected too ;-))
Currently I have a working implementation of per-files restriction, but
without mount option parsing. I'll post the code when I have a working
implementation of at least parsing and handling tgid_allowed=XXX. Also
there is a (minor, I hope) issue with /proc/PID/maps handling, where
the ptrace check is too restrictive.
TODO: net restriction, privileged monitoring gid.
Thanks,
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] [RFC v2] procfs mount options
[not found] ` <20110724185036.GC3510@albatros>
@ 2011-07-26 14:50 ` Vasiliy Kulikov
2011-07-29 17:47 ` [kernel-hardening] procfs {tid,tgid,attr}_allowed " Vasiliy Kulikov
0 siblings, 1 reply; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-07-26 14:50 UTC (permalink / raw)
To: kernel-hardening
Hi,
This is a demonstration of procfs feature I'm writing now. It is not
yet fully ready (need more clean code and need to divide the patch into
series), however, it's ready for some demonstration:
# mount -o remount,tgid_allowed=none,tid_allowed=none,attr_allowed=none /proc
root@albatros:~# ls -ld /proc/1
dr-x------ 8 root root 0 2011-07-26 17:57 /proc/1
# mount -o remount,tgid_allowed=stat /proc
# ls -ld /proc/1
dr-xr-xr-x 8 root root 0 2011-07-26 18:03 /proc/1
# ls -ld /proc/1/{stat,status,sched}
-rw------- 1 root root 0 2011-07-26 18:03 /proc/1/sched
-r--r--r-- 1 root root 0 2011-07-26 18:03 /proc/1/stat
-r-------- 1 root root 0 2011-07-26 18:03 /proc/1/status
# mount -o remount,tgid_allowed=all /proc
# ls -ld /proc/1/{stat,status,sched}
-rw-r--r-- 1 root root 0 2011-07-26 18:03 /proc/1/sched
-r--r--r-- 1 root root 0 2011-07-26 18:03 /proc/1/stat
-r--r--r-- 1 root root 0 2011-07-26 18:03 /proc/1/status
# ls -ld /proc/1/task/1/
dr-xr-xr-x 6 root root 0 2011-07-26 18:03 /proc/1/task/1/
# ls -ld /proc/1/task/1/{stat,status,sched}
-rw------- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/sched
-r-------- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/stat
-r-------- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/status
# mount -o remount,tid_allowed=sched /proc
# ls -ld /proc/1/task/1/{stat,status,sched}
-rw-r--r-- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/sched
-r-------- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/stat
-r-------- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/status
# mount -o remount,tid_allowed=all /proc
# ls -ld /proc/1/task/1/{stat,status,sched}
-rw-r--r-- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/sched
-r--r--r-- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/stat
-r--r--r-- 1 root root 0 2011-07-26 18:04 /proc/1/task/1/status
# ls -l /proc/1/attr/
итого 0
-rw------- 1 root root 0 2011-07-26 18:04 current
-rw------- 1 root root 0 2011-07-26 18:04 exec
-rw------- 1 root root 0 2011-07-26 18:04 fscreate
-rw------- 1 root root 0 2011-07-26 18:04 keycreate
-r-------- 1 root root 0 2011-07-26 18:04 prev
-rw------- 1 root root 0 2011-07-26 18:04 sockcreate
# mount -o remount,attr_allowed=exec\;sockcreate /proc
# ls -l /proc/1/attr/
итого 0
-rw------- 1 root root 0 2011-07-26 18:05 current
-rw-rw-rw- 1 root root 0 2011-07-26 18:05 exec
-rw------- 1 root root 0 2011-07-26 18:05 fscreate
-rw------- 1 root root 0 2011-07-26 18:05 keycreate
-r-------- 1 root root 0 2011-07-26 18:05 prev
-rw-rw-rw- 1 root root 0 2011-07-26 18:05 sockcreate
# mount -o remount,attr_allowed=all /proc
# ls -l /proc/1/attr/
итого 0
-rw-rw-rw- 1 root root 0 2011-07-26 18:05 current
-rw-rw-rw- 1 root root 0 2011-07-26 18:05 exec
-rw-rw-rw- 1 root root 0 2011-07-26 18:05 fscreate
-rw-rw-rw- 1 root root 0 2011-07-26 18:05 keycreate
-r--r--r-- 1 root root 0 2011-07-26 18:05 prev
-rw-rw-rw- 1 root root 0 2011-07-26 18:05 sockcreate
# mount -o remount,attr_allowed=none /proc
# ls -l /proc/1/attr/
итого 0
-rw------- 1 root root 0 2011-07-26 18:08 current
-rw------- 1 root root 0 2011-07-26 18:08 exec
-rw------- 1 root root 0 2011-07-26 18:08 fscreate
-rw------- 1 root root 0 2011-07-26 18:08 keycreate
-r-------- 1 root root 0 2011-07-26 18:08 prev
-rw------- 1 root root 0 2011-07-26 18:08 sockcreate
As you see, there are different sets for /proc/PID, /proc/PID/task/TID/,
/proc/PID/attr. I've slightly changed the semantics and removed ".", it
is implicitly enabled iff something inside is enabled. To deny "."
XXX=none should be passed.
Questions/comments:
1) it seems to me "\;" is hard to pass, maybe ":" is a better separator?
2) almost all tid files are copies of tgid files. coredump_filter,
mountstats, net are missing in task/. So, maybe fully remove
tid_allowed= and copy the tgid files' permissions to the appropriate tid
files?
3) currently the implementation is somewhat slow as every file is
wrapped into additional file operations handlers. It cannot be removed
for "fast" cases as all permissions of all files should be dynamically
recalculated for every access. Probably I could make it faster a bit.
It is the most significant drawback of the patch :-(
Thanks,
--
Vasiliy Kulikov
^ permalink raw reply [flat|nested] 29+ messages in thread
* [kernel-hardening] procfs {tid,tgid,attr}_allowed mount options
2011-07-26 14:50 ` Vasiliy Kulikov
@ 2011-07-29 17:47 ` Vasiliy Kulikov
2011-08-04 11:23 ` [kernel-hardening] " Vasiliy Kulikov
0 siblings, 1 reply; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-07-29 17:47 UTC (permalink / raw)
To: kernel-hardening
Hi,
On Tue, Jul 26, 2011 at 18:50 +0400, Vasiliy Kulikov wrote:
> This is a demonstration of procfs feature I'm writing now. It is not
> yet fully ready (need more clean code and need to divide the patch into
> series), however, it's ready for some demonstration:
Now the patch seems rather stable, more or less clean. It lacks some
testing, after that I'll send it as RFC to LKML.
BTW, it fully solves the problem of procfs files kept across execve()
(CVE-2011-1020).
---
fs/proc/Makefile | 2 +-
fs/proc/base.c | 253 ++++++++++++++++-----------------
fs/proc/base_perms.c | 314 +++++++++++++++++++++++++++++++++++++++++
fs/proc/inode.c | 18 +++
fs/proc/internal.h | 24 +++
fs/proc/root.c | 121 ++++++++++++++++-
fs/proc/task_nommu.c | 2 +-
include/linux/pid_namespace.h | 13 ++
include/linux/proc_fs.h | 12 ++
kernel/pid.c | 2 +
kernel/pid_namespace.c | 10 ++-
kernel/sysctl.c | 1 +
12 files changed, 635 insertions(+), 137 deletions(-)
--
diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index c1c7293..81020f9 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -8,7 +8,7 @@ proc-y := nommu.o task_nommu.o
proc-$(CONFIG_MMU) := mmu.o task_mmu.o
proc-y += inode.o root.o base.o generic.o array.o \
- proc_tty.o
+ proc_tty.o base_perms.o
proc-y += cmdline.o
proc-y += consoles.o
proc-y += cpuinfo.o
diff --git a/fs/proc/base.c b/fs/proc/base.c
index fc5bc27..0af542e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -98,40 +98,43 @@
* in /proc for a task before it execs a suid executable.
*/
-struct pid_entry {
- char *name;
- int len;
- mode_t mode;
- const struct inode_operations *iop;
- const struct file_operations *fop;
- union proc_op op;
-};
-
-#define NOD(NAME, MODE, IOP, FOP, OP) { \
+#define NOD(NAME, MODE, IOP, FOP, OP, PERMS) { \
.name = (NAME), \
.len = sizeof(NAME) - 1, \
.mode = MODE, \
.iop = IOP, \
.fop = FOP, \
.op = OP, \
+ .need_perms_check = PERMS, \
}
+/*
+ * XX_PERMS() are files without any ptrace() check.
+ * However, they have updated uid/gid and permissions on each file operation.
+ */
#define DIR(NAME, MODE, iops, fops) \
- NOD(NAME, (S_IFDIR|(MODE)), &iops, &fops, {} )
+ NOD(NAME, (S_IFDIR|(MODE)), &iops, &fops, {}, true)
#define LNK(NAME, get_link) \
NOD(NAME, (S_IFLNK|S_IRWXUGO), \
&proc_pid_link_inode_operations, NULL, \
- { .proc_get_link = get_link } )
+ { .proc_get_link = get_link }, false)
#define REG(NAME, MODE, fops) \
- NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {})
+ NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {}, true)
+#define REG_PERMS(NAME, MODE, fops) \
+ NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {}, false)
#define INF(NAME, MODE, read) \
NOD(NAME, (S_IFREG|(MODE)), \
NULL, &proc_info_file_operations, \
- { .proc_read = read } )
+ { .proc_read = read }, true )
+#define INF_PERMS(NAME, MODE, read) \
+ NOD(NAME, (S_IFREG|(MODE)), \
+ NULL, &proc_info_file_operations, \
+ { .proc_read = read }, false )
#define ONE(NAME, MODE, show) \
NOD(NAME, (S_IFREG|(MODE)), \
NULL, &proc_single_file_operations, \
- { .proc_show = show } )
+ { .proc_show = show }, true )
+
/*
* Count the number of hardlinks for the pid_entry table, excluding the .
@@ -229,35 +232,12 @@ static struct mm_struct *__check_mem_permission(struct task_struct *task)
return ERR_PTR(-EPERM);
}
-/*
- * If current may access user memory in @task return a reference to the
- * corresponding mm, otherwise ERR_PTR.
- */
-static struct mm_struct *check_mem_permission(struct task_struct *task)
-{
- struct mm_struct *mm;
- int err;
-
- /*
- * Avoid racing if task exec's as we might get a new mm but validate
- * against old credentials.
- */
- err = mutex_lock_killable(&task->signal->cred_guard_mutex);
- if (err)
- return ERR_PTR(err);
-
- mm = __check_mem_permission(task);
- mutex_unlock(&task->signal->cred_guard_mutex);
-
- return mm;
-}
-
struct mm_struct *mm_for_maps(struct task_struct *task)
{
struct mm_struct *mm;
int err;
- err = mutex_lock_killable(&task->signal->cred_guard_mutex);
+ err = mutex_lock_killable(&task->signal->cred_guard_mutex);
if (err)
return ERR_PTR(err);
@@ -327,7 +307,6 @@ static int proc_pid_auxv(struct task_struct *task, char *buffer)
return res;
}
-
#ifdef CONFIG_KALLSYMS
/*
* Provides a wchan file via kallsyms in a proper one-value-per-file format.
@@ -350,23 +329,6 @@ static int proc_pid_wchan(struct task_struct *task, char *buffer)
}
#endif /* CONFIG_KALLSYMS */
-static int lock_trace(struct task_struct *task)
-{
- int err = mutex_lock_killable(&task->signal->cred_guard_mutex);
- if (err)
- return err;
- if (!ptrace_may_access(task, PTRACE_MODE_ATTACH)) {
- mutex_unlock(&task->signal->cred_guard_mutex);
- return -EPERM;
- }
- return 0;
-}
-
-static void unlock_trace(struct task_struct *task)
-{
- mutex_unlock(&task->signal->cred_guard_mutex);
-}
-
#ifdef CONFIG_STACKTRACE
#define MAX_STACK_TRACE_DEPTH 64
@@ -388,16 +350,13 @@ static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns,
trace.entries = entries;
trace.skip = 0;
- err = lock_trace(task);
- if (!err) {
- save_stack_trace_tsk(task, &trace);
+ save_stack_trace_tsk(task, &trace);
- for (i = 0; i < trace.nr_entries; i++) {
- seq_printf(m, "[<%pK>] %pS\n",
- (void *)entries[i], (void *)entries[i]);
- }
- unlock_trace(task);
+ for (i = 0; i < trace.nr_entries; i++) {
+ seq_printf(m, "[<%pK>] %pS\n",
+ (void *)entries[i], (void *)entries[i]);
}
+
kfree(entries);
return err;
@@ -563,9 +522,7 @@ static int proc_pid_syscall(struct task_struct *task, char *buffer)
{
long nr;
unsigned long args[6], sp, pc;
- int res = lock_trace(task);
- if (res)
- return res;
+ int res;
if (task_current_syscall(task, &nr, args, 6, &sp, &pc))
res = sprintf(buffer, "running\n");
@@ -577,7 +534,7 @@ static int proc_pid_syscall(struct task_struct *task, char *buffer)
nr,
args[0], args[1], args[2], args[3], args[4], args[5],
sp, pc);
- unlock_trace(task);
+
return res;
}
#endif /* CONFIG_HAVE_ARCH_TRACEHOOK */
@@ -586,23 +543,6 @@ static int proc_pid_syscall(struct task_struct *task, char *buffer)
/* Here the fs part begins */
/************************************************************************/
-/* permission checks */
-static int proc_fd_access_allowed(struct inode *inode)
-{
- struct task_struct *task;
- int allowed = 0;
- /* Allow access to a task's file descriptors if it is us or we
- * may use ptrace attach to the process and find out that
- * information.
- */
- task = get_proc_task(inode);
- if (task) {
- allowed = ptrace_may_access(task, PTRACE_MODE_READ);
- put_task_struct(task);
- }
- return allowed;
-}
-
int proc_setattr(struct dentry *dentry, struct iattr *attr)
{
int error;
@@ -839,7 +779,7 @@ static ssize_t mem_read(struct file * file, char __user * buf,
if (!page)
goto out;
- mm = check_mem_permission(task);
+ mm = __check_mem_permission(task);
ret = PTR_ERR(mm);
if (IS_ERR(mm))
goto out_free;
@@ -902,7 +842,7 @@ static ssize_t mem_write(struct file * file, const char __user *buf,
if (!page)
goto out_task;
- mm = check_mem_permission(task);
+ mm = __check_mem_permission(task);
copied = PTR_ERR(mm);
if (IS_ERR(mm))
goto out_free;
@@ -1613,12 +1553,7 @@ static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
/* We don't need a base pointer in the /proc filesystem */
path_put(&nd->path);
- /* Are we allowed to snoop on the tasks file descriptors? */
- if (!proc_fd_access_allowed(inode))
- goto out;
-
error = PROC_I(inode)->op.proc_get_link(inode, &nd->path);
-out:
return ERR_PTR(error);
}
@@ -1652,9 +1587,6 @@ static int proc_pid_readlink(struct dentry * dentry, char __user * buffer, int b
struct inode *inode = dentry->d_inode;
struct path path;
- /* Are we allowed to snoop on the tasks file descriptors? */
- if (!proc_fd_access_allowed(inode))
- goto out;
error = PROC_I(inode)->op.proc_get_link(inode, &path);
if (error)
@@ -1758,6 +1690,8 @@ int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
/* dentry stuff */
+static void pid_revalidate_perms(struct dentry *dentry);
+
/*
* Exceptional case: normally we are not allowed to unhash a busy
* directory. In this case, however, we can do it - no aliasing problems
@@ -1785,6 +1719,8 @@ int pid_revalidate(struct dentry *dentry, struct nameidata *nd)
inode = dentry->d_inode;
task = get_proc_task(inode);
+ pid_revalidate_perms(dentry);
+
if (task) {
if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) ||
task_dumpable(task)) {
@@ -2200,7 +2136,9 @@ static struct dentry *proc_fdinfo_instantiate(struct inode *dir,
ei = PROC_I(inode);
ei->fd = fd;
inode->i_mode = S_IFREG | S_IRUSR;
- inode->i_fop = &proc_fdinfo_file_operations;
+ ei->real_fops = &proc_fdinfo_file_operations;
+ inode->i_fop = &proc_pid_perms_fops;
+
d_set_d_op(dentry, &tid_fd_dentry_operations);
d_add(dentry, inode);
/* Close the race of the process dying before we return the dentry */
@@ -2238,7 +2176,6 @@ static const struct inode_operations proc_fdinfo_inode_operations = {
.setattr = proc_setattr,
};
-
static struct dentry *proc_pident_instantiate(struct inode *dir,
struct dentry *dentry, struct task_struct *task, const void *ptr)
{
@@ -2255,11 +2192,16 @@ static struct dentry *proc_pident_instantiate(struct inode *dir,
inode->i_mode = p->mode;
if (S_ISDIR(inode->i_mode))
inode->i_nlink = 2; /* Use getattr to fix if necessary */
+
if (p->iop)
inode->i_op = p->iop;
if (p->fop)
- inode->i_fop = p->fop;
+ ei->real_fops = p->fop;
ei->op = p->op;
+ inode->i_fop = &proc_pid_perms_fops;
+
+ ei->dirent = p;
+
d_set_d_op(dentry, &pid_dentry_operations);
d_add(dentry, inode);
/* Close the race of the process dying before we return the dentry */
@@ -2417,15 +2359,9 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
if (copy_from_user(page, buf, count))
goto out_free;
- /* Guard against adverse ptrace interaction */
- length = mutex_lock_interruptible(&task->signal->cred_guard_mutex);
- if (length < 0)
- goto out_free;
-
length = security_setprocattr(task,
(char*)file->f_path.dentry->d_name.name,
(void*)page, count);
- mutex_unlock(&task->signal->cred_guard_mutex);
out_free:
free_page((unsigned long) page);
out:
@@ -2469,7 +2405,7 @@ static struct dentry *proc_attr_dir_lookup(struct inode *dir,
attr_dir_stuff, ARRAY_SIZE(attr_dir_stuff));
}
-static const struct inode_operations proc_attr_dir_inode_operations = {
+const struct inode_operations proc_attr_dir_inode_operations = {
.lookup = proc_attr_dir_lookup,
.getattr = pid_getattr,
.setattr = proc_setattr,
@@ -2617,7 +2553,7 @@ static const struct inode_operations proc_self_inode_operations = {
*/
static const struct pid_entry proc_base_stuff[] = {
NOD("self", S_IFLNK|S_IRWXUGO,
- &proc_self_inode_operations, NULL, {}),
+ &proc_self_inode_operations, NULL, {}, false),
};
static struct dentry *proc_base_instantiate(struct inode *dir,
@@ -2653,9 +2589,12 @@ static struct dentry *proc_base_instantiate(struct inode *dir,
inode->i_size = 64;
if (p->iop)
inode->i_op = p->iop;
+
if (p->fop)
- inode->i_fop = p->fop;
+ ei->real_fops = p->fop;
ei->op = p->op;
+ inode->i_fop = &proc_pid_perms_fops;
+
d_add(dentry, inode);
error = NULL;
out:
@@ -2708,9 +2647,6 @@ static int do_io_accounting(struct task_struct *task, char *buffer, int whole)
struct task_io_accounting acct = task->ioac;
unsigned long flags;
- if (!ptrace_may_access(task, PTRACE_MODE_READ))
- return -EACCES;
-
if (whole && lock_task_sighand(task, &flags)) {
struct task_struct *t = task;
@@ -2751,12 +2687,8 @@ static int proc_tgid_io_accounting(struct task_struct *task, char *buffer)
static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
{
- int err = lock_trace(task);
- if (!err) {
- seq_printf(m, "%08x\n", task->personality);
- unlock_trace(task);
- }
- return err;
+ seq_printf(m, "%08x\n", task->personality);
+ return 0;
}
/*
@@ -2773,8 +2705,8 @@ static const struct pid_entry tgid_base_stuff[] = {
#ifdef CONFIG_NET
DIR("net", S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations),
#endif
- REG("environ", S_IRUSR, proc_environ_operations),
- INF("auxv", S_IRUSR, proc_pid_auxv),
+ REG_PERMS("environ", S_IRUSR, proc_environ_operations),
+ INF_PERMS("auxv", S_IRUSR, proc_pid_auxv),
ONE("status", S_IRUGO, proc_pid_status),
ONE("personality", S_IRUGO, proc_pid_personality),
INF("limits", S_IRUGO, proc_pid_limits),
@@ -2791,9 +2723,9 @@ static const struct pid_entry tgid_base_stuff[] = {
INF("cmdline", S_IRUGO, proc_pid_cmdline),
ONE("stat", S_IRUGO, proc_tgid_stat),
ONE("statm", S_IRUGO, proc_pid_statm),
- REG("maps", S_IRUGO, proc_maps_operations),
+ REG_PERMS("maps", S_IRUGO, proc_maps_operations),
#ifdef CONFIG_NUMA
- REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
+ REG_PERMS("numa_maps", S_IRUGO, proc_numa_maps_operations),
#endif
REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations),
LNK("cwd", proc_cwd_link),
@@ -2804,7 +2736,7 @@ static const struct pid_entry tgid_base_stuff[] = {
REG("mountstats", S_IRUSR, proc_mountstats_operations),
#ifdef CONFIG_PROC_PAGE_MONITOR
REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
- REG("smaps", S_IRUGO, proc_smaps_operations),
+ REG_PERMS("smaps", S_IRUGO, proc_smaps_operations),
REG("pagemap", S_IRUGO, proc_pagemap_operations),
#endif
#ifdef CONFIG_SECURITY
@@ -2867,7 +2799,7 @@ static struct dentry *proc_tgid_base_lookup(struct inode *dir, struct dentry *de
tgid_base_stuff, ARRAY_SIZE(tgid_base_stuff));
}
-static const struct inode_operations proc_tgid_base_inode_operations = {
+const struct inode_operations proc_tgid_base_inode_operations = {
.lookup = proc_tgid_base_lookup,
.getattr = pid_getattr,
.setattr = proc_setattr,
@@ -2972,9 +2904,10 @@ static struct dentry *proc_pid_instantiate(struct inode *dir,
if (!inode)
goto out;
- inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO;
+ inode->i_mode = S_IFDIR | S_IRUGO | S_IXUGO;
inode->i_op = &proc_tgid_base_inode_operations;
- inode->i_fop = &proc_tgid_base_operations;
+ inode->i_fop = &proc_pid_perms_fops;
+ PROC_I(inode)->real_fops = &proc_tgid_base_operations;
inode->i_flags|=S_IMMUTABLE;
inode->i_nlink = 2 + pid_entry_count_dirs(tgid_base_stuff,
@@ -3122,8 +3055,8 @@ static const struct pid_entry tid_base_stuff[] = {
DIR("fd", S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations),
DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations),
DIR("ns", S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations),
- REG("environ", S_IRUSR, proc_environ_operations),
- INF("auxv", S_IRUSR, proc_pid_auxv),
+ REG_PERMS("environ", S_IRUSR, proc_environ_operations),
+ INF_PERMS("auxv", S_IRUSR, proc_pid_auxv),
ONE("status", S_IRUGO, proc_pid_status),
ONE("personality", S_IRUGO, proc_pid_personality),
INF("limits", S_IRUGO, proc_pid_limits),
@@ -3137,9 +3070,9 @@ static const struct pid_entry tid_base_stuff[] = {
INF("cmdline", S_IRUGO, proc_pid_cmdline),
ONE("stat", S_IRUGO, proc_tid_stat),
ONE("statm", S_IRUGO, proc_pid_statm),
- REG("maps", S_IRUGO, proc_maps_operations),
+ REG_PERMS("maps", S_IRUGO, proc_maps_operations),
#ifdef CONFIG_NUMA
- REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
+ REG_PERMS("numa_maps", S_IRUGO, proc_numa_maps_operations),
#endif
REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations),
LNK("cwd", proc_cwd_link),
@@ -3149,7 +3082,7 @@ static const struct pid_entry tid_base_stuff[] = {
REG("mountinfo", S_IRUGO, proc_mountinfo_operations),
#ifdef CONFIG_PROC_PAGE_MONITOR
REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
- REG("smaps", S_IRUGO, proc_smaps_operations),
+ REG_PERMS("smaps", S_IRUGO, proc_smaps_operations),
REG("pagemap", S_IRUGO, proc_pagemap_operations),
#endif
#ifdef CONFIG_SECURITY
@@ -3209,7 +3142,7 @@ static const struct file_operations proc_tid_base_operations = {
.llseek = default_llseek,
};
-static const struct inode_operations proc_tid_base_inode_operations = {
+const struct inode_operations proc_tid_base_inode_operations = {
.lookup = proc_tid_base_lookup,
.getattr = pid_getattr,
.setattr = proc_setattr,
@@ -3226,7 +3159,8 @@ static struct dentry *proc_task_instantiate(struct inode *dir,
goto out;
inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO;
inode->i_op = &proc_tid_base_inode_operations;
- inode->i_fop = &proc_tid_base_operations;
+ inode->i_fop = &proc_pid_perms_fops;
+ PROC_I(inode)->real_fops = &proc_tid_base_operations;
inode->i_flags|=S_IMMUTABLE;
inode->i_nlink = 2 + pid_entry_count_dirs(tid_base_stuff,
@@ -3445,3 +3379,58 @@ static const struct file_operations proc_task_operations = {
.readdir = proc_task_readdir,
.llseek = default_llseek,
};
+
+const int perms_size[] = {
+ [PROC_PERMS_NTID] = ARRAY_SIZE(tid_base_stuff),
+ [PROC_PERMS_NTGID] = ARRAY_SIZE(tgid_base_stuff),
+ [PROC_PERMS_NATTR] = ARRAY_SIZE(attr_dir_stuff),
+};
+
+const struct pid_entry *dir_ents[] = {
+ [PROC_PERMS_NTID] = tid_base_stuff,
+ [PROC_PERMS_NTGID] = tgid_base_stuff,
+ [PROC_PERMS_NATTR] = attr_dir_stuff,
+};
+
+static void pid_revalidate_perms(struct dentry *dentry)
+{
+ struct inode *inode;
+ struct pid_namespace *pid_ns;
+ struct proc_perms *perms;
+ const struct inode_operations *parent_op;
+ struct proc_inode *ei;
+
+ pid_ns = dentry->d_sb->s_fs_info;
+ perms = &pid_ns->proc_perms;
+ inode = dentry->d_inode;
+ parent_op = dentry->d_parent->d_inode->i_op;
+ ei = PROC_I(inode);
+
+ /* We never ever chmod symlinks (XXX: symlinks must be protected too) */
+ if (S_ISLNK(inode->i_mode))
+ return;
+
+ if (inode->i_op == &proc_tgid_base_inode_operations ||
+ inode->i_op == &proc_tid_base_inode_operations) {
+ mode_t mode = S_IFDIR | S_IRUGO | S_IXUGO;
+ mode_t mask = perms->proc_pid_allowed ? ~0 : ~077;
+ inode->i_mode = mode & mask;
+ } else if (parent_op == &proc_tgid_base_inode_operations) {
+ inode->i_mode = get_ent_perms(ei->dirent,
+ perms,
+ PROC_PERMS_NTGID,
+ (const struct pid_entry *)ei->dirent - &tgid_base_stuff[0]);
+ } else if (parent_op == &proc_tid_base_inode_operations) {
+ inode->i_mode = get_ent_perms(ei->dirent,
+ perms,
+ PROC_PERMS_NTID,
+ (const struct pid_entry *)ei->dirent - &tid_base_stuff[0]);
+ } else if (parent_op == &proc_attr_dir_inode_operations) {
+ inode->i_mode = get_ent_perms(ei->dirent,
+ perms,
+ PROC_PERMS_NATTR,
+ (struct pid_entry *)ei->dirent - &attr_dir_stuff[0]);
+ } else {
+ pr_err("other not found (%s)\n", dentry->d_name.name);
+ }
+}
diff --git a/fs/proc/base_perms.c b/fs/proc/base_perms.c
new file mode 100644
index 0000000..9d05e7a
--- /dev/null
+++ b/fs/proc/base_perms.c
@@ -0,0 +1,314 @@
+/*
+ * linux/fs/proc/base_perms.c
+ *
+ * Copyright (C) 2011 Vasiliy Kulikov
+ *
+ * proc base directory permissions handling functions
+ */
+
+#include <asm/uaccess.h>
+#include <linux/proc_fs.h>
+#include <linux/string.h>
+#include <linux/ptrace.h>
+#include <linux/poll.h>
+#include <linux/pid_namespace.h>
+#include "internal.h"
+
+
+/* Similar to acl_permission_check(), but with procfs specific changes */
+static int proc_permission_check(struct task_struct *task, struct inode *inode, int mask)
+{
+ mode_t mode = inode->i_mode;
+ uid_t uid = inode->i_uid;
+ gid_t gid = inode->i_gid;
+
+ if (mode & mask)
+ return 0;
+
+ /* XXX: This should be updated conformably to user ns code changes.
+ * Or maybe just remove it as ns users may not have any access to procfs? */
+ if (current_user_ns() != inode_userns(inode))
+ return -EACCES;
+
+ if (in_group_p(gid) && ((mode >> 3) & mask))
+ return 0;
+
+ /*
+ * The major changes compared to acl_permission_check():
+ * 1) we check euid, not fsuid
+ * 2) we additionally check ptrace ability
+ */
+ if (current_euid() == uid) {
+ int pmode = (mask == MAY_WRITE) ? PTRACE_MODE_ATTACH : PTRACE_MODE_READ;
+ if (ptrace_may_access(task, pmode))
+ return 0;
+ }
+
+ return -EACCES;
+}
+
+static void unlock_pid_trace(struct task_struct *task);
+
+/*
+ * All operations with /proc/PID/ directory MUST be processed
+ * under lock_pid_trace() to avoid execve() races.
+ *
+ * It checks permissions similar to acl_permission_check() and probably holds
+ * (*task)->signal->cred_guard_mutex. If so, *task would point to the target
+ * task, NULL otherwise. To unlock the mutex call unlock_pid_trace() with the
+ * same *task argument.
+ */
+static int lock_pid_trace(struct dentry *dentry,
+ int mask,
+ struct task_struct **task)
+{
+ int err;
+ struct inode *inode = dentry->d_inode;
+ const struct pid_entry *pe = PROC_I(inode)->dirent;
+
+ *task = get_proc_task(inode);
+ if (*task == NULL)
+ return -ESRCH;
+
+ err = mutex_lock_killable(&(*task)->signal->cred_guard_mutex);
+ if (err)
+ goto put_task;
+
+ /*
+ * We have to revalidate both uid/gid and permissions.
+ * uid/gid revalidation is racy against execve(), so we do it under
+ * ->cred_guard_mutex.
+ *
+ * We cannot just update uid/gid because of LSM.
+ */
+ /* XXX: locking? */
+#if 1
+ err = -ESRCH;
+ if (pid_revalidate(dentry, NULL) == 0)
+ goto free_mutex;
+#endif
+
+ if (!pe->need_perms_check) {
+ /* OK, the file checks permissions on his own */
+ unlock_pid_trace(*task);
+ *task = NULL;
+ return 0;
+ }
+
+ err = -EACCES;
+ if (proc_permission_check(*task, inode, mask))
+ goto free_mutex;
+
+ err = 0;
+ goto exit;
+
+free_mutex:
+ mutex_unlock(&(*task)->signal->cred_guard_mutex);
+put_task:
+ put_task_struct(*task);
+ *task = NULL;
+exit:
+ return err;
+}
+
+static void unlock_pid_trace(struct task_struct *task)
+{
+ if (task) {
+ mutex_unlock(&task->signal->cred_guard_mutex);
+ put_task_struct(task);
+ }
+}
+
+static int proc_pid_perms_open(struct inode *inode, struct file *file)
+{
+ struct proc_inode *pi = PROC_I(inode);
+
+ /* No ptrace check here. All checks should be located
+ * in read/write/readdir/etc. */
+
+ if (pi->real_fops->open)
+ return pi->real_fops->open(inode, file);
+ return 0;
+}
+
+static int proc_pid_perms_release(struct inode *inode, struct file *file)
+{
+ struct proc_inode *pi = PROC_I(inode);
+
+ if (pi->real_fops->release)
+ return pi->real_fops->release(inode, file);
+ return 0;
+}
+
+static ssize_t proc_pid_perms_read(struct file * file, char __user * buf,
+ size_t count, loff_t *ppos)
+{
+ struct dentry *dentry = file->f_dentry;
+ struct proc_inode *pi = PROC_I(dentry->d_inode);
+ int rc;
+ struct task_struct *task;
+
+ if (pi->real_fops->read) {
+ rc = lock_pid_trace(dentry, MAY_READ, &task);
+ if (!rc)
+ rc = pi->real_fops->read(file, buf, count, ppos);
+ unlock_pid_trace(task);
+ return rc;
+ }
+
+ /* Should never happen, but better safe than sorry */
+ return -EPERM;
+}
+
+static ssize_t proc_pid_perms_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *offs)
+{
+ struct dentry *dentry = file->f_dentry;
+ struct proc_inode *pi = PROC_I(dentry->d_inode);
+ int rc;
+ struct task_struct *task;
+
+ if (pi->real_fops->write) {
+ rc = lock_pid_trace(dentry, MAY_WRITE, &task);
+ if (!rc)
+ rc = pi->real_fops->write(file, buf, count, offs);
+ unlock_pid_trace(task);
+ return rc;
+ }
+
+ /* Should never happen, but better safe than sorry */
+ return -EPERM;
+}
+
+static int proc_pid_perms_readdir(struct file *file, void *dirent, filldir_t filldir)
+{
+ struct dentry *dentry = file->f_dentry;
+ struct proc_inode *pi = PROC_I(dentry->d_inode);
+ int rc;
+ struct task_struct *task;
+
+ if (pi->real_fops->readdir) {
+ rc = lock_pid_trace(dentry, MAY_READ, &task);
+ if (!rc)
+ rc = pi->real_fops->readdir(file, dirent, filldir);
+ unlock_pid_trace(task);
+ return rc;
+ }
+
+ /* Should never happen, but better safe than sorry */
+ return -ENOTDIR;
+}
+
+static unsigned proc_pid_perms_poll(struct file *file, poll_table *wait)
+{
+ struct dentry *dentry = file->f_dentry;
+ struct proc_inode *pi = PROC_I(dentry->d_inode);
+ struct task_struct *task;
+ int rc;
+
+ /* Check perms before the poll. There is a race against execve(),
+ * but hopefuly the infoleak is very minor */
+ rc = lock_pid_trace(dentry, MAY_READ, &task);
+ unlock_pid_trace(task);
+ if (rc)
+ return rc;
+
+ if (pi->real_fops->poll)
+ return pi->real_fops->poll(file, wait);
+
+ /* The fallback for files not implementing poll */
+ return DEFAULT_POLLMASK;
+}
+
+static loff_t proc_pid_perms_llseek(struct file *file, loff_t offset, int orig)
+{
+ struct proc_inode *pi = PROC_I(file->f_dentry->d_inode);
+
+ /* Looks like llseek() doesn't need any ptrace protection */
+
+ if (pi->real_fops->llseek)
+ return pi->real_fops->llseek(file, offset, orig);
+
+ return no_llseek(file, offset, orig);
+}
+
+/*
+ * Global wrapper fops for _ALL_ files inside /proc/PID/.
+ *
+ * It checks the permissions (including ptrace() check) of the current
+ * task on all file ops. The runtime checks protect from holding any fd
+ * across execve() of set*id binaries. task->signal->cred_guard_mutex
+ * is held during read(), readdir(), and write(). open(), release(),
+ * llseek() are not protected as they are harmless anyway. It makes
+ * effort to check permissions on poll(), but it doesn't hold the
+ * mutex during actual fops->poll() as poll() might sleep for much time.
+ *
+ * It makes sense to use proc_pid_perms_fops even for files doing their
+ * own checks (e.g. REG_PERMS()).
+ */
+const struct file_operations proc_pid_perms_fops = {
+ .open = proc_pid_perms_open,
+ .read = proc_pid_perms_read,
+ .readdir = proc_pid_perms_readdir,
+ .write = proc_pid_perms_write,
+ .llseek = proc_pid_perms_llseek,
+ .poll = proc_pid_perms_poll,
+ .release = proc_pid_perms_release,
+};
+
+mode_t get_ent_perms(const struct pid_entry *p,
+ struct proc_perms *perms,
+ int ndir,
+ int nent)
+{
+ mode_t umask = perms->proc_ent_allowed[ndir][nent] ? 0 : 077;
+ return p->mode & ~umask;
+}
+
+int init_proc_perms(struct proc_perms *p)
+{
+ int i;
+
+ p->proc_pid_allowed = true;
+ for (i = 0; i < PROC_PERMS_NMAX; i++) {
+ size_t size = perms_size[i]*sizeof(bool);
+ p->proc_ent_allowed[i] = kzalloc(size, GFP_KERNEL);
+ if (p->proc_ent_allowed[i] == NULL)
+ goto err;
+ }
+
+ return 0;
+err:
+ for (i-- ; i >= 0; i--)
+ kfree(p->proc_ent_allowed[i]);
+ return -ENOMEM;
+}
+
+void free_proc_perms(struct proc_perms *p)
+{
+ int i;
+
+ for (i = 0; i < PROC_PERMS_NMAX; i++)
+ kfree(p->proc_ent_allowed[i]);
+}
+
+void fill_proc_perms_ent(struct proc_perms *p, int ndir, bool val)
+{
+ int i;
+ for (i = 0; i < perms_size[ndir]; i++)
+ p->proc_ent_allowed[ndir][i] = val;
+}
+
+bool set_proc_perms_ent(struct proc_perms *p, int ndir, const char *fname, bool val)
+{
+ int i;
+ for (i = 0; i < perms_size[ndir]; i++)
+ if (!strcmp(dir_ents[ndir][i].name, fname)) {
+ pr_err("found %s (%d)\n", fname, i);
+ p->proc_ent_allowed[ndir][i] = val;
+ return true;
+ }
+
+ pr_err("not found %s (%d)\n", fname, i);
+ return false;
+}
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 74b48cf..0a02c79 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -18,6 +18,10 @@
#include <linux/module.h>
#include <linux/sysctl.h>
#include <linux/slab.h>
+#include <linux/pid_namespace.h>
+#include <linux/seq_file.h>
+#include <linux/mount.h>
+
#include <asm/system.h>
#include <asm/uaccess.h>
@@ -69,6 +73,7 @@ static struct inode *proc_alloc_inode(struct super_block *sb)
ei->sysctl_entry = NULL;
ei->ns = NULL;
ei->ns_ops = NULL;
+ ei->real_fops = NULL;
inode = &ei->vfs_inode;
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
return inode;
@@ -102,12 +107,25 @@ void __init proc_init_inodecache(void)
init_once);
}
+static int proc_show_options(struct seq_file *seq, struct vfsmount *vfs)
+{
+ struct super_block *sb = vfs->mnt_sb;
+ struct pid_namespace *pid = sb->s_fs_info;
+
+ if (pid->proc_perms.proc_pid_allowed)
+ seq_printf(seq, ",proc_pid_allowed=.");
+
+ return 0;
+}
+
static const struct super_operations proc_sops = {
.alloc_inode = proc_alloc_inode,
.destroy_inode = proc_destroy_inode,
.drop_inode = generic_delete_inode,
.evict_inode = proc_evict_inode,
.statfs = simple_statfs,
+ .remount_fs = proc_remount,
+ .show_options = proc_show_options,
};
static void __pde_users_dec(struct proc_dir_entry *pde)
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 7838e5c..35c760a 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -141,7 +141,31 @@ struct inode *proc_pid_make_inode(struct super_block * sb, struct task_struct *t
extern const struct dentry_operations pid_dentry_operations;
int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat);
int proc_setattr(struct dentry *dentry, struct iattr *attr);
+int proc_remount(struct super_block *sb, int *flags, char *data);
extern const struct inode_operations proc_ns_dir_inode_operations;
extern const struct file_operations proc_ns_dir_operations;
+extern const struct inode_operations proc_tgid_base_inode_operations;
+extern const struct inode_operations proc_tid_base_inode_operations;
+extern const struct inode_operations proc_attr_dir_inode_operations;
+
+extern const struct file_operations proc_pid_perms_fops;
+extern const int perms_size[];
+extern const struct pid_entry *dir_ents[];
+
+extern mode_t get_ent_perms(const struct pid_entry *p,
+ struct proc_perms *perms,
+ int ndir,
+ int nent);
+
+struct pid_entry {
+ char *name;
+ int len;
+ mode_t mode;
+ const struct inode_operations *iop;
+ const struct file_operations *fop;
+ union proc_op op;
+ bool need_perms_check;
+};
+
diff --git a/fs/proc/root.c b/fs/proc/root.c
index d6c3b41..9193fa1 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -18,6 +18,7 @@
#include <linux/bitops.h>
#include <linux/mount.h>
#include <linux/pid_namespace.h>
+#include <linux/parser.h>
#include "internal.h"
@@ -36,6 +37,112 @@ static int proc_set_super(struct super_block *sb, void *data)
return err;
}
+enum {
+ Opt_err, Opt_tid_allowed, Opt_tgid_allowed, Opt_attr_allowed,
+};
+
+static const match_table_t tokens = {
+ {Opt_tgid_allowed, "tgid_allowed=%s"},
+ {Opt_tid_allowed, "tid_allowed=%s"},
+ {Opt_attr_allowed, "attr_allowed=%s"},
+ {Opt_err, NULL},
+};
+
+static bool proc_parse_xallowed(struct proc_perms *p, int ndir, char *val, bool set_pid_allowed)
+{
+ char *field;
+
+ fill_proc_perms_ent(p, ndir, false);
+
+ if (strcmp(val, "none") == 0) {
+ if (set_pid_allowed)
+ p->proc_pid_allowed = false;
+ return true;
+ }
+ if (strcmp(val, "all") == 0) {
+ fill_proc_perms_ent(p, ndir, true);
+ if (set_pid_allowed)
+ p->proc_pid_allowed = true;
+ return true;
+ }
+
+ while ((field = strsep(&val, ";")) != NULL) {
+ if (!set_proc_perms_ent(p, ndir, field, true))
+ return false;
+ }
+
+ if (set_pid_allowed)
+ p->proc_pid_allowed = true;
+ return true;
+}
+
+static int proc_parse_options(char *options, struct pid_namespace *pid)
+{
+ char *p, *val;
+ substring_t args[MAX_OPT_ARGS];
+
+ pr_debug("proc: options = %s\n", options);
+
+ if (!options)
+ return 1;
+
+ while ((p = strsep(&options, ",")) != NULL) {
+ int token;
+ if (!*p)
+ continue;
+
+ args[0].to = args[0].from = 0;
+ token = match_token(p, tokens, args);
+ switch (token) {
+ case Opt_tgid_allowed:
+ val = match_strdup(&args[0]);
+ if (!val)
+ return 0;
+ if (!proc_parse_xallowed(&pid->proc_perms, PROC_PERMS_NTGID, val, true)) {
+ kfree(val);
+ return 0;
+ }
+ kfree(val);
+ break;
+ case Opt_tid_allowed:
+ val = match_strdup(&args[0]);
+ if (!val)
+ return 0;
+ if (!proc_parse_xallowed(&pid->proc_perms, PROC_PERMS_NTID, val, false)) {
+ kfree(val);
+ return 0;
+ }
+ kfree(val);
+ break;
+ case Opt_attr_allowed:
+ val = match_strdup(&args[0]);
+ if (!val)
+ return 0;
+ if (!proc_parse_xallowed(&pid->proc_perms, PROC_PERMS_NATTR, val, false)) {
+ kfree(val);
+ return 0;
+ }
+ kfree(val);
+ break;
+ default:
+ pr_err("proc: unrecognized mount option \"%s\" "
+ "or missing value", p);
+ return 0;
+ }
+ }
+
+ //pr_debug("proc: gid = %u, hidepid = %o, hidenet = %d\n", pid->pid_gid, pid->hide_pid, (int)pid->hide_net);
+
+ return 1;
+}
+
+int proc_remount(struct super_block *sb, int *flags, char *data)
+{
+ struct pid_namespace *pid = sb->s_fs_info;
+ return !proc_parse_options(data, pid);
+}
+
+
static struct dentry *proc_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
{
@@ -43,11 +150,16 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
struct super_block *sb;
struct pid_namespace *ns;
struct proc_inode *ei;
+ char *options;
- if (flags & MS_KERNMOUNT)
+
+ if (flags & MS_KERNMOUNT) {
ns = (struct pid_namespace *)data;
- else
+ options = NULL;
+ } else {
ns = current->nsproxy->pid_ns;
+ options = data;
+ }
sb = sget(fs_type, proc_test_super, proc_set_super, ns);
if (IS_ERR(sb))
@@ -55,6 +167,10 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
if (!sb->s_root) {
sb->s_flags = flags;
+ if (!proc_parse_options(options, ns)) {
+ deactivate_locked_super(sb);
+ return ERR_PTR(-EINVAL);
+ }
err = proc_fill_super(sb);
if (err) {
deactivate_locked_super(sb);
@@ -74,6 +190,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
return dget(sb->s_root);
}
+
static void proc_kill_sb(struct super_block *sb)
{
struct pid_namespace *ns;
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 980de54..da98ba2 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -201,7 +201,7 @@ static void *m_start(struct seq_file *m, loff_t *pos)
if (!priv->task)
return ERR_PTR(-ESRCH);
- mm = mm_for_maps(priv->task);
+ mm = get_task_mm(priv->task);
if (!mm || IS_ERR(mm)) {
put_task_struct(priv->task);
priv->task = NULL;
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 38d1032..ed10329 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -16,6 +16,18 @@ struct pidmap {
struct bsd_acct_struct;
+enum {
+ PROC_PERMS_NTID,
+ PROC_PERMS_NTGID,
+ PROC_PERMS_NATTR,
+ PROC_PERMS_NMAX
+};
+
+struct proc_perms {
+ bool proc_pid_allowed;
+ bool *proc_ent_allowed[PROC_PERMS_NMAX];
+};
+
struct pid_namespace {
struct kref kref;
struct pidmap pidmap[PIDMAP_ENTRIES];
@@ -30,6 +42,7 @@ struct pid_namespace {
#ifdef CONFIG_BSD_PROCESS_ACCT
struct bsd_acct_struct *bacct;
#endif
+ struct proc_perms proc_perms;
};
extern struct pid_namespace init_pid_ns;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index e7576cf..a1525d7 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -115,10 +115,17 @@ struct proc_dir_entry *proc_create_data(const char *name, mode_t mode,
extern void remove_proc_entry(const char *name, struct proc_dir_entry *parent);
struct pid_namespace;
+struct proc_perms;
extern int pid_ns_prepare_proc(struct pid_namespace *ns);
extern void pid_ns_release_proc(struct pid_namespace *ns);
+extern int init_proc_perms(struct proc_perms *p);
+extern void free_proc_perms(struct proc_perms *p);
+
+extern void fill_proc_perms_ent(struct proc_perms *p, int ndir, bool val);
+extern bool set_proc_perms_ent(struct proc_perms *p, int ndir, const char *fname, bool val);
+
/*
* proc_tty.c
*/
@@ -267,6 +274,11 @@ struct proc_inode {
struct pid *pid;
int fd;
union proc_op op;
+
+ /* Used by files inside /proc/PID/ only */
+ const struct file_operations *real_fops;
+ const void *dirent;
+
struct proc_dir_entry *pde;
struct ctl_table_header *sysctl;
struct ctl_table *sysctl_entry;
diff --git a/kernel/pid.c b/kernel/pid.c
index 57a8346..ffa8b37 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -565,6 +565,8 @@ void __init pidmap_init(void)
set_bit(0, init_pid_ns.pidmap[0].page);
atomic_dec(&init_pid_ns.pidmap[0].nr_free);
+ BUG_ON(init_proc_perms(&init_pid_ns.proc_perms));
+
init_pid_ns.pid_cachep = KMEM_CACHE(pid,
SLAB_HWCACHE_ALIGN | SLAB_PANIC);
}
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index e9c9adc..ab4261c 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -97,12 +97,18 @@ static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_p
for (i = 1; i < PIDMAP_ENTRIES; i++)
atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
- err = pid_ns_prepare_proc(ns);
+ err = init_proc_perms(&ns->proc_perms);
if (err)
goto out_put_parent_pid_ns;
+ err = pid_ns_prepare_proc(ns);
+ if (err)
+ goto out_free_proc_perms;
+
return ns;
+out_free_proc_perms:
+ free_proc_perms(&ns->proc_perms);
out_put_parent_pid_ns:
put_pid_ns(parent_pid_ns);
out_free_map:
@@ -119,6 +125,8 @@ static void destroy_pid_namespace(struct pid_namespace *ns)
for (i = 0; i < PIDMAP_ENTRIES; i++)
kfree(ns->pidmap[i].page);
+
+ free_proc_perms(&ns->proc_perms);
kmem_cache_free(pid_ns_cachep, ns);
}
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f175d98..745b71e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -57,6 +57,7 @@
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
#include <linux/kmod.h>
+#include <linux/pid_namespace.h>
#include <asm/uaccess.h>
#include <asm/processor.h>
--
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [kernel-hardening] Re: procfs {tid,tgid,attr}_allowed mount options
2011-07-29 17:47 ` [kernel-hardening] procfs {tid,tgid,attr}_allowed " Vasiliy Kulikov
@ 2011-08-04 11:23 ` Vasiliy Kulikov
2011-08-10 10:02 ` Vasiliy Kulikov
2011-08-10 11:25 ` Solar Designer
0 siblings, 2 replies; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-08-04 11:23 UTC (permalink / raw)
To: kernel-hardening
Hi,
New version. Cleanups/fixes here and there.
It lacks net/ restriction, but IMO it is already complicated enough
(more than 500 new lines). Such (relatively) simple thing as
net_allowed= is an additional care. I'd achieve at least processes
restrictions in upstream, after it will come networking.
---
fs/proc/Makefile | 2 +-
fs/proc/base.c | 286 +++++++++++++++++++++------------------
fs/proc/base_perms.c | 305 +++++++++++++++++++++++++++++++++++++++++
fs/proc/inode.c | 19 +++
fs/proc/internal.h | 25 ++++
fs/proc/root.c | 121 ++++++++++++++++-
fs/proc/task_nommu.c | 2 +-
include/linux/pid_namespace.h | 13 ++
include/linux/proc_fs.h | 12 ++
kernel/pid.c | 2 +
kernel/pid_namespace.c | 10 ++-
kernel/sysctl.c | 1 +
12 files changed, 660 insertions(+), 138 deletions(-)
---
diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index c1c7293..81020f9 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -8,7 +8,7 @@ proc-y := nommu.o task_nommu.o
proc-$(CONFIG_MMU) := mmu.o task_mmu.o
proc-y += inode.o root.o base.o generic.o array.o \
- proc_tty.o
+ proc_tty.o base_perms.o
proc-y += cmdline.o
proc-y += consoles.o
proc-y += cpuinfo.o
diff --git a/fs/proc/base.c b/fs/proc/base.c
index fc5bc27..33684f7 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -98,40 +98,43 @@
* in /proc for a task before it execs a suid executable.
*/
-struct pid_entry {
- char *name;
- int len;
- mode_t mode;
- const struct inode_operations *iop;
- const struct file_operations *fop;
- union proc_op op;
-};
-
-#define NOD(NAME, MODE, IOP, FOP, OP) { \
+#define NOD(NAME, MODE, IOP, FOP, OP, PERMS) { \
.name = (NAME), \
.len = sizeof(NAME) - 1, \
.mode = MODE, \
.iop = IOP, \
.fop = FOP, \
.op = OP, \
+ .need_perms_check = PERMS, \
}
+/*
+ * XX_PERMS() are files without any ptrace() check.
+ * However, they have updated uid/gid and permissions on each file operation.
+ */
#define DIR(NAME, MODE, iops, fops) \
- NOD(NAME, (S_IFDIR|(MODE)), &iops, &fops, {} )
+ NOD(NAME, (S_IFDIR|(MODE)), &iops, &fops, {}, true)
#define LNK(NAME, get_link) \
NOD(NAME, (S_IFLNK|S_IRWXUGO), \
&proc_pid_link_inode_operations, NULL, \
- { .proc_get_link = get_link } )
+ { .proc_get_link = get_link }, false)
#define REG(NAME, MODE, fops) \
- NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {})
+ NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {}, true)
+#define REG_PERMS(NAME, MODE, fops) \
+ NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {}, false)
#define INF(NAME, MODE, read) \
NOD(NAME, (S_IFREG|(MODE)), \
NULL, &proc_info_file_operations, \
- { .proc_read = read } )
+ { .proc_read = read }, true )
+#define INF_PERMS(NAME, MODE, read) \
+ NOD(NAME, (S_IFREG|(MODE)), \
+ NULL, &proc_info_file_operations, \
+ { .proc_read = read }, false )
#define ONE(NAME, MODE, show) \
NOD(NAME, (S_IFREG|(MODE)), \
NULL, &proc_single_file_operations, \
- { .proc_show = show } )
+ { .proc_show = show }, true )
+
/*
* Count the number of hardlinks for the pid_entry table, excluding the .
@@ -229,35 +232,12 @@ static struct mm_struct *__check_mem_permission(struct task_struct *task)
return ERR_PTR(-EPERM);
}
-/*
- * If current may access user memory in @task return a reference to the
- * corresponding mm, otherwise ERR_PTR.
- */
-static struct mm_struct *check_mem_permission(struct task_struct *task)
-{
- struct mm_struct *mm;
- int err;
-
- /*
- * Avoid racing if task exec's as we might get a new mm but validate
- * against old credentials.
- */
- err = mutex_lock_killable(&task->signal->cred_guard_mutex);
- if (err)
- return ERR_PTR(err);
-
- mm = __check_mem_permission(task);
- mutex_unlock(&task->signal->cred_guard_mutex);
-
- return mm;
-}
-
struct mm_struct *mm_for_maps(struct task_struct *task)
{
struct mm_struct *mm;
int err;
- err = mutex_lock_killable(&task->signal->cred_guard_mutex);
+ err = mutex_lock_killable(&task->signal->cred_guard_mutex);
if (err)
return ERR_PTR(err);
@@ -327,7 +307,6 @@ static int proc_pid_auxv(struct task_struct *task, char *buffer)
return res;
}
-
#ifdef CONFIG_KALLSYMS
/*
* Provides a wchan file via kallsyms in a proper one-value-per-file format.
@@ -350,23 +329,6 @@ static int proc_pid_wchan(struct task_struct *task, char *buffer)
}
#endif /* CONFIG_KALLSYMS */
-static int lock_trace(struct task_struct *task)
-{
- int err = mutex_lock_killable(&task->signal->cred_guard_mutex);
- if (err)
- return err;
- if (!ptrace_may_access(task, PTRACE_MODE_ATTACH)) {
- mutex_unlock(&task->signal->cred_guard_mutex);
- return -EPERM;
- }
- return 0;
-}
-
-static void unlock_trace(struct task_struct *task)
-{
- mutex_unlock(&task->signal->cred_guard_mutex);
-}
-
#ifdef CONFIG_STACKTRACE
#define MAX_STACK_TRACE_DEPTH 64
@@ -388,16 +350,13 @@ static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns,
trace.entries = entries;
trace.skip = 0;
- err = lock_trace(task);
- if (!err) {
- save_stack_trace_tsk(task, &trace);
+ save_stack_trace_tsk(task, &trace);
- for (i = 0; i < trace.nr_entries; i++) {
- seq_printf(m, "[<%pK>] %pS\n",
- (void *)entries[i], (void *)entries[i]);
- }
- unlock_trace(task);
+ for (i = 0; i < trace.nr_entries; i++) {
+ seq_printf(m, "[<%pK>] %pS\n",
+ (void *)entries[i], (void *)entries[i]);
}
+
kfree(entries);
return err;
@@ -563,9 +522,7 @@ static int proc_pid_syscall(struct task_struct *task, char *buffer)
{
long nr;
unsigned long args[6], sp, pc;
- int res = lock_trace(task);
- if (res)
- return res;
+ int res;
if (task_current_syscall(task, &nr, args, 6, &sp, &pc))
res = sprintf(buffer, "running\n");
@@ -577,7 +534,7 @@ static int proc_pid_syscall(struct task_struct *task, char *buffer)
nr,
args[0], args[1], args[2], args[3], args[4], args[5],
sp, pc);
- unlock_trace(task);
+
return res;
}
#endif /* CONFIG_HAVE_ARCH_TRACEHOOK */
@@ -589,18 +546,18 @@ static int proc_pid_syscall(struct task_struct *task, char *buffer)
/* permission checks */
static int proc_fd_access_allowed(struct inode *inode)
{
- struct task_struct *task;
- int allowed = 0;
- /* Allow access to a task's file descriptors if it is us or we
- * may use ptrace attach to the process and find out that
- * information.
- */
- task = get_proc_task(inode);
- if (task) {
- allowed = ptrace_may_access(task, PTRACE_MODE_READ);
- put_task_struct(task);
- }
- return allowed;
+ struct task_struct *task;
+ int allowed = 0;
+ /* Allow access to a task's file descriptors if it is us or we
+ * may use ptrace attach to the process and find out that
+ * information.
+ */
+ task = get_proc_task(inode);
+ if (task) {
+ allowed = ptrace_may_access(task, PTRACE_MODE_READ);
+ put_task_struct(task);
+ }
+ return allowed;
}
int proc_setattr(struct dentry *dentry, struct iattr *attr)
@@ -839,7 +796,7 @@ static ssize_t mem_read(struct file * file, char __user * buf,
if (!page)
goto out;
- mm = check_mem_permission(task);
+ mm = __check_mem_permission(task);
ret = PTR_ERR(mm);
if (IS_ERR(mm))
goto out_free;
@@ -902,7 +859,7 @@ static ssize_t mem_write(struct file * file, const char __user *buf,
if (!page)
goto out_task;
- mm = check_mem_permission(task);
+ mm = __check_mem_permission(task);
copied = PTR_ERR(mm);
if (IS_ERR(mm))
goto out_free;
@@ -1758,6 +1715,27 @@ int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
/* dentry stuff */
+static void pid_revalidate_perms(struct dentry *dentry);
+
+void __pid_revalidate(struct inode *inode, struct task_struct *task)
+{
+ const struct cred *cred;
+
+ if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) ||
+ task_dumpable(task)) {
+ rcu_read_lock();
+ cred = __task_cred(task);
+ inode->i_uid = cred->euid;
+ inode->i_gid = cred->egid;
+ rcu_read_unlock();
+ } else {
+ inode->i_uid = 0;
+ inode->i_gid = 0;
+ }
+ inode->i_mode &= ~(S_ISUID | S_ISGID);
+ security_task_to_inode(task, inode);
+}
+
/*
* Exceptional case: normally we are not allowed to unhash a busy
* directory. In this case, however, we can do it - no aliasing problems
@@ -1777,7 +1755,6 @@ int pid_revalidate(struct dentry *dentry, struct nameidata *nd)
{
struct inode *inode;
struct task_struct *task;
- const struct cred *cred;
if (nd && nd->flags & LOOKUP_RCU)
return -ECHILD;
@@ -1785,20 +1762,10 @@ int pid_revalidate(struct dentry *dentry, struct nameidata *nd)
inode = dentry->d_inode;
task = get_proc_task(inode);
+ pid_revalidate_perms(dentry);
+
if (task) {
- if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) ||
- task_dumpable(task)) {
- rcu_read_lock();
- cred = __task_cred(task);
- inode->i_uid = cred->euid;
- inode->i_gid = cred->egid;
- rcu_read_unlock();
- } else {
- inode->i_uid = 0;
- inode->i_gid = 0;
- }
- inode->i_mode &= ~(S_ISUID | S_ISGID);
- security_task_to_inode(task, inode);
+ __pid_revalidate(inode, task);
put_task_struct(task);
return 1;
}
@@ -2200,7 +2167,9 @@ static struct dentry *proc_fdinfo_instantiate(struct inode *dir,
ei = PROC_I(inode);
ei->fd = fd;
inode->i_mode = S_IFREG | S_IRUSR;
- inode->i_fop = &proc_fdinfo_file_operations;
+ ei->real_fops = &proc_fdinfo_file_operations;
+ inode->i_fop = &proc_pid_perms_fops;
+
d_set_d_op(dentry, &tid_fd_dentry_operations);
d_add(dentry, inode);
/* Close the race of the process dying before we return the dentry */
@@ -2238,7 +2207,6 @@ static const struct inode_operations proc_fdinfo_inode_operations = {
.setattr = proc_setattr,
};
-
static struct dentry *proc_pident_instantiate(struct inode *dir,
struct dentry *dentry, struct task_struct *task, const void *ptr)
{
@@ -2255,11 +2223,16 @@ static struct dentry *proc_pident_instantiate(struct inode *dir,
inode->i_mode = p->mode;
if (S_ISDIR(inode->i_mode))
inode->i_nlink = 2; /* Use getattr to fix if necessary */
+
if (p->iop)
inode->i_op = p->iop;
if (p->fop)
- inode->i_fop = p->fop;
+ ei->real_fops = p->fop;
ei->op = p->op;
+ inode->i_fop = &proc_pid_perms_fops;
+
+ ei->dirent = p;
+
d_set_d_op(dentry, &pid_dentry_operations);
d_add(dentry, inode);
/* Close the race of the process dying before we return the dentry */
@@ -2417,15 +2390,9 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
if (copy_from_user(page, buf, count))
goto out_free;
- /* Guard against adverse ptrace interaction */
- length = mutex_lock_interruptible(&task->signal->cred_guard_mutex);
- if (length < 0)
- goto out_free;
-
length = security_setprocattr(task,
(char*)file->f_path.dentry->d_name.name,
(void*)page, count);
- mutex_unlock(&task->signal->cred_guard_mutex);
out_free:
free_page((unsigned long) page);
out:
@@ -2469,7 +2436,7 @@ static struct dentry *proc_attr_dir_lookup(struct inode *dir,
attr_dir_stuff, ARRAY_SIZE(attr_dir_stuff));
}
-static const struct inode_operations proc_attr_dir_inode_operations = {
+const struct inode_operations proc_attr_dir_inode_operations = {
.lookup = proc_attr_dir_lookup,
.getattr = pid_getattr,
.setattr = proc_setattr,
@@ -2617,7 +2584,7 @@ static const struct inode_operations proc_self_inode_operations = {
*/
static const struct pid_entry proc_base_stuff[] = {
NOD("self", S_IFLNK|S_IRWXUGO,
- &proc_self_inode_operations, NULL, {}),
+ &proc_self_inode_operations, NULL, {}, false),
};
static struct dentry *proc_base_instantiate(struct inode *dir,
@@ -2653,9 +2620,12 @@ static struct dentry *proc_base_instantiate(struct inode *dir,
inode->i_size = 64;
if (p->iop)
inode->i_op = p->iop;
+
if (p->fop)
- inode->i_fop = p->fop;
+ ei->real_fops = p->fop;
ei->op = p->op;
+ inode->i_fop = &proc_pid_perms_fops;
+
d_add(dentry, inode);
error = NULL;
out:
@@ -2708,9 +2678,6 @@ static int do_io_accounting(struct task_struct *task, char *buffer, int whole)
struct task_io_accounting acct = task->ioac;
unsigned long flags;
- if (!ptrace_may_access(task, PTRACE_MODE_READ))
- return -EACCES;
-
if (whole && lock_task_sighand(task, &flags)) {
struct task_struct *t = task;
@@ -2751,12 +2718,8 @@ static int proc_tgid_io_accounting(struct task_struct *task, char *buffer)
static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
{
- int err = lock_trace(task);
- if (!err) {
- seq_printf(m, "%08x\n", task->personality);
- unlock_trace(task);
- }
- return err;
+ seq_printf(m, "%08x\n", task->personality);
+ return 0;
}
/*
@@ -2773,8 +2736,8 @@ static const struct pid_entry tgid_base_stuff[] = {
#ifdef CONFIG_NET
DIR("net", S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations),
#endif
- REG("environ", S_IRUSR, proc_environ_operations),
- INF("auxv", S_IRUSR, proc_pid_auxv),
+ REG_PERMS("environ", S_IRUSR, proc_environ_operations),
+ INF_PERMS("auxv", S_IRUSR, proc_pid_auxv),
ONE("status", S_IRUGO, proc_pid_status),
ONE("personality", S_IRUGO, proc_pid_personality),
INF("limits", S_IRUGO, proc_pid_limits),
@@ -2791,9 +2754,9 @@ static const struct pid_entry tgid_base_stuff[] = {
INF("cmdline", S_IRUGO, proc_pid_cmdline),
ONE("stat", S_IRUGO, proc_tgid_stat),
ONE("statm", S_IRUGO, proc_pid_statm),
- REG("maps", S_IRUGO, proc_maps_operations),
+ REG_PERMS("maps", S_IRUGO, proc_maps_operations),
#ifdef CONFIG_NUMA
- REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
+ REG_PERMS("numa_maps", S_IRUGO, proc_numa_maps_operations),
#endif
REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations),
LNK("cwd", proc_cwd_link),
@@ -2804,7 +2767,7 @@ static const struct pid_entry tgid_base_stuff[] = {
REG("mountstats", S_IRUSR, proc_mountstats_operations),
#ifdef CONFIG_PROC_PAGE_MONITOR
REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
- REG("smaps", S_IRUGO, proc_smaps_operations),
+ REG_PERMS("smaps", S_IRUGO, proc_smaps_operations),
REG("pagemap", S_IRUGO, proc_pagemap_operations),
#endif
#ifdef CONFIG_SECURITY
@@ -2867,7 +2830,7 @@ static struct dentry *proc_tgid_base_lookup(struct inode *dir, struct dentry *de
tgid_base_stuff, ARRAY_SIZE(tgid_base_stuff));
}
-static const struct inode_operations proc_tgid_base_inode_operations = {
+const struct inode_operations proc_tgid_base_inode_operations = {
.lookup = proc_tgid_base_lookup,
.getattr = pid_getattr,
.setattr = proc_setattr,
@@ -2972,9 +2935,10 @@ static struct dentry *proc_pid_instantiate(struct inode *dir,
if (!inode)
goto out;
- inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO;
+ inode->i_mode = S_IFDIR | S_IRUGO | S_IXUGO;
inode->i_op = &proc_tgid_base_inode_operations;
- inode->i_fop = &proc_tgid_base_operations;
+ inode->i_fop = &proc_pid_perms_fops;
+ PROC_I(inode)->real_fops = &proc_tgid_base_operations;
inode->i_flags|=S_IMMUTABLE;
inode->i_nlink = 2 + pid_entry_count_dirs(tgid_base_stuff,
@@ -3122,8 +3086,8 @@ static const struct pid_entry tid_base_stuff[] = {
DIR("fd", S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations),
DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations),
DIR("ns", S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations),
- REG("environ", S_IRUSR, proc_environ_operations),
- INF("auxv", S_IRUSR, proc_pid_auxv),
+ REG_PERMS("environ", S_IRUSR, proc_environ_operations),
+ INF_PERMS("auxv", S_IRUSR, proc_pid_auxv),
ONE("status", S_IRUGO, proc_pid_status),
ONE("personality", S_IRUGO, proc_pid_personality),
INF("limits", S_IRUGO, proc_pid_limits),
@@ -3137,9 +3101,9 @@ static const struct pid_entry tid_base_stuff[] = {
INF("cmdline", S_IRUGO, proc_pid_cmdline),
ONE("stat", S_IRUGO, proc_tid_stat),
ONE("statm", S_IRUGO, proc_pid_statm),
- REG("maps", S_IRUGO, proc_maps_operations),
+ REG_PERMS("maps", S_IRUGO, proc_maps_operations),
#ifdef CONFIG_NUMA
- REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
+ REG_PERMS("numa_maps", S_IRUGO, proc_numa_maps_operations),
#endif
REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations),
LNK("cwd", proc_cwd_link),
@@ -3149,7 +3113,7 @@ static const struct pid_entry tid_base_stuff[] = {
REG("mountinfo", S_IRUGO, proc_mountinfo_operations),
#ifdef CONFIG_PROC_PAGE_MONITOR
REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
- REG("smaps", S_IRUGO, proc_smaps_operations),
+ REG_PERMS("smaps", S_IRUGO, proc_smaps_operations),
REG("pagemap", S_IRUGO, proc_pagemap_operations),
#endif
#ifdef CONFIG_SECURITY
@@ -3209,7 +3173,7 @@ static const struct file_operations proc_tid_base_operations = {
.llseek = default_llseek,
};
-static const struct inode_operations proc_tid_base_inode_operations = {
+const struct inode_operations proc_tid_base_inode_operations = {
.lookup = proc_tid_base_lookup,
.getattr = pid_getattr,
.setattr = proc_setattr,
@@ -3226,7 +3190,8 @@ static struct dentry *proc_task_instantiate(struct inode *dir,
goto out;
inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO;
inode->i_op = &proc_tid_base_inode_operations;
- inode->i_fop = &proc_tid_base_operations;
+ inode->i_fop = &proc_pid_perms_fops;
+ PROC_I(inode)->real_fops = &proc_tid_base_operations;
inode->i_flags|=S_IMMUTABLE;
inode->i_nlink = 2 + pid_entry_count_dirs(tid_base_stuff,
@@ -3445,3 +3410,58 @@ static const struct file_operations proc_task_operations = {
.readdir = proc_task_readdir,
.llseek = default_llseek,
};
+
+const int perms_size[] = {
+ [PROC_PERMS_NTID] = ARRAY_SIZE(tid_base_stuff),
+ [PROC_PERMS_NTGID] = ARRAY_SIZE(tgid_base_stuff),
+ [PROC_PERMS_NATTR] = ARRAY_SIZE(attr_dir_stuff),
+};
+
+const struct pid_entry *dir_ents[] = {
+ [PROC_PERMS_NTID] = tid_base_stuff,
+ [PROC_PERMS_NTGID] = tgid_base_stuff,
+ [PROC_PERMS_NATTR] = attr_dir_stuff,
+};
+
+static void pid_revalidate_perms(struct dentry *dentry)
+{
+ struct inode *inode;
+ struct pid_namespace *pid_ns;
+ struct proc_perms *perms;
+ const struct inode_operations *parent_op;
+ struct proc_inode *ei;
+
+ pid_ns = dentry->d_sb->s_fs_info;
+ perms = &pid_ns->proc_perms;
+ inode = dentry->d_inode;
+ parent_op = dentry->d_parent->d_inode->i_op;
+ ei = PROC_I(inode);
+
+ /* We never ever chmod symlinks (XXX: symlinks must be protected too) */
+ if (S_ISLNK(inode->i_mode))
+ return;
+
+ if (inode->i_op == &proc_tgid_base_inode_operations ||
+ inode->i_op == &proc_tid_base_inode_operations) {
+ mode_t mode = S_IFDIR | S_IRUGO | S_IXUGO;
+ mode_t mask = perms->proc_pid_allowed ? ~0 : ~077;
+ inode->i_mode = mode & mask;
+ } else if (parent_op == &proc_tgid_base_inode_operations) {
+ inode->i_mode = get_ent_perms(ei->dirent,
+ perms,
+ PROC_PERMS_NTGID,
+ (const struct pid_entry *)ei->dirent - &tgid_base_stuff[0]);
+ } else if (parent_op == &proc_tid_base_inode_operations) {
+ inode->i_mode = get_ent_perms(ei->dirent,
+ perms,
+ PROC_PERMS_NTID,
+ (const struct pid_entry *)ei->dirent - &tid_base_stuff[0]);
+ } else if (parent_op == &proc_attr_dir_inode_operations) {
+ inode->i_mode = get_ent_perms(ei->dirent,
+ perms,
+ PROC_PERMS_NATTR,
+ (struct pid_entry *)ei->dirent - &attr_dir_stuff[0]);
+ } else {
+ pr_err("other not found (%s)\n", dentry->d_name.name);
+ }
+}
diff --git a/fs/proc/base_perms.c b/fs/proc/base_perms.c
new file mode 100644
index 0000000..26e157a
--- /dev/null
+++ b/fs/proc/base_perms.c
@@ -0,0 +1,305 @@
+/*
+ * linux/fs/proc/base_perms.c
+ *
+ * Copyright (C) 2011 Vasiliy Kulikov
+ *
+ * proc base directory permissions handling functions
+ */
+
+#include <asm/uaccess.h>
+#include <linux/proc_fs.h>
+#include <linux/string.h>
+#include <linux/ptrace.h>
+#include <linux/poll.h>
+#include <linux/pid_namespace.h>
+#include "internal.h"
+
+
+/* Similar to acl_permission_check(), but with procfs specific changes */
+static int proc_permission_check(struct task_struct *task, struct inode *inode, int mask)
+{
+ mode_t mode = inode->i_mode;
+ //uid_t uid = inode->i_uid;
+ gid_t gid = inode->i_gid;
+ int pmode = (mask == MAY_WRITE) ? PTRACE_MODE_ATTACH : PTRACE_MODE_READ;
+
+ if (mode & mask)
+ return 0;
+
+ /* XXX: This should be updated conformably to user ns code changes.
+ * Or maybe just remove it as ns users may not have any access to procfs? */
+ if (current_user_ns() != inode_userns(inode))
+ return -EACCES;
+
+ if (in_group_p(gid) && ((mode >> 3) & mask))
+ return 0;
+
+ /*
+ * The major changes compared to acl_permission_check():
+ * We check ptrace ability instead of uid comparison.
+ */
+ if (ptrace_may_access(task, pmode))
+ return 0;
+
+ return -EACCES;
+}
+
+static void unlock_pid_trace(struct task_struct *task);
+
+/*
+ * All operations with /proc/PID/ directory MUST be processed
+ * under lock_pid_trace() to avoid execve() races.
+ *
+ * It checks permissions similar to acl_permission_check() and probably holds
+ * (*task)->signal->cred_guard_mutex. If so, *task would point to the target
+ * task, NULL otherwise. To unlock the mutex call unlock_pid_trace() with the
+ * same *task argument.
+ */
+static int lock_pid_trace(struct dentry *dentry,
+ int mask,
+ struct task_struct **task)
+{
+ int err;
+ struct inode *inode = dentry->d_inode;
+ const struct pid_entry *pe = PROC_I(inode)->dirent;
+
+ if (pe && !pe->need_perms_check) {
+ /* OK, the file checks permissions on his own */
+ *task = NULL;
+ return 0;
+ }
+
+ *task = get_proc_task(inode);
+ if (*task == NULL)
+ return -ESRCH;
+
+ err = mutex_lock_killable(&(*task)->signal->cred_guard_mutex);
+ if (err)
+ goto put_task;
+
+ /*
+ * We have to revalidate both uid/gid and permissions.
+ * uid/gid revalidation is racy against execve(), so we do it under
+ * ->cred_guard_mutex.
+ *
+ * We cannot just update uid/gid because of LSM.
+ */
+ __pid_revalidate(inode, *task);
+
+ err = -EACCES;
+ if (proc_permission_check(*task, inode, mask))
+ goto free_mutex;
+
+ err = 0;
+ goto exit;
+
+free_mutex:
+ mutex_unlock(&(*task)->signal->cred_guard_mutex);
+put_task:
+ put_task_struct(*task);
+ *task = NULL;
+exit:
+ return err;
+}
+
+static void unlock_pid_trace(struct task_struct *task)
+{
+ if (task) {
+ mutex_unlock(&task->signal->cred_guard_mutex);
+ put_task_struct(task);
+ }
+}
+
+static int proc_pid_perms_open(struct inode *inode, struct file *file)
+{
+ struct proc_inode *pi = PROC_I(inode);
+
+ /* No ptrace check here. All checks should be located
+ * in read/write/readdir/etc. */
+
+ if (pi->real_fops->open)
+ return pi->real_fops->open(inode, file);
+ return 0;
+}
+
+static int proc_pid_perms_release(struct inode *inode, struct file *file)
+{
+ struct proc_inode *pi = PROC_I(inode);
+
+ if (pi->real_fops->release)
+ return pi->real_fops->release(inode, file);
+ return 0;
+}
+
+static ssize_t proc_pid_perms_read(struct file * file, char __user * buf,
+ size_t count, loff_t *ppos)
+{
+ struct dentry *dentry = file->f_dentry;
+ struct proc_inode *pi = PROC_I(dentry->d_inode);
+ int rc;
+ struct task_struct *task;
+
+ if (pi->real_fops->read) {
+ rc = lock_pid_trace(dentry, MAY_READ, &task);
+ if (!rc)
+ rc = pi->real_fops->read(file, buf, count, ppos);
+ unlock_pid_trace(task);
+ return rc;
+ }
+
+ /* Should never happen, but better safe than sorry */
+ return -EPERM;
+}
+
+static ssize_t proc_pid_perms_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *offs)
+{
+ struct dentry *dentry = file->f_dentry;
+ struct proc_inode *pi = PROC_I(dentry->d_inode);
+ int rc;
+ struct task_struct *task;
+
+ if (pi->real_fops->write) {
+ rc = lock_pid_trace(dentry, MAY_WRITE, &task);
+ if (!rc)
+ rc = pi->real_fops->write(file, buf, count, offs);
+ unlock_pid_trace(task);
+ return rc;
+ }
+
+ /* Should never happen, but better safe than sorry */
+ return -EPERM;
+}
+
+static int proc_pid_perms_readdir(struct file *file, void *dirent, filldir_t filldir)
+{
+ struct dentry *dentry = file->f_dentry;
+ struct proc_inode *pi = PROC_I(dentry->d_inode);
+ int rc;
+ struct task_struct *task;
+
+ if (pi->real_fops->readdir) {
+ rc = lock_pid_trace(dentry, MAY_READ, &task);
+ if (!rc)
+ rc = pi->real_fops->readdir(file, dirent, filldir);
+ unlock_pid_trace(task);
+ return rc;
+ }
+
+ /* Should never happen, but better safe than sorry */
+ return -ENOTDIR;
+}
+
+static unsigned proc_pid_perms_poll(struct file *file, poll_table *wait)
+{
+ struct dentry *dentry = file->f_dentry;
+ struct proc_inode *pi = PROC_I(dentry->d_inode);
+ struct task_struct *task;
+ int rc;
+
+ /* Check perms before the poll. There is a race against execve(),
+ * but hopefuly the infoleak is very minor */
+ rc = lock_pid_trace(dentry, MAY_READ, &task);
+ unlock_pid_trace(task);
+ if (rc)
+ return rc;
+
+ if (pi->real_fops->poll)
+ return pi->real_fops->poll(file, wait);
+
+ /* The fallback for files not implementing poll */
+ return DEFAULT_POLLMASK;
+}
+
+static loff_t proc_pid_perms_llseek(struct file *file, loff_t offset, int orig)
+{
+ struct proc_inode *pi = PROC_I(file->f_dentry->d_inode);
+
+ /* Looks like llseek() doesn't need any ptrace protection */
+
+ if (pi->real_fops->llseek)
+ return pi->real_fops->llseek(file, offset, orig);
+
+ return no_llseek(file, offset, orig);
+}
+
+/*
+ * Global wrapper fops for _ALL_ files inside /proc/PID/.
+ *
+ * It checks the permissions (including ptrace() check) of the current
+ * task on all file ops. The runtime checks protect from holding any fd
+ * across execve() of set*id binaries. task->signal->cred_guard_mutex
+ * is held during read(), readdir(), and write(). open(), release(),
+ * llseek() are not protected as they are harmless anyway. It makes
+ * effort to check permissions on poll(), but it doesn't hold the
+ * mutex during actual fops->poll() as poll() might sleep for much time.
+ *
+ * It makes sense to use proc_pid_perms_fops even for files doing their
+ * own checks (e.g. REG_PERMS()).
+ */
+const struct file_operations proc_pid_perms_fops = {
+ .open = proc_pid_perms_open,
+ .read = proc_pid_perms_read,
+ .readdir = proc_pid_perms_readdir,
+ .write = proc_pid_perms_write,
+ .llseek = proc_pid_perms_llseek,
+ .poll = proc_pid_perms_poll,
+ .release = proc_pid_perms_release,
+};
+
+mode_t get_ent_perms(const struct pid_entry *p,
+ struct proc_perms *perms,
+ int ndir,
+ int nent)
+{
+ mode_t umask = perms->proc_ent_allowed[ndir][nent] ? 0 : 077;
+ return p->mode & ~umask;
+}
+
+int init_proc_perms(struct proc_perms *p)
+{
+ int i;
+
+ p->proc_pid_allowed = true;
+ for (i = 0; i < PROC_PERMS_NMAX; i++) {
+ size_t size = perms_size[i]*sizeof(bool);
+ p->proc_ent_allowed[i] = kzalloc(size, GFP_KERNEL);
+ if (p->proc_ent_allowed[i] == NULL)
+ goto err;
+ }
+
+ return 0;
+err:
+ for (i-- ; i >= 0; i--)
+ kfree(p->proc_ent_allowed[i]);
+ return -ENOMEM;
+}
+
+void free_proc_perms(struct proc_perms *p)
+{
+ int i;
+
+ for (i = 0; i < PROC_PERMS_NMAX; i++)
+ kfree(p->proc_ent_allowed[i]);
+}
+
+void fill_proc_perms_ent(struct proc_perms *p, int ndir, bool val)
+{
+ int i;
+ for (i = 0; i < perms_size[ndir]; i++)
+ p->proc_ent_allowed[ndir][i] = val;
+}
+
+bool set_proc_perms_ent(struct proc_perms *p, int ndir, const char *fname, bool val)
+{
+ int i;
+ for (i = 0; i < perms_size[ndir]; i++)
+ if (!strcmp(dir_ents[ndir][i].name, fname)) {
+ pr_err("found %s (%d)\n", fname, i);
+ p->proc_ent_allowed[ndir][i] = val;
+ return true;
+ }
+
+ pr_err("not found %s (%d)\n", fname, i);
+ return false;
+}
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 74b48cf..022ff46 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -18,6 +18,10 @@
#include <linux/module.h>
#include <linux/sysctl.h>
#include <linux/slab.h>
+#include <linux/pid_namespace.h>
+#include <linux/seq_file.h>
+#include <linux/mount.h>
+
#include <asm/system.h>
#include <asm/uaccess.h>
@@ -69,6 +73,8 @@ static struct inode *proc_alloc_inode(struct super_block *sb)
ei->sysctl_entry = NULL;
ei->ns = NULL;
ei->ns_ops = NULL;
+ ei->real_fops = NULL;
+ ei->dirent = NULL;
inode = &ei->vfs_inode;
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
return inode;
@@ -102,12 +108,25 @@ void __init proc_init_inodecache(void)
init_once);
}
+static int proc_show_options(struct seq_file *seq, struct vfsmount *vfs)
+{
+ struct super_block *sb = vfs->mnt_sb;
+ struct pid_namespace *pid = sb->s_fs_info;
+
+ if (pid->proc_perms.proc_pid_allowed)
+ seq_printf(seq, ",proc_pid_allowed=.");
+
+ return 0;
+}
+
static const struct super_operations proc_sops = {
.alloc_inode = proc_alloc_inode,
.destroy_inode = proc_destroy_inode,
.drop_inode = generic_delete_inode,
.evict_inode = proc_evict_inode,
.statfs = simple_statfs,
+ .remount_fs = proc_remount,
+ .show_options = proc_show_options,
};
static void __pde_users_dec(struct proc_dir_entry *pde)
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 7838e5c..fe9ed27 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -137,11 +137,36 @@ int proc_fill_cache(struct file *filp, void *dirent, filldir_t filldir,
const char *name, int len,
instantiate_t instantiate, struct task_struct *task, const void *ptr);
int pid_revalidate(struct dentry *dentry, struct nameidata *nd);
+extern void __pid_revalidate(struct inode *inode, struct task_struct *task);
struct inode *proc_pid_make_inode(struct super_block * sb, struct task_struct *task);
extern const struct dentry_operations pid_dentry_operations;
int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat);
int proc_setattr(struct dentry *dentry, struct iattr *attr);
+int proc_remount(struct super_block *sb, int *flags, char *data);
extern const struct inode_operations proc_ns_dir_inode_operations;
extern const struct file_operations proc_ns_dir_operations;
+extern const struct inode_operations proc_tgid_base_inode_operations;
+extern const struct inode_operations proc_tid_base_inode_operations;
+extern const struct inode_operations proc_attr_dir_inode_operations;
+
+extern const struct file_operations proc_pid_perms_fops;
+extern const int perms_size[];
+extern const struct pid_entry *dir_ents[];
+
+extern mode_t get_ent_perms(const struct pid_entry *p,
+ struct proc_perms *perms,
+ int ndir,
+ int nent);
+
+struct pid_entry {
+ char *name;
+ int len;
+ mode_t mode;
+ const struct inode_operations *iop;
+ const struct file_operations *fop;
+ union proc_op op;
+ bool need_perms_check;
+};
+
diff --git a/fs/proc/root.c b/fs/proc/root.c
index d6c3b41..9193fa1 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -18,6 +18,7 @@
#include <linux/bitops.h>
#include <linux/mount.h>
#include <linux/pid_namespace.h>
+#include <linux/parser.h>
#include "internal.h"
@@ -36,6 +37,112 @@ static int proc_set_super(struct super_block *sb, void *data)
return err;
}
+enum {
+ Opt_err, Opt_tid_allowed, Opt_tgid_allowed, Opt_attr_allowed,
+};
+
+static const match_table_t tokens = {
+ {Opt_tgid_allowed, "tgid_allowed=%s"},
+ {Opt_tid_allowed, "tid_allowed=%s"},
+ {Opt_attr_allowed, "attr_allowed=%s"},
+ {Opt_err, NULL},
+};
+
+static bool proc_parse_xallowed(struct proc_perms *p, int ndir, char *val, bool set_pid_allowed)
+{
+ char *field;
+
+ fill_proc_perms_ent(p, ndir, false);
+
+ if (strcmp(val, "none") == 0) {
+ if (set_pid_allowed)
+ p->proc_pid_allowed = false;
+ return true;
+ }
+ if (strcmp(val, "all") == 0) {
+ fill_proc_perms_ent(p, ndir, true);
+ if (set_pid_allowed)
+ p->proc_pid_allowed = true;
+ return true;
+ }
+
+ while ((field = strsep(&val, ";")) != NULL) {
+ if (!set_proc_perms_ent(p, ndir, field, true))
+ return false;
+ }
+
+ if (set_pid_allowed)
+ p->proc_pid_allowed = true;
+ return true;
+}
+
+static int proc_parse_options(char *options, struct pid_namespace *pid)
+{
+ char *p, *val;
+ substring_t args[MAX_OPT_ARGS];
+
+ pr_debug("proc: options = %s\n", options);
+
+ if (!options)
+ return 1;
+
+ while ((p = strsep(&options, ",")) != NULL) {
+ int token;
+ if (!*p)
+ continue;
+
+ args[0].to = args[0].from = 0;
+ token = match_token(p, tokens, args);
+ switch (token) {
+ case Opt_tgid_allowed:
+ val = match_strdup(&args[0]);
+ if (!val)
+ return 0;
+ if (!proc_parse_xallowed(&pid->proc_perms, PROC_PERMS_NTGID, val, true)) {
+ kfree(val);
+ return 0;
+ }
+ kfree(val);
+ break;
+ case Opt_tid_allowed:
+ val = match_strdup(&args[0]);
+ if (!val)
+ return 0;
+ if (!proc_parse_xallowed(&pid->proc_perms, PROC_PERMS_NTID, val, false)) {
+ kfree(val);
+ return 0;
+ }
+ kfree(val);
+ break;
+ case Opt_attr_allowed:
+ val = match_strdup(&args[0]);
+ if (!val)
+ return 0;
+ if (!proc_parse_xallowed(&pid->proc_perms, PROC_PERMS_NATTR, val, false)) {
+ kfree(val);
+ return 0;
+ }
+ kfree(val);
+ break;
+ default:
+ pr_err("proc: unrecognized mount option \"%s\" "
+ "or missing value", p);
+ return 0;
+ }
+ }
+
+ //pr_debug("proc: gid = %u, hidepid = %o, hidenet = %d\n", pid->pid_gid, pid->hide_pid, (int)pid->hide_net);
+
+ return 1;
+}
+
+int proc_remount(struct super_block *sb, int *flags, char *data)
+{
+ struct pid_namespace *pid = sb->s_fs_info;
+ return !proc_parse_options(data, pid);
+}
+
+
static struct dentry *proc_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
{
@@ -43,11 +150,16 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
struct super_block *sb;
struct pid_namespace *ns;
struct proc_inode *ei;
+ char *options;
- if (flags & MS_KERNMOUNT)
+
+ if (flags & MS_KERNMOUNT) {
ns = (struct pid_namespace *)data;
- else
+ options = NULL;
+ } else {
ns = current->nsproxy->pid_ns;
+ options = data;
+ }
sb = sget(fs_type, proc_test_super, proc_set_super, ns);
if (IS_ERR(sb))
@@ -55,6 +167,10 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
if (!sb->s_root) {
sb->s_flags = flags;
+ if (!proc_parse_options(options, ns)) {
+ deactivate_locked_super(sb);
+ return ERR_PTR(-EINVAL);
+ }
err = proc_fill_super(sb);
if (err) {
deactivate_locked_super(sb);
@@ -74,6 +190,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
return dget(sb->s_root);
}
+
static void proc_kill_sb(struct super_block *sb)
{
struct pid_namespace *ns;
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 980de54..da98ba2 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -201,7 +201,7 @@ static void *m_start(struct seq_file *m, loff_t *pos)
if (!priv->task)
return ERR_PTR(-ESRCH);
- mm = mm_for_maps(priv->task);
+ mm = get_task_mm(priv->task);
if (!mm || IS_ERR(mm)) {
put_task_struct(priv->task);
priv->task = NULL;
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 38d1032..ed10329 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -16,6 +16,18 @@ struct pidmap {
struct bsd_acct_struct;
+enum {
+ PROC_PERMS_NTID,
+ PROC_PERMS_NTGID,
+ PROC_PERMS_NATTR,
+ PROC_PERMS_NMAX
+};
+
+struct proc_perms {
+ bool proc_pid_allowed;
+ bool *proc_ent_allowed[PROC_PERMS_NMAX];
+};
+
struct pid_namespace {
struct kref kref;
struct pidmap pidmap[PIDMAP_ENTRIES];
@@ -30,6 +42,7 @@ struct pid_namespace {
#ifdef CONFIG_BSD_PROCESS_ACCT
struct bsd_acct_struct *bacct;
#endif
+ struct proc_perms proc_perms;
};
extern struct pid_namespace init_pid_ns;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index e7576cf..a1525d7 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -115,10 +115,17 @@ struct proc_dir_entry *proc_create_data(const char *name, mode_t mode,
extern void remove_proc_entry(const char *name, struct proc_dir_entry *parent);
struct pid_namespace;
+struct proc_perms;
extern int pid_ns_prepare_proc(struct pid_namespace *ns);
extern void pid_ns_release_proc(struct pid_namespace *ns);
+extern int init_proc_perms(struct proc_perms *p);
+extern void free_proc_perms(struct proc_perms *p);
+
+extern void fill_proc_perms_ent(struct proc_perms *p, int ndir, bool val);
+extern bool set_proc_perms_ent(struct proc_perms *p, int ndir, const char *fname, bool val);
+
/*
* proc_tty.c
*/
@@ -267,6 +274,11 @@ struct proc_inode {
struct pid *pid;
int fd;
union proc_op op;
+
+ /* Used by files inside /proc/PID/ only */
+ const struct file_operations *real_fops;
+ const void *dirent;
+
struct proc_dir_entry *pde;
struct ctl_table_header *sysctl;
struct ctl_table *sysctl_entry;
diff --git a/kernel/pid.c b/kernel/pid.c
index 57a8346..ffa8b37 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -565,6 +565,8 @@ void __init pidmap_init(void)
set_bit(0, init_pid_ns.pidmap[0].page);
atomic_dec(&init_pid_ns.pidmap[0].nr_free);
+ BUG_ON(init_proc_perms(&init_pid_ns.proc_perms));
+
init_pid_ns.pid_cachep = KMEM_CACHE(pid,
SLAB_HWCACHE_ALIGN | SLAB_PANIC);
}
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index e9c9adc..ab4261c 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -97,12 +97,18 @@ static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_p
for (i = 1; i < PIDMAP_ENTRIES; i++)
atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
- err = pid_ns_prepare_proc(ns);
+ err = init_proc_perms(&ns->proc_perms);
if (err)
goto out_put_parent_pid_ns;
+ err = pid_ns_prepare_proc(ns);
+ if (err)
+ goto out_free_proc_perms;
+
return ns;
+out_free_proc_perms:
+ free_proc_perms(&ns->proc_perms);
out_put_parent_pid_ns:
put_pid_ns(parent_pid_ns);
out_free_map:
@@ -119,6 +125,8 @@ static void destroy_pid_namespace(struct pid_namespace *ns)
for (i = 0; i < PIDMAP_ENTRIES; i++)
kfree(ns->pidmap[i].page);
+
+ free_proc_perms(&ns->proc_perms);
kmem_cache_free(pid_ns_cachep, ns);
}
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f175d98..745b71e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -57,6 +57,7 @@
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
#include <linux/kmod.h>
+#include <linux/pid_namespace.h>
#include <asm/uaccess.h>
#include <asm/processor.h>
--
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [kernel-hardening] Re: procfs {tid,tgid,attr}_allowed mount options
2011-08-04 11:23 ` [kernel-hardening] " Vasiliy Kulikov
@ 2011-08-10 10:02 ` Vasiliy Kulikov
2011-08-10 11:22 ` [kernel-hardening] " Solar Designer
2011-08-10 11:25 ` Solar Designer
1 sibling, 1 reply; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-08-10 10:02 UTC (permalink / raw)
To: kernel-hardening
Solar,
On Thu, Aug 04, 2011 at 15:23 +0400, Vasiliy Kulikov wrote:
> New version. Cleanups/fixes here and there.
One question: do we really need gid= option? The only user I know is
identd, but does anybody use it nowadays?
With gid= I see 2 drawbacks:
1) Code becomes worse because of additional permission checks.
2) From the upstream's point of view it is very limited and unextendable
feature.
So, I'd go further without gid=, at least for the beginning.
Thanks,
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs {tid,tgid,attr}_allowed mount options
2011-08-10 10:02 ` Vasiliy Kulikov
@ 2011-08-10 11:22 ` Solar Designer
0 siblings, 0 replies; 29+ messages in thread
From: Solar Designer @ 2011-08-10 11:22 UTC (permalink / raw)
To: kernel-hardening
On Wed, Aug 10, 2011 at 02:02:27PM +0400, Vasiliy Kulikov wrote:
> One question: do we really need gid= option? The only user I know is
> identd, but does anybody use it nowadays?
identd is mostly obsolete, but I am using gid= with 2.4.x-ow kernels to
let a group of sysadmins see all users' processes and network
connections without having to use su. In fact, I use it on my very own
computers - again, to let my main desktop user account see everything,
while not letting my pseudo-user accounts (that I use for things such as
a web browser) also see everything (they're not in group proc).
> With gid= I see 2 drawbacks:
>
> 1) Code becomes worse because of additional permission checks.
>
> 2) From the upstream's point of view it is very limited and unextendable
> feature.
This feature is precisely what I needed and used for over a decade.
I never needed more flexibility. Maybe this says something.
> So, I'd go further without gid=, at least for the beginning.
I don't know what works best for upstream acceptance in the beginning,
but I definitely want this feature to get in, and it is a must for Owl.
Thanks,
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs {tid,tgid,attr}_allowed mount options
2011-08-04 11:23 ` [kernel-hardening] " Vasiliy Kulikov
2011-08-10 10:02 ` Vasiliy Kulikov
@ 2011-08-10 11:25 ` Solar Designer
2011-08-10 12:04 ` Vasiliy Kulikov
1 sibling, 1 reply; 29+ messages in thread
From: Solar Designer @ 2011-08-10 11:25 UTC (permalink / raw)
To: kernel-hardening
Vasiliy,
On Thu, Aug 04, 2011 at 03:23:31PM +0400, Vasiliy Kulikov wrote:
> New version. Cleanups/fixes here and there.
>
> It lacks net/ restriction, but IMO it is already complicated enough
> (more than 500 new lines). Such (relatively) simple thing as
> net_allowed= is an additional care. I'd achieve at least processes
> restrictions in upstream, after it will come networking.
That's a lot of code already. I tried reviewing it, but I felt that I
need an English description of its functionality first. Perhaps you're
preparing one for posting to LKML anyway - can you please post it in
here first?
Thanks,
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs {tid,tgid,attr}_allowed mount options
2011-08-10 11:25 ` Solar Designer
@ 2011-08-10 12:04 ` Vasiliy Kulikov
2011-08-10 13:34 ` Solar Designer
0 siblings, 1 reply; 29+ messages in thread
From: Vasiliy Kulikov @ 2011-08-10 12:04 UTC (permalink / raw)
To: kernel-hardening
Solar,
On Wed, Aug 10, 2011 at 15:25 +0400, Solar Designer wrote:
> On Thu, Aug 04, 2011 at 03:23:31PM +0400, Vasiliy Kulikov wrote:
> > New version. Cleanups/fixes here and there.
> >
> > It lacks net/ restriction, but IMO it is already complicated enough
> > (more than 500 new lines). Such (relatively) simple thing as
> > net_allowed= is an additional care. I'd achieve at least processes
> > restrictions in upstream, after it will come networking.
>
> That's a lot of code already.
Yes, so I'd try to get comments from net-less and gid-less patch first.
Probably the whole way of permissions handling would be ridicules for
kernel forlk :)
> I tried reviewing it, but I felt that I
> need an English description of its functionality first. Perhaps you're
> preparing one for posting to LKML anyway - can you please post it in
> here first?
Something like this:
This patch adds support of pid_allowed=XX and attr_allowed=YY mount
options for procfs. When set, all /proc/PID/ files are restricted to
the owner, except filenames passed via pid_allowed= argument. E.g. with
pid_allowed=sched sched file would be world readable and other files
would be restricted to the task owner. The same for /proc/PID/attr/
files and attr_allowed=YY.
The new struct file_operations proc_pid_perms_fops was created to deal
with permissions keeping in mind pid_allowed=. It is a wrapper for all
/proc/PID/* and /proc/PID/task/TID/* file_operations. It checks
current task permissions against the target task on each struct file
access to avoid races against execve().
All pid_entry's gain the following permission checking algorithm by
default:
1) if the filename is not passed via pid_allowed= then ptrace check is
applied.
2) Otherwise, if file POSIX permissions deny access to the world, ptrace
check is applied.
3) Otherwise, generic permission checking scheme is applied. For procfs
files it additionally includes ptrace check if current euid and the
procfs inode uid are equal.
For the simplicity meta values are allowed: "none" and "all". "all"
means all files are a subject of (2) and (3) checks. "none" means all
files are a subject of (1) check. "none" additionally restricts access
to the /proc/PID/ directory itself.
Old ptrace checks are removed from the handlers as they are now obsoleted by
proc_pid_perms_fops checks. Handlers may further restrict the
permission model by introducing additional checks. For some files the
ptrace check is too strong (environ, auxv, maps, numa_maps, and smaps),
so no implicit permission checking is applied for them. These files
should use pid_entry macros with "_PERMS" suffix.
Only /proc/PID/, /proc/PID/*, /proc/PID/task/TID/, /proc/PID/task/TID/*,
/proc/PID/attr/* files' permission checking is changed. fd/ and fdinfo/
are outstanding files and check the permissions on their own.
Thanks,
---
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs {tid,tgid,attr}_allowed mount options
2011-08-10 12:04 ` Vasiliy Kulikov
@ 2011-08-10 13:34 ` Solar Designer
2011-08-12 18:14 ` Simon Marechal
0 siblings, 1 reply; 29+ messages in thread
From: Solar Designer @ 2011-08-10 13:34 UTC (permalink / raw)
To: kernel-hardening
On Wed, Aug 10, 2011 at 04:04:39PM +0400, Vasiliy Kulikov wrote:
> On Wed, Aug 10, 2011 at 15:25 +0400, Solar Designer wrote:
> > That's a lot of code already.
>
> Yes, so I'd try to get comments from net-less and gid-less patch first.
> Probably the whole way of permissions handling would be ridicules for
> kernel forlk :)
I feel that you may have treated Andrew Morton's suggestion too
seriously. I think he was hoping for something not only more general,
but also simple.
With your invasive changes, even checking for (lack of) potential new
vulnerabilities (such as lack of ptrace check where it previously existed
and was needed) feels non-trivial. Or maybe I just did not look closely
enough.
Perhaps run this by LKML as RFC and see what they think? And be willing
to revert to your old approach, with more hard-coding, now that you have
this arguably overly complicated alternative. Maybe it will convince
Andrew Morton that something simpler and less flexible would be better.
Thanks,
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [kernel-hardening] procfs {tid,tgid,attr}_allowed mount options
2011-08-10 13:34 ` Solar Designer
@ 2011-08-12 18:14 ` Simon Marechal
0 siblings, 0 replies; 29+ messages in thread
From: Simon Marechal @ 2011-08-12 18:14 UTC (permalink / raw)
To: kernel-hardening
Le 10/08/2011 15:34, Solar Designer a écrit :
> Perhaps run this by LKML as RFC and see what they think? And be willing
> to revert to your old approach, with more hard-coding, now that you have
> this arguably overly complicated alternative. Maybe it will convince
> Andrew Morton that something simpler and less flexible would be better.
Just my opinion, but the gid option is simple and to the point. More
complex solution will likely :
* not be used at all
* not be relevant to people with very specific needs anyway
* introduce bugs and/or vulnerabilities, either from the code or from
misconfigurations
Point #2 is important. Very specific needs should not be addressed in
this specific patch, it should be configured in something with a global
scope, such as a LSM.
I believe having effective security systems enabled by default is more
important than having generalistic and configurable systems nobody care
about. For example, being able to let a process choose the set of system
calls it should use is more useful to me than having SELinux loaded.
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2011-08-12 18:14 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20110603191153.GB514@openwall.com>
2011-06-04 5:47 ` [kernel-hardening] Re: procfs mount options Vasiliy Kulikov
2011-06-04 13:20 ` [kernel-hardening] " Solar Designer
2011-06-04 20:09 ` Vasiliy Kulikov
2011-06-04 20:59 ` Solar Designer
2011-06-05 18:24 ` [kernel-hardening] [RFC v1] " Vasiliy Kulikov
2011-06-05 19:26 ` Solar Designer
2011-06-05 19:47 ` Vasiliy Kulikov
2011-06-05 20:10 ` Solar Designer
2011-06-06 18:08 ` Vasiliy Kulikov
2011-06-06 18:33 ` Solar Designer
2011-06-08 17:23 ` [kernel-hardening] [RFC v2] " Vasiliy Kulikov
2011-06-08 17:43 ` Vasiliy Kulikov
2011-06-12 2:39 ` Solar Designer
2011-07-24 18:55 ` Vasiliy Kulikov
[not found] ` <20110724185036.GC3510@albatros>
2011-07-26 14:50 ` Vasiliy Kulikov
2011-07-29 17:47 ` [kernel-hardening] procfs {tid,tgid,attr}_allowed " Vasiliy Kulikov
2011-08-04 11:23 ` [kernel-hardening] " Vasiliy Kulikov
2011-08-10 10:02 ` Vasiliy Kulikov
2011-08-10 11:22 ` [kernel-hardening] " Solar Designer
2011-08-10 11:25 ` Solar Designer
2011-08-10 12:04 ` Vasiliy Kulikov
2011-08-10 13:34 ` Solar Designer
2011-08-12 18:14 ` Simon Marechal
2011-06-06 19:20 ` [kernel-hardening] [RFC v1] procfs " Vasiliy Kulikov
2011-06-05 19:17 ` [kernel-hardening] " Vasiliy Kulikov
2011-06-05 19:40 ` Solar Designer
2011-06-05 19:53 ` Vasiliy Kulikov
2011-06-05 18:36 ` [kernel-hardening] Re: [owl-dev] " Vasiliy Kulikov
2011-06-05 18:47 ` [kernel-hardening] " Solar Designer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox