public inbox for kernel-hardening@lists.openwall.com
 help / color / mirror / Atom feed
* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
       [not found]             ` <20110901080508.GF30615@sun>
@ 2011-09-02 16:37               ` Vasiliy Kulikov
  2011-09-05 18:53                 ` Vasiliy Kulikov
  0 siblings, 1 reply; 25+ messages in thread
From: Vasiliy Kulikov @ 2011-09-02 16:37 UTC (permalink / raw)
  To: Cyrill Gorcunov, Andrew Morton
  Cc: Kirill A. Shutemov, containers, linux-kernel, linux-fsdevel,
	Nathan Lynch, kernel-hardening, Oren Laadan, Daniel Lezcano,
	Glauber Costa, James Bottomley, Tejun Heo, Alexey Dobriyan,
	Al Viro, Pavel Emelyanov

Hi,

On Thu, Sep 01, 2011 at 12:05 +0400, Cyrill Gorcunov wrote:
> ...
> > > +/*
> > > + * NOTE: The getattr/setattr for both /proc/$pid/map_files and
> > > + * /proc/$pid/fd seems to have share the code, so need to be
> > > + * unified and code duplication eliminated!
> > 
> > Why not do this now?
> 
> There are a couple of reasons. Yesterday I was talking to
> Vasiliy Kulikov about this snippet, so he seems about to send
> you patches related to /proc/$pid/fd update, and after those
> patches will be merged we are to drop code duplication.
> Vasiliy, what the status of the update?

It looks like protecting directories with sensible contents is a nasty
thing.  The problem here is that if the dentry is present in the cache,
->lookup() is not called at all and the permissions can be checked in
fop/dop/iop specific handler (getattr(), readlink(), etc.).  However, it
would be much simplier to hook ->lookup() only.  Otherwise, we have to
define procfs handlers for all operations, which don't call
->d_revalidate().

Is it possible to disable caching dentry for specific files?  It is not
performace critical thing in fd and map_files and it would much simplify
the task.  Creating handlers for all these op handler bloats procfs.

Also I'm not sure what other handlers might reveal dentry presence.
Besides ->getattr() I could find only one thing - ->link() (Cyrill,
AFAICS ->setattr() doesn't reveal files' presence).  Someone more
familiar with vfs than me - please, help to identify all infoleak
sources!


Thank you,

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-02 16:37               ` [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6 Vasiliy Kulikov
@ 2011-09-05 18:53                 ` Vasiliy Kulikov
  2011-09-05 19:20                   ` Cyrill Gorcunov
  0 siblings, 1 reply; 25+ messages in thread
From: Vasiliy Kulikov @ 2011-09-05 18:53 UTC (permalink / raw)
  To: Cyrill Gorcunov, Andrew Morton
  Cc: Kirill A. Shutemov, containers, linux-kernel, linux-fsdevel,
	Nathan Lynch, kernel-hardening, Oren Laadan, Daniel Lezcano,
	Glauber Costa, James Bottomley, Tejun Heo, Alexey Dobriyan,
	Al Viro, Pavel Emelyanov

On Fri, Sep 02, 2011 at 20:37 +0400, Vasiliy Kulikov wrote:
> On Thu, Sep 01, 2011 at 12:05 +0400, Cyrill Gorcunov wrote:
> > ...
> > > > +/*
> > > > + * NOTE: The getattr/setattr for both /proc/$pid/map_files and
> > > > + * /proc/$pid/fd seems to have share the code, so need to be
> > > > + * unified and code duplication eliminated!
> > > 
> > > Why not do this now?
> > 
> > There are a couple of reasons. Yesterday I was talking to
> > Vasiliy Kulikov about this snippet, so he seems about to send
> > you patches related to /proc/$pid/fd update, and after those
> > patches will be merged we are to drop code duplication.
> > Vasiliy, what the status of the update?
> 
> It looks like protecting directories with sensible contents is a nasty
> thing.  The problem here is that if the dentry is present in the cache,
> ->lookup() is not called at all and the permissions can be checked in
> fop/dop/iop specific handler (getattr(), readlink(), etc.).  However, it
> would be much simplier to hook ->lookup() only.  Otherwise, we have to
> define procfs handlers for all operations, which don't call
> ->d_revalidate().
> 
> Is it possible to disable caching dentry for specific files?  It is not
> performace critical thing in fd and map_files and it would much simplify
> the task.  Creating handlers for all these op handler bloats procfs.

Looks like the following patch solves the problem.  Tested on stat() and
link().

diff --git a/fs/proc/base.c b/fs/proc/base.c
index d44c701..219588b 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1665,46 +1665,12 @@ out:
 	return error;
 }
 
-static int proc_pid_fd_link_getattr(struct vfsmount *mnt, struct dentry *dentry,
-		struct kstat *stat)
-{
-	struct inode *inode = dentry->d_inode;
-	struct task_struct *task = get_proc_task(inode);
-	int rc;
-
-	if (task == NULL)
-		return -ESRCH;
-
-	rc = -EACCES;
-	if (lock_trace(task))
-		goto out_task;
-
-	generic_fillattr(inode, stat);
-	unlock_trace(task);
-	rc = 0;
-out_task:
-	put_task_struct(task);
-	return rc;
-}
-
 static const struct inode_operations proc_pid_link_inode_operations = {
 	.readlink	= proc_pid_readlink,
 	.follow_link	= proc_pid_follow_link,
 	.setattr	= proc_setattr,
 };
 
-static const struct inode_operations proc_fdinfo_link_inode_operations = {
-	.setattr	= proc_setattr,
-	.getattr	= proc_pid_fd_link_getattr,
-};
-
-static const struct inode_operations proc_fd_link_inode_operations = {
-	.readlink	= proc_pid_readlink,
-	.follow_link	= proc_pid_follow_link,
-	.setattr	= proc_setattr,
-	.getattr	= proc_pid_fd_link_getattr,
-};
-
 
 /* building an inode */
 
@@ -2044,9 +2010,18 @@ static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
 	return 0;
 }
 
+static int pid_no_revalidate(struct dentry *dentry, struct nameidata *nd)
+{
+	if (nd && nd->flags & LOOKUP_RCU)
+		return -ECHILD;
+
+	d_drop(dentry);
+	return 0;
+}
+
 static const struct dentry_operations tid_fd_dentry_operations =
 {
-	.d_revalidate	= tid_fd_revalidate,
+	.d_revalidate	= pid_no_revalidate,
 	.d_delete	= pid_delete_dentry,
 };
 
@@ -2085,7 +2060,7 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
 	spin_unlock(&files->file_lock);
 	put_files_struct(files);
 
-	inode->i_op = &proc_fd_link_inode_operations;
+	inode->i_op = &proc_pid_link_inode_operations;
 	inode->i_size = 64;
 	ei->op.proc_get_link = proc_fd_link;
 	d_set_d_op(dentry, &tid_fd_dentry_operations);
@@ -2267,7 +2242,6 @@ static struct dentry *proc_fdinfo_instantiate(struct inode *dir,
 	ei->fd = fd;
 	inode->i_mode = S_IFREG | S_IRUSR;
 	inode->i_fop = &proc_fdinfo_file_operations;
-	inode->i_op = &proc_fdinfo_link_inode_operations;
 	d_set_d_op(dentry, &tid_fd_dentry_operations);
 	d_add(dentry, inode);
 	/* Close the race of the process dying before we return the dentry */
-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-05 18:53                 ` Vasiliy Kulikov
@ 2011-09-05 19:20                   ` Cyrill Gorcunov
  2011-09-05 19:49                     ` Vasiliy Kulikov
  0 siblings, 1 reply; 25+ messages in thread
From: Cyrill Gorcunov @ 2011-09-05 19:20 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: Andrew Morton, Kirill A. Shutemov, containers, linux-kernel,
	linux-fsdevel, Nathan Lynch, kernel-hardening, Oren Laadan,
	Daniel Lezcano, Glauber Costa, James Bottomley, Tejun Heo,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Mon, Sep 05, 2011 at 10:53:58PM +0400, Vasiliy Kulikov wrote:
...
>  
> +static int pid_no_revalidate(struct dentry *dentry, struct nameidata *nd)
> +{
> +	if (nd && nd->flags & LOOKUP_RCU)
> +		return -ECHILD;
> +
> +	d_drop(dentry);
> +	return 0;
> +}
> +

Thanks Vasiliy! So every lookup will cause dcache to drop previous cached
entry and alloc and hash new one instead, pretty dramatic, espec in case
of huge number of files mapped ;) Still since it's not time critical operation
(at least for now) I tend to agree.

	Cyrill

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-05 19:20                   ` Cyrill Gorcunov
@ 2011-09-05 19:49                     ` Vasiliy Kulikov
  2011-09-05 20:36                       ` Cyrill Gorcunov
  0 siblings, 1 reply; 25+ messages in thread
From: Vasiliy Kulikov @ 2011-09-05 19:49 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Andrew Morton, Kirill A. Shutemov, containers, linux-kernel,
	linux-fsdevel, Nathan Lynch, kernel-hardening, Oren Laadan,
	Daniel Lezcano, Glauber Costa, James Bottomley, Tejun Heo,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Mon, Sep 05, 2011 at 23:20 +0400, Cyrill Gorcunov wrote:
> On Mon, Sep 05, 2011 at 10:53:58PM +0400, Vasiliy Kulikov wrote:
> ...
> >  
> > +static int pid_no_revalidate(struct dentry *dentry, struct nameidata *nd)
> > +{
> > +	if (nd && nd->flags & LOOKUP_RCU)
> > +		return -ECHILD;
> > +
> > +	d_drop(dentry);
> > +	return 0;
> > +}
> > +
> 
> Thanks Vasiliy! So every lookup will cause dcache to drop previous cached
> entry and alloc and hash new one instead, pretty dramatic, espec in case
> of huge number of files mapped ;) Still since it's not time critical operation
> (at least for now) I tend to agree.

Actually, it can be speed up by introducing the same ptrace check.  If
ptrace check fails, then just drop the dentry, otherwise continue to use
it.  Then each revalidate would trigger ptrace check instead of full
drop-lookup-alloc cycle.  If one process actively looks into
map_files/ or fd/, it will not become significantly slower.  However, it
will trigger 2 capable() fail alerts in ptrace_may_access() instead of
one :)


But I still see one very nasty issue - one may trigger this ptrace check,
trigger d_drop() and then look at /proc/slabinfo at "dentry" row.  If
the number has changed, then the interested dentry existed before the
revalidate call.  This infoleak is tricky to fix without any race.

Probably it's time to close /proc/slabinfo infoleak?


Thanks,

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-05 19:49                     ` Vasiliy Kulikov
@ 2011-09-05 20:36                       ` Cyrill Gorcunov
  2011-09-06 10:15                         ` Vasiliy Kulikov
  0 siblings, 1 reply; 25+ messages in thread
From: Cyrill Gorcunov @ 2011-09-05 20:36 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: Andrew Morton, Kirill A. Shutemov, containers, linux-kernel,
	linux-fsdevel, Nathan Lynch, kernel-hardening, Oren Laadan,
	Daniel Lezcano, Glauber Costa, James Bottomley, Tejun Heo,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Mon, Sep 05, 2011 at 11:49:08PM +0400, Vasiliy Kulikov wrote:
...
> 
> Actually, it can be speed up by introducing the same ptrace check.  If
> ptrace check fails, then just drop the dentry, otherwise continue to use
> it.  Then each revalidate would trigger ptrace check instead of full
> drop-lookup-alloc cycle.  If one process actively looks into
> map_files/ or fd/, it will not become significantly slower.  However, it
> will trigger 2 capable() fail alerts in ptrace_may_access() instead of
> one :)

Hmm, at least it's better than trashing dcache I think.

> 
> But I still see one very nasty issue - one may trigger this ptrace check,
> trigger d_drop() and then look at /proc/slabinfo at "dentry" row.  If
> the number has changed, then the interested dentry existed before the
> revalidate call.  This infoleak is tricky to fix without any race.
> 
> Probably it's time to close /proc/slabinfo infoleak? 
> 

Actually I miss to see how exactly this infoleak can be used by attacker
or whoever. So, Vasiliy, what the security issue there?

	Cyrill

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-05 20:36                       ` Cyrill Gorcunov
@ 2011-09-06 10:15                         ` Vasiliy Kulikov
  2011-09-06 16:51                           ` Tejun Heo
  0 siblings, 1 reply; 25+ messages in thread
From: Vasiliy Kulikov @ 2011-09-06 10:15 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Andrew Morton, Kirill A. Shutemov, containers, linux-kernel,
	linux-fsdevel, Nathan Lynch, kernel-hardening, Oren Laadan,
	Daniel Lezcano, Glauber Costa, James Bottomley, Tejun Heo,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Tue, Sep 06, 2011 at 00:36 +0400, Cyrill Gorcunov wrote:
> > But I still see one very nasty issue - one may trigger this ptrace check,
> > trigger d_drop() and then look at /proc/slabinfo at "dentry" row.  If
> > the number has changed, then the interested dentry existed before the
> > revalidate call.  This infoleak is tricky to fix without any race.
> > 
> > Probably it's time to close /proc/slabinfo infoleak? 
> > 
> 
> Actually I miss to see how exactly this infoleak can be used by attacker
> or whoever. So, Vasiliy, what the security issue there?

The security model of procfs is: /proc/PID/fd/ is available to users
that may ptrace PID only.  Particularly, the number of opened file
descriptors is a private information.  If other task that may not ptrace
PID is able to get this information, this is an issue.  Keeping opened
file descriptor of /proc/PID/fd/ and exec'ing some setxid binary as PID
might lead to the infoleak.  It can be used in certain rare cases when
the knowledge of whether specific fd is opened/closed gains some
important information, e.g. whether some security check has
failed/succeeded (which is indirectly signaled by the kept fd).  As for
map_files/ it may reveal ASLR offsets (but only some bits, not all of
them, I guess).

Without dropping denries it can be identified by calling stat() or
link() against dentries existing in the cache.  In more details:

1) an attacker has a task with pid=PID with many opened fds.

2) Other task (PID2) opens /proc/PID/fd/ and fills the dentry cache.
Now dcache contains procfs entries for file descriptors of PID.

3) PID execve's setxid binary.  (From this point PID2 should not get
_any_ information about /proc/PID/fd/, but this rule is violated in (4).)

4) PID2 does something to learn whether any fd of PID is opened/closed.

  a) before "proc: fix races against execve() of /proc/PID/fd**" patch
     PID2 could simply do getdents() against kept file descriptor of
     /proc/PID/fd and get the list of opened fds.

  b) Without dentry dropping on each access PID2 could use link(2) to
     read /proc/PID/fd/* dentries from dcache.  As they are in the
     dcache since (2), ptrace check from ->lookup() is not applied.

  c) If dentry is lazily dropped on each access attempt (or each illegal
     access) then PID2 can:

     i) read dentry line of /proc/slabinfo
     ii) call link(2) against /proc/PID/fd, which invalidates the
         specific dentry
     iii) re-read dentry line of /proc/slabinfo.  If it has decreased by
         one, the dentry existed before (ii).


Is it possible to either allocate already dropped dentry or to force
->lookup() without invalidating dentry?  The latter would potentially
pollute the dchache, though.


Thanks,

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-06 10:15                         ` Vasiliy Kulikov
@ 2011-09-06 16:51                           ` Tejun Heo
  2011-09-06 17:29                             ` Vasiliy Kulikov
  0 siblings, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2011-09-06 16:51 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: Cyrill Gorcunov, Andrew Morton, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

Hello, Vasiliy.

On Tue, Sep 06, 2011 at 02:15:18PM +0400, Vasiliy Kulikov wrote:
>   c) If dentry is lazily dropped on each access attempt (or each illegal
>      access) then PID2 can:
> 
>      i) read dentry line of /proc/slabinfo
>      ii) call link(2) against /proc/PID/fd, which invalidates the
>          specific dentry
>      iii) re-read dentry line of /proc/slabinfo.  If it has decreased by
>          one, the dentry existed before (ii).

If we really worry about this, probably the right thing to do is
hiding slabinfo from mortal UIDs instead of worrying about what
exactly are freed or not from each user.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-06 16:51                           ` Tejun Heo
@ 2011-09-06 17:29                             ` Vasiliy Kulikov
  2011-09-06 17:33                               ` Tejun Heo
  0 siblings, 1 reply; 25+ messages in thread
From: Vasiliy Kulikov @ 2011-09-06 17:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Cyrill Gorcunov, Andrew Morton, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

Hi Tejun,

On Wed, Sep 07, 2011 at 01:51 +0900, Tejun Heo wrote:
> On Tue, Sep 06, 2011 at 02:15:18PM +0400, Vasiliy Kulikov wrote:
> >   c) If dentry is lazily dropped on each access attempt (or each illegal
> >      access) then PID2 can:
> > 
> >      i) read dentry line of /proc/slabinfo
> >      ii) call link(2) against /proc/PID/fd, which invalidates the
> >          specific dentry
> >      iii) re-read dentry line of /proc/slabinfo.  If it has decreased by
> >          one, the dentry existed before (ii).
> 
> If we really worry about this, probably the right thing to do is
> hiding slabinfo from mortal UIDs instead of worrying about what
> exactly are freed or not from each user.

I agree with you.  I don't think that showing system-global debug
information to all users by default is the right thing.  But some people
doesn't agree with this point of view:

http://thread.gmane.org/gmane.linux.kernel/1108378

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-06 17:29                             ` Vasiliy Kulikov
@ 2011-09-06 17:33                               ` Tejun Heo
  2011-09-06 18:15                                 ` Cyrill Gorcunov
  2011-09-07 11:23                                 ` Vasiliy Kulikov
  0 siblings, 2 replies; 25+ messages in thread
From: Tejun Heo @ 2011-09-06 17:33 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: Cyrill Gorcunov, Andrew Morton, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

Hello,

On Tue, Sep 06, 2011 at 09:29:52PM +0400, Vasiliy Kulikov wrote:
> I agree with you.  I don't think that showing system-global debug
> information to all users by default is the right thing.  But some people
> doesn't agree with this point of view:
> 
> http://thread.gmane.org/gmane.linux.kernel/1108378

Yeap, I know there are two sides of the discussion but if one takes
the position that hiding such global debug info is more harmful, it's
only crazier to hide such information from each individual users of
the said global facility.  So, let's just forget about information
leak via freeing or not freeing here.  It's the wrong battle field.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-06 17:33                               ` Tejun Heo
@ 2011-09-06 18:15                                 ` Cyrill Gorcunov
  2011-09-07 11:23                                 ` Vasiliy Kulikov
  1 sibling, 0 replies; 25+ messages in thread
From: Cyrill Gorcunov @ 2011-09-06 18:15 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Vasiliy Kulikov, Andrew Morton, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Wed, Sep 07, 2011 at 02:33:41AM +0900, Tejun Heo wrote:
...
> 
> Yeap, I know there are two sides of the discussion but if one takes
> the position that hiding such global debug info is more harmful, it's
> only crazier to hide such information from each individual users of
> the said global facility.  So, let's just forget about information
> leak via freeing or not freeing here.  It's the wrong battle field.
>

Heh, I definitely didn't expect all this would turn out into slabinfo ;)
 
	Cyrill

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-06 17:33                               ` Tejun Heo
  2011-09-06 18:15                                 ` Cyrill Gorcunov
@ 2011-09-07 11:23                                 ` Vasiliy Kulikov
  2011-09-07 21:53                                   ` Cyrill Gorcunov
  1 sibling, 1 reply; 25+ messages in thread
From: Vasiliy Kulikov @ 2011-09-07 11:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Cyrill Gorcunov, Andrew Morton, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

Hi,

On Wed, Sep 07, 2011 at 02:33 +0900, Tejun Heo wrote:
> On Tue, Sep 06, 2011 at 09:29:52PM +0400, Vasiliy Kulikov wrote:
> > I agree with you.  I don't think that showing system-global debug
> > information to all users by default is the right thing.  But some people
> > doesn't agree with this point of view:
> > 
> > http://thread.gmane.org/gmane.linux.kernel/1108378
> 
> Yeap, I know there are two sides of the discussion but if one takes
> the position that hiding such global debug info is more harmful, it's
> only crazier to hide such information from each individual users of
> the said global facility.  So, let's just forget about information
> leak via freeing or not freeing here.  It's the wrong battle field.

Andrew, are you OK with closing the hole with pid_no_revalidate()
and 0600 /proc/slabinfo?  If so, I feel I have to start this discussion
with people participating in the discussion above: Theodore, Dan, Linus, etc.

Thanks,

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-07 11:23                                 ` Vasiliy Kulikov
@ 2011-09-07 21:53                                   ` Cyrill Gorcunov
  2011-09-07 22:13                                     ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Cyrill Gorcunov @ 2011-09-07 21:53 UTC (permalink / raw)
  To: Vasiliy Kulikov, Andrew Morton
  Cc: Tejun Heo, Kirill A. Shutemov, containers, linux-kernel,
	linux-fsdevel, Nathan Lynch, kernel-hardening, Oren Laadan,
	Daniel Lezcano, Glauber Costa, James Bottomley, Alexey Dobriyan,
	Al Viro, Pavel Emelyanov

On Wed, Sep 07, 2011 at 03:23:01PM +0400, Vasiliy Kulikov wrote:
> Hi,
> 
> On Wed, Sep 07, 2011 at 02:33 +0900, Tejun Heo wrote:
> > On Tue, Sep 06, 2011 at 09:29:52PM +0400, Vasiliy Kulikov wrote:
> > > I agree with you.  I don't think that showing system-global debug
> > > information to all users by default is the right thing.  But some people
> > > doesn't agree with this point of view:
> > > 
> > > http://thread.gmane.org/gmane.linux.kernel/1108378
> > 
> > Yeap, I know there are two sides of the discussion but if one takes
> > the position that hiding such global debug info is more harmful, it's
> > only crazier to hide such information from each individual users of
> > the said global facility.  So, let's just forget about information
> > leak via freeing or not freeing here.  It's the wrong battle field.
> 
> Andrew, are you OK with closing the hole with pid_no_revalidate()
> and 0600 /proc/slabinfo?  If so, I feel I have to start this discussion
> with people participating in the discussion above: Theodore, Dan, Linus, etc.
> 
> Thanks,

Since kernel.org is still down (and Andrew, I can't download -mm bundle
as well for this very reason), here is an updated version for review.
I've updated map_files_d_revalidate to include ptrace hook, and switched
to flex_array. So while there is uncertainty in would we use pid_no_revalidate
or we wouldn't (seems calling ptrace hook there instead would help) i remain
the get/setattr untouched for a while. Also I hope an updated changelog
would make it more clear why we need this feature.

> By Andrew Morton
>
> But do we *really* need to do it in two passes?  Avoiding the temporary
> storage would involve doing more work under mmap_sem, and a put_filp()
> under mmap_sem might be problematic.

I fear we still need to use two passes in proc_map_files_readdir, I found no way
to escape lockdep complains when doing all work in one pass with mmap_sem taken.
The /maps does the same thing -- ie it fills maps file with mmap_sem taken to produce
robust data. And I'm not really sure what you mean with problematic put_filp?

	Cyrill
---
fs, proc: Introduce the /proc/<pid>/map_files/ directory v10

From: Pavel Emelyanov <xemul@parallels.com>

This one behaves similarly to the /proc/<pid>/fd/ one - it contains symlinks
one for each mapping with file, the name of a symlink is "vma->vm_start-vma->vm_end",
the target is the file. Opening a symlink results in a file that point exactly
to the same inode as them vma's one.

For example the ls -l of some arbitrary /proc/<pid>/map_files/

 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80403000-7f8f80404000 -> /lib64/libc-2.5.so
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f8061e000-7f8f80620000 -> /lib64/libselinux.so.1
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80826000-7f8f80827000 -> /lib64/libacl.so.1.1.0
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a2f000-7f8f80a30000 -> /lib64/librt-2.5.so
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a30000-7f8f80a4c000 -> /lib64/ld-2.5.so

This *helps* checkpointing process in three ways:

1. When dumping a task mappings we do know exact file that is mapped by particular
   region. We do this by opening /proc/$pid/map_files/address symlink the way we do
   with file descriptors.

2. This also helps in determining which anonymous shared mappings are shared with
   each other by comparing the inodes of them.

3. When restoring a set of process in case two of them has a mapping shared, we map
   the memory by the 1st one and then open its /proc/$pid/map_files/address file and
   map it by the 2nd task.

Using /proc/$pid/maps for this is quite inconvenient since it brings repeatable
re-reading and reparsing for this text file which slows down restore procesure
significantly. Also as being pointed in (3) it is a way easier to use top level
shared mapping in children as /proc/$pid/map_files/address when needed.

v2: (spotted by Tejun Heo)
 - /proc/<pid>/mfd changed to /proc/<pid>/map_files
 - find_vma helper is used instead of linear search
 - routines are re-grouped
 - d_revalidate is set now

v3:
 - d_revalidate reworked, now it should drops no longer valid dentries (Tejun Heo)
 - ptrace_may_access added into proc_map_files_lookup (Vasiliy Kulikov)
 - because of filldir (which eventually might need to lock mmap_sem)
   the proc_map_files_readdir() was reworked to call proc_fill_cache()
   with unlocked mmap_sem

v4: (feedback by Tejun Heo and Vasiliy Kulikov)
 - instead of saving data in proc_inode we rather make a dentry name
   to keep both vm_start and vm_end accordingly
 - d_revalidate now honor task credentials

v5: (feedback by Kirill A. Shutemov)
 - don't forget to release mmap_sem on error path

v6:
 - sizeof get used in map_files_info which shrink member a bit on
   x86-32 (by Kirill A. Shutemov)
 - map_name_to_addr returns -EINVAL instead of -1
   which is more appropriate (by Tejun Heo)

v7:
 - add [get/set]attr handlers for
   proc_map_files_inode_operations (by Vasiliy Kulikov)

v8:
 - Kirill A. Shutemov spotted a parasite semicolon
   which ruined the ptrace_check call, fixed.

v9: (feedback by Andrew Morton)
 - find_exact_vma moved into include/linux/mm.h as an inline helper
 - proc_map_files_setattr uses either kmalloc or vmalloc depending
   on how many ojects are to be allocated
 - no more map_name_to_addr but dname_to_vma_addr introduced instead
   and it uses sscanf because in one case the find_exact_vma() is used
   only to confirm existence of vma area the boolean flag is used
 - fancy justification dropped
 - still the proc_map_files_get/setattr leaved untouched
   until additional fd/ patches applied first.

v10: (feedback by Andrew Morton)
 - flex_arrays are used instead of kmalloc/vmalloc calls
 - map_files_d_revalidate use ptrace_may_access for
   security reason (by Vasiliy Kulikov)

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Tejun Heo <tj@kernel.org>
CC: Vasiliy Kulikov <segoon@openwall.com>
CC: "Kirill A. Shutemov" <kirill@shutemov.name>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Al Viro <viro@ZenIV.linux.org.uk>
CC: Andrew Morton <akpm@linux-foundation.org>
---
 fs/proc/base.c     |  366 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h |   12 +
 2 files changed, 378 insertions(+)

Index: linux-2.6.git/fs/proc/base.c
===================================================================
--- linux-2.6.git.orig/fs/proc/base.c
+++ linux-2.6.git/fs/proc/base.c
@@ -83,6 +83,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/fs_struct.h>
 #include <linux/slab.h>
+#include <linux/flex_array.h>
 #ifdef CONFIG_HARDWALL
 #include <asm/hardwall.h>
 #endif
@@ -2171,6 +2172,370 @@ static const struct file_operations proc
 };
 
 /*
+ * dname_to_vma_addr - maps a dentry name into two unsigned longs
+ * which represent vma start and end addresses.
+ */
+static int dname_to_vma_addr(struct dentry *dentry,
+			     unsigned long *start, unsigned long *end)
+{
+	if (sscanf(dentry->d_name.name, "%lx-%lx", start, end) != 2)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int map_files_d_revalidate(struct dentry *dentry, struct nameidata *nd)
+{
+	unsigned long vm_start, vm_end;
+	bool exact_vma_exists = false;
+	struct task_struct *task;
+	const struct cred *cred;
+	struct mm_struct *mm;
+	struct inode *inode;
+
+	if (nd && nd->flags & LOOKUP_RCU)
+		return -ECHILD;
+
+	inode = dentry->d_inode;
+	task = get_proc_task(inode);
+	if (!task)
+		goto out;
+
+	if (!ptrace_may_access(task, PTRACE_MODE_READ))
+		goto out;
+
+	mm = get_task_mm(task);
+	put_task_struct(task);
+	if (!mm)
+		goto out;
+
+	if (!dname_to_vma_addr(dentry, &vm_start, &vm_end)) {
+		down_read(&mm->mmap_sem);
+		exact_vma_exists = !!find_exact_vma(mm, vm_start, vm_end);
+		up_read(&mm->mmap_sem);
+	}
+
+	mmput(mm);
+
+	if (exact_vma_exists) {
+		if (task_dumpable(task)) {
+			rcu_read_lock();
+			cred = __task_cred(task);
+			inode->i_uid = cred->euid;
+			inode->i_gid = cred->egid;
+			rcu_read_unlock();
+		} else {
+			inode->i_uid = 0;
+			inode->i_gid = 0;
+		}
+		security_task_to_inode(task, inode);
+		return 1;
+	}
+out:
+	d_drop(dentry);
+	return 0;
+}
+
+static const struct dentry_operations tid_map_files_dentry_operations = {
+	.d_revalidate	= map_files_d_revalidate,
+	.d_delete	= pid_delete_dentry,
+};
+
+static int proc_map_files_get_link(struct dentry *dentry, struct path *path)
+{
+	unsigned long vm_start, vm_end;
+	struct vm_area_struct *vma;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	int rc = -ENOENT;
+
+	task = get_proc_task(dentry->d_inode);
+	if (!task)
+		goto out;
+
+	mm = get_task_mm(task);
+	put_task_struct(task);
+	if (!mm)
+		goto out;
+
+	rc = dname_to_vma_addr(dentry, &vm_start, &vm_end);
+	if (rc)
+		goto out_mmput;
+
+	down_read(&mm->mmap_sem);
+	vma = find_exact_vma(mm, vm_start, vm_end);
+	if (vma && vma->vm_file) {
+		*path = vma->vm_file->f_path;
+		path_get(path);
+		rc = 0;
+	}
+	up_read(&mm->mmap_sem);
+
+out_mmput:
+	mmput(mm);
+out:
+	return rc;
+}
+
+struct map_files_info {
+	struct file	*file;
+	unsigned long	len;
+	unsigned char	name[4*sizeof(long)+2]; /* max: %lx-%lx\0 */
+};
+
+static struct dentry *
+proc_map_files_instantiate(struct inode *dir, struct dentry *dentry,
+			   struct task_struct *task, const void *ptr)
+{
+	const struct file *file = ptr;
+	struct proc_inode *ei;
+	struct inode *inode;
+
+	if (!file)
+		return ERR_PTR(-ENOENT);
+
+	inode = proc_pid_make_inode(dir->i_sb, task);
+	if (!inode)
+		return ERR_PTR(-ENOENT);
+
+	ei = PROC_I(inode);
+	ei->op.proc_get_link = proc_map_files_get_link;
+
+	inode->i_op = &proc_pid_link_inode_operations;
+	inode->i_size = 64;
+	inode->i_mode = S_IFLNK;
+
+	if (file->f_mode & FMODE_READ)
+		inode->i_mode |= S_IRUSR | S_IXUSR;
+	if (file->f_mode & FMODE_WRITE)
+		inode->i_mode |= S_IWUSR | S_IXUSR;
+
+	d_set_d_op(dentry, &tid_map_files_dentry_operations);
+	d_add(dentry, inode);
+
+	return NULL;
+}
+
+static struct dentry *proc_map_files_lookup(struct inode *dir,
+		struct dentry *dentry, struct nameidata *nd)
+{
+	unsigned long vm_start, vm_end;
+	struct task_struct *task;
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	struct dentry *result;
+
+	result = ERR_PTR(-ENOENT);
+	task = get_proc_task(dir);
+	if (!task)
+		goto out_no_task;
+
+	result = ERR_PTR(-EPERM);
+	if (!ptrace_may_access(task, PTRACE_MODE_READ))
+		goto out_no_mm;
+
+	result = ERR_PTR(-ENOENT);
+	if (dname_to_vma_addr(dentry, &vm_start, &vm_end))
+		goto out_no_mm;
+
+	mm = get_task_mm(task);
+	if (!mm)
+		goto out_no_mm;
+
+	down_read(&mm->mmap_sem);
+	vma = find_exact_vma(mm, vm_start, vm_end);
+	if (!vma)
+		goto out_no_vma;
+
+	result = proc_map_files_instantiate(dir, dentry, task, vma->vm_file);
+
+out_no_vma:
+	up_read(&mm->mmap_sem);
+	mmput(mm);
+out_no_mm:
+	put_task_struct(task);
+out_no_task:
+	return result;
+}
+
+static int proc_map_files_setattr(struct dentry *dentry, struct iattr *attr)
+{
+	struct inode *inode = dentry->d_inode;
+	struct task_struct *task;
+	int ret = -EACCES;
+
+	task = get_proc_task(inode);
+	if (!task)
+		return -ESRCH;
+
+	if (!lock_trace(task)) {
+		ret = proc_setattr(dentry, attr);
+		unlock_trace(task);
+	}
+
+	put_task_struct(task);
+	return ret;
+}
+
+static int proc_map_files_getattr(struct vfsmount *mnt, struct dentry *dentry,
+				  struct kstat *stat)
+{
+	struct inode *inode = dentry->d_inode;
+	struct task_struct *task;
+	int ret = -EACCES;
+
+	task = get_proc_task(inode);
+	if (!task)
+		return -ESRCH;
+
+	if (!lock_trace(task)) {
+		generic_fillattr(inode, stat);
+		unlock_trace(task);
+		ret = 0;
+	}
+
+	put_task_struct(task);
+	return ret;
+}
+
+static const struct inode_operations proc_map_files_inode_operations = {
+	.lookup		= proc_map_files_lookup,
+	.setattr	= proc_map_files_setattr,
+	.getattr	= proc_map_files_getattr,
+};
+
+static int proc_map_files_readdir(struct file *filp, void *dirent, filldir_t filldir)
+{
+	struct dentry *dentry = filp->f_path.dentry;
+	struct inode *inode = dentry->d_inode;
+	struct vm_area_struct *vma;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	ino_t ino;
+	int ret;
+
+	ret = -ENOENT;
+	task = get_proc_task(inode);
+	if (!task)
+		goto out_no_task;
+
+	ret = -EPERM;
+	if (!ptrace_may_access(task, PTRACE_MODE_READ))
+		goto out;
+
+	ret = 0;
+	switch (filp->f_pos) {
+	case 0:
+		ino = inode->i_ino;
+		if (filldir(dirent, ".", 1, 0, ino, DT_DIR) < 0)
+			goto out;
+		filp->f_pos++;
+	case 1:
+		ino = parent_ino(dentry);
+		if (filldir(dirent, "..", 2, 1, ino, DT_DIR) < 0)
+			goto out;
+		filp->f_pos++;
+	default:
+	{
+		unsigned long nr_files, used, pos, i;
+		struct flex_array *fa = NULL;
+		struct map_files_info info;
+		struct map_files_info *p;
+
+		mm = get_task_mm(task);
+		if (!mm)
+			goto out;
+		down_read(&mm->mmap_sem);
+
+		nr_files = 0;
+		used = 0;
+
+		/*
+		 * We need two passes here:
+		 *
+		 *  1) Collect vmas of mapped files with mmap_sem taken
+		 *  2) Release mmap_sem and instantiate entries
+		 *
+		 * otherwise we get lockdep complained, since filldir()
+		 * routine might require mmap_sem taken in might_fault().
+		 */
+
+		for (vma = mm->mmap; vma; vma = vma->vm_next) {
+			if (vma->vm_file)
+				nr_files++;
+		}
+
+		if (nr_files) {
+			ret = -ENOMEM;
+			fa = flex_array_alloc(sizeof(info), nr_files, GFP_KERNEL);
+			if (!fa)
+				goto err;
+			if (flex_array_prealloc(fa, 0, nr_files, GFP_KERNEL))
+				goto err;
+			for (vma = mm->mmap, pos = 2; vma; vma = vma->vm_next) {
+				if (!vma->vm_file)
+					continue;
+				if (++pos <= filp->f_pos)
+					continue;
+
+				get_file(vma->vm_file);
+				info.file = vma->vm_file;
+				info.len = snprintf(info.name, sizeof(info.name),
+						    "%lx-%lx", vma->vm_start,
+						    vma->vm_end);
+				if (flex_array_put(fa, used, &info, GFP_KERNEL)) {
+					/*
+					 * This must never happen on preallocated array,
+					 * but just to be sure.
+					 */
+					WARN_ON_ONCE(1);
+					put_filp(vma->vm_file);
+					goto err;
+				}
+				used++;
+			}
+			ret = 0;
+		}
+err:
+		up_read(&mm->mmap_sem);
+
+		for (i = 0; i < used && !ret; i++) {
+			p = flex_array_get(fa, i);
+			ret = proc_fill_cache(filp, dirent, filldir,
+					      p->name, p->len,
+					      proc_map_files_instantiate,
+					      task, p->file);
+			if (ret)
+				break;
+			filp->f_pos++;
+			put_filp(p->file);
+		}
+
+		for (; i < used; i++) {
+			p = flex_array_get(fa, i);
+			put_filp(p->file);
+		}
+
+		if (fa)
+			flex_array_free(fa);
+
+		mmput(mm);
+	}
+	}
+
+out:
+	put_task_struct(task);
+out_no_task:
+	return ret;
+}
+
+static const struct file_operations proc_map_files_operations = {
+	.read		= generic_read_dir,
+	.readdir	= proc_map_files_readdir,
+	.llseek		= default_llseek,
+};
+
+/*
  * /proc/pid/fd needs a special permission handler so that a process can still
  * access /proc/self/fd after it has executed a setuid().
  */
@@ -2785,6 +3150,7 @@ static const struct inode_operations pro
 static const struct pid_entry tgid_base_stuff[] = {
 	DIR("task",       S_IRUGO|S_IXUGO, proc_task_inode_operations, proc_task_operations),
 	DIR("fd",         S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations),
+	DIR("map_files",  S_IRUSR|S_IXUSR, proc_map_files_inode_operations, proc_map_files_operations),
 	DIR("fdinfo",     S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations),
 	DIR("ns",	  S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations),
 #ifdef CONFIG_NET
Index: linux-2.6.git/include/linux/mm.h
===================================================================
--- linux-2.6.git.orig/include/linux/mm.h
+++ linux-2.6.git/include/linux/mm.h
@@ -1491,6 +1491,18 @@ static inline unsigned long vma_pages(st
 	return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
 }
 
+/* Look up the first VMA which exactly match the interval vm_start ... vm_end */
+static inline struct vm_area_struct *
+find_exact_vma(struct mm_struct *mm, unsigned long vm_start, unsigned long vm_end)
+{
+	struct vm_area_struct *vma = find_vma(mm, vm_start);
+
+	if (vma && (vma->vm_start != vm_start || vma->vm_end != vm_end))
+		vma = NULL;
+
+	return vma;
+}
+
 #ifdef CONFIG_MMU
 pgprot_t vm_get_page_prot(unsigned long vm_flags);
 #else

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-07 21:53                                   ` Cyrill Gorcunov
@ 2011-09-07 22:13                                     ` Andrew Morton
  2011-09-07 22:42                                       ` Cyrill Gorcunov
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2011-09-07 22:13 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Vasiliy Kulikov, Tejun Heo, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Thu, 8 Sep 2011 01:53:29 +0400
Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> On Wed, Sep 07, 2011 at 03:23:01PM +0400, Vasiliy Kulikov wrote:
> > Hi,
> > 
> > On Wed, Sep 07, 2011 at 02:33 +0900, Tejun Heo wrote:
> > > On Tue, Sep 06, 2011 at 09:29:52PM +0400, Vasiliy Kulikov wrote:
> > > > I agree with you.  I don't think that showing system-global debug
> > > > information to all users by default is the right thing.  But some people
> > > > doesn't agree with this point of view:
> > > > 
> > > > http://thread.gmane.org/gmane.linux.kernel/1108378
> > > 
> > > Yeap, I know there are two sides of the discussion but if one takes
> > > the position that hiding such global debug info is more harmful, it's
> > > only crazier to hide such information from each individual users of
> > > the said global facility.  So, let's just forget about information
> > > leak via freeing or not freeing here.  It's the wrong battle field.
> > 
> > Andrew, are you OK with closing the hole with pid_no_revalidate()
> > and 0600 /proc/slabinfo?  If so, I feel I have to start this discussion
> > with people participating in the discussion above: Theodore, Dan, Linus, etc.

I fell asleep a long time ago and don't know what pid_no_revalidate()
and slabinfo permissions have to do with this.  Perhaps summarising the
issues in the changelog would be appropriate, dunno.

> > By Andrew Morton
> >
> > But do we *really* need to do it in two passes?  Avoiding the temporary
> > storage would involve doing more work under mmap_sem, and a put_filp()
> > under mmap_sem might be problematic.
> 
> I fear we still need to use two passes in proc_map_files_readdir, I found no way
> to escape lockdep complains when doing all work in one pass with mmap_sem taken.
> The /maps does the same thing -- ie it fills maps file with mmap_sem taken to produce
> robust data.

The code's using three passes.

> And I'm not really sure what you mean with problematic put_filp?

I was thinking fput(), which can do a hell of a lot of stuff if it's
the final put on the inode.

>
> ...
>
> +static int proc_map_files_readdir(struct file *filp, void *dirent, filldir_t filldir)
> +{
> +	struct dentry *dentry = filp->f_path.dentry;
> +	struct inode *inode = dentry->d_inode;
> +	struct vm_area_struct *vma;
> +	struct task_struct *task;
> +	struct mm_struct *mm;
> +	ino_t ino;
> +	int ret;
> +
> +	ret = -ENOENT;
> +	task = get_proc_task(inode);
> +	if (!task)
> +		goto out_no_task;
> +
> +	ret = -EPERM;
> +	if (!ptrace_may_access(task, PTRACE_MODE_READ))
> +		goto out;
> +
> +	ret = 0;
> +	switch (filp->f_pos) {
> +	case 0:
> +		ino = inode->i_ino;
> +		if (filldir(dirent, ".", 1, 0, ino, DT_DIR) < 0)
> +			goto out;
> +		filp->f_pos++;
> +	case 1:
> +		ino = parent_ino(dentry);
> +		if (filldir(dirent, "..", 2, 1, ino, DT_DIR) < 0)
> +			goto out;
> +		filp->f_pos++;
> +	default:
> +	{
> +		unsigned long nr_files, used, pos, i;
> +		struct flex_array *fa = NULL;
> +		struct map_files_info info;
> +		struct map_files_info *p;
> +
> +		mm = get_task_mm(task);
> +		if (!mm)
> +			goto out;
> +		down_read(&mm->mmap_sem);
> +
> +		nr_files = 0;
> +		used = 0;
> +
> +		/*
> +		 * We need two passes here:
> +		 *
> +		 *  1) Collect vmas of mapped files with mmap_sem taken
> +		 *  2) Release mmap_sem and instantiate entries
> +		 *
> +		 * otherwise we get lockdep complained, since filldir()
> +		 * routine might require mmap_sem taken in might_fault().
> +		 */
> +
> +		for (vma = mm->mmap; vma; vma = vma->vm_next) {
> +			if (vma->vm_file)
> +				nr_files++;
> +		}
> +
> +		if (nr_files) {
> +			ret = -ENOMEM;
> +			fa = flex_array_alloc(sizeof(info), nr_files, GFP_KERNEL);
> +			if (!fa)
> +				goto err;
> +			if (flex_array_prealloc(fa, 0, nr_files, GFP_KERNEL))
> +				goto err;
> +			for (vma = mm->mmap, pos = 2; vma; vma = vma->vm_next) {
> +				if (!vma->vm_file)
> +					continue;
> +				if (++pos <= filp->f_pos)
> +					continue;
> +
> +				get_file(vma->vm_file);
> +				info.file = vma->vm_file;
> +				info.len = snprintf(info.name, sizeof(info.name),
> +						    "%lx-%lx", vma->vm_start,
> +						    vma->vm_end);
> +				if (flex_array_put(fa, used, &info, GFP_KERNEL)) {
> +					/*
> +					 * This must never happen on preallocated array,
> +					 * but just to be sure.
> +					 */
> +					WARN_ON_ONCE(1);
> +					put_filp(vma->vm_file);
> +					goto err;
> +				}
> +				used++;
> +			}
> +			ret = 0;
> +		}
> +err:
> +		up_read(&mm->mmap_sem);
> +
> +		for (i = 0; i < used && !ret; i++) {

The "&& !ret" is unneeded?

> +			p = flex_array_get(fa, i);
> +			ret = proc_fill_cache(filp, dirent, filldir,
> +					      p->name, p->len,
> +					      proc_map_files_instantiate,
> +					      task, p->file);
> +			if (ret)
> +				break;
> +			filp->f_pos++;
> +			put_filp(p->file);
> +		}
> +
> +		for (; i < used; i++) {
> +			p = flex_array_get(fa, i);
> +			put_filp(p->file);
> +		}

Still unclear why we need the third loop.

> +		if (fa)
> +			flex_array_free(fa);
> +
> +		mmput(mm);
> +	}
> +	}
> +
> +out:
> +	put_task_struct(task);
> +out_no_task:
> +	return ret;
> +}
>
> ...
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-07 22:13                                     ` Andrew Morton
@ 2011-09-07 22:42                                       ` Cyrill Gorcunov
  2011-09-07 22:53                                         ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Cyrill Gorcunov @ 2011-09-07 22:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vasiliy Kulikov, Tejun Heo, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Wed, Sep 07, 2011 at 03:13:23PM -0700, Andrew Morton wrote:
...
> > > 
> > > Andrew, are you OK with closing the hole with pid_no_revalidate()
> > > and 0600 /proc/slabinfo?  If so, I feel I have to start this discussion
> > > with people participating in the discussion above: Theodore, Dan, Linus, etc.
> 
> I fell asleep a long time ago and don't know what pid_no_revalidate()
> and slabinfo permissions have to do with this.  Perhaps summarising the
> issues in the changelog would be appropriate, dunno.

Well, time to poke Vasiliy ;)

...
> > 
> > I fear we still need to use two passes in proc_map_files_readdir, I found no way
> > to escape lockdep complains when doing all work in one pass with mmap_sem taken.
> > The /maps does the same thing -- ie it fills maps file with mmap_sem taken to produce
> > robust data.
> 
> The code's using three passes.

Yes, and I didn't find thy way to escape it (actually if there would not
be filldir+might_fault tuple I would create this all under mmap_sem and
would not need this flex_array or any temporary storage at all and code
would be a way simplier).

> 
> > And I'm not really sure what you mean with problematic put_filp?
> 
> I was thinking fput(), which can do a hell of a lot of stuff if it's
> the final put on the inode.

Ouch, somehow missed it, thanks!

> > +err:
> > +		up_read(&mm->mmap_sem);
> > +
> > +		for (i = 0; i < used && !ret; i++) {
> 
> The "&& !ret" is unneeded?

No, it's needed, since it makes sure that if "impossible"
scenario happens and flex-arrays fails with preallocated
data so we will reach this point with used > 0 and ret = -ENOMEM
and thus will not call for proc_map_files_instantiate as needed.

> 
> > +			p = flex_array_get(fa, i);
> > +			ret = proc_fill_cache(filp, dirent, filldir,
> > +					      p->name, p->len,
> > +					      proc_map_files_instantiate,
> > +					      task, p->file);
> > +			if (ret)
> > +				break;

1: Say we failed here

> > +			filp->f_pos++;
> > +			put_filp(p->file);
> > +		}
> > +
> > +		for (; i < used; i++) {
> > +			p = flex_array_get(fa, i);
> > +			put_filp(p->file);
> > +		}
> 
> Still unclear why we need the third loop.

Due to (1) -- so we will have a number of files reference
taken and need to put them back.

	Cyrill

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-07 22:42                                       ` Cyrill Gorcunov
@ 2011-09-07 22:53                                         ` Andrew Morton
  2011-09-08  5:48                                           ` Cyrill Gorcunov
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2011-09-07 22:53 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Vasiliy Kulikov, Tejun Heo, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Thu, 8 Sep 2011 02:42:34 +0400
Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> > > +err:
> > > +		up_read(&mm->mmap_sem);
> > > +
> > > +		for (i = 0; i < used && !ret; i++) {
> > 
> > The "&& !ret" is unneeded?
> 
> No, it's needed, since it makes sure that if "impossible"
> scenario happens and flex-arrays fails with preallocated
> data so we will reach this point with used > 0 and ret = -ENOMEM
> and thus will not call for proc_map_files_instantiate as needed.

Well, it doesn't need to be tested on each pass around the loop - that's
misleading and inefficient (unless the compiler is being particularly clever).

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-07 22:53                                         ` Andrew Morton
@ 2011-09-08  5:48                                           ` Cyrill Gorcunov
  2011-09-08  5:50                                             ` Cyrill Gorcunov
  0 siblings, 1 reply; 25+ messages in thread
From: Cyrill Gorcunov @ 2011-09-08  5:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vasiliy Kulikov, Tejun Heo, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Wed, Sep 07, 2011 at 03:53:32PM -0700, Andrew Morton wrote:
...
> 
> > > > +err:
> > > > +		up_read(&mm->mmap_sem);
> > > > +
> > > > +		for (i = 0; i < used && !ret; i++) {
> > > 
> > > The "&& !ret" is unneeded?
> > 
> > No, it's needed, since it makes sure that if "impossible"
> > scenario happens and flex-arrays fails with preallocated
> > data so we will reach this point with used > 0 and ret = -ENOMEM
> > and thus will not call for proc_map_files_instantiate as needed.
> 
> Well, it doesn't need to be tested on each pass around the loop - that's
> misleading and inefficient (unless the compiler is being particularly clever).

Here an update is coming, please take a look. Note I simply added BUG() in case
if preallocated array failed since found that it's what other kernel code
does as well (ie grep for flex_array_put_ptr). Other notes are at v11: mark
below. Thanks.

	Cyrill
---
fs, proc: Introduce the /proc/<pid>/map_files/ directory v11

From: Pavel Emelyanov <xemul@parallels.com>

This one behaves similarly to the /proc/<pid>/fd/ one - it contains symlinks
one for each mapping with file, the name of a symlink is "vma->vm_start-vma->vm_end",
the target is the file. Opening a symlink results in a file that point exactly
to the same inode as them vma's one.

For example the ls -l of some arbitrary /proc/<pid>/map_files/

 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80403000-7f8f80404000 -> /lib64/libc-2.5.so
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f8061e000-7f8f80620000 -> /lib64/libselinux.so.1
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80826000-7f8f80827000 -> /lib64/libacl.so.1.1.0
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a2f000-7f8f80a30000 -> /lib64/librt-2.5.so
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a30000-7f8f80a4c000 -> /lib64/ld-2.5.so

This *helps* checkpointing process in three ways:

1. When dumping a task mappings we do know exact file that is mapped by particular
   region. We do this by opening /proc/$pid/map_files/address symlink the way we do
   with file descriptors.

2. This also helps in determining which anonymous shared mappings are shared with
   each other by comparing the inodes of them.

3. When restoring a set of process in case two of them has a mapping shared, we map
   the memory by the 1st one and then open its /proc/$pid/map_files/address file and
   map it by the 2nd task.

Using /proc/$pid/maps for this is quite inconvenient since it brings repeatable
re-reading and reparsing for this text file which slows down restore procesure
significantly. Also as being pointed in (3) it is a way easier to use top level
shared mapping in children as /proc/$pid/map_files/address when needed.

v2: (spotted by Tejun Heo)
 - /proc/<pid>/mfd changed to /proc/<pid>/map_files
 - find_vma helper is used instead of linear search
 - routines are re-grouped
 - d_revalidate is set now

v3:
 - d_revalidate reworked, now it should drops no longer valid dentries (Tejun Heo)
 - ptrace_may_access added into proc_map_files_lookup (Vasiliy Kulikov)
 - because of filldir (which eventually might need to lock mmap_sem)
   the proc_map_files_readdir() was reworked to call proc_fill_cache()
   with unlocked mmap_sem

v4: (feedback by Tejun Heo and Vasiliy Kulikov)
 - instead of saving data in proc_inode we rather make a dentry name
   to keep both vm_start and vm_end accordingly
 - d_revalidate now honor task credentials

v5: (feedback by Kirill A. Shutemov)
 - don't forget to release mmap_sem on error path

v6:
 - sizeof get used in map_files_info which shrink member a bit on
   x86-32 (by Kirill A. Shutemov)
 - map_name_to_addr returns -EINVAL instead of -1
   which is more appropriate (by Tejun Heo)

v7:
 - add [get/set]attr handlers for
   proc_map_files_inode_operations (by Vasiliy Kulikov)

v8:
 - Kirill A. Shutemov spotted a parasite semicolon
   which ruined the ptrace_check call, fixed.

v9: (feedback by Andrew Morton)
 - find_exact_vma moved into include/linux/mm.h as an inline helper
 - proc_map_files_setattr uses either kmalloc or vmalloc depending
   on how many ojects are to be allocated
 - no more map_name_to_addr but dname_to_vma_addr introduced instead
   and it uses sscanf because in one case the find_exact_vma() is used
   only to confirm existence of vma area the boolean flag is used
 - fancy justification dropped
 - still the proc_map_files_get/setattr leaved untouched
   until additional fd/ patches applied first.

v10: (feedback by Andrew Morton)
 - flex_arrays are used instead of kmalloc/vmalloc calls
 - map_files_d_revalidate use ptrace_may_access for
   security reason (by Vasiliy Kulikov)

v11:
 - should use fput and drop !ret test from a loop code
   (feedback by Andrew Morton)
 - no need for 'used' variable, use existing
   nr_files with file->pos predicate

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Tejun Heo <tj@kernel.org>
CC: Vasiliy Kulikov <segoon@openwall.com>
CC: "Kirill A. Shutemov" <kirill@shutemov.name>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Al Viro <viro@ZenIV.linux.org.uk>
CC: Andrew Morton <akpm@linux-foundation.org>
---
 fs/proc/base.c     |  365 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h |   12 +
 2 files changed, 377 insertions(+)

Index: linux-2.6.git/fs/proc/base.c
===================================================================
--- linux-2.6.git.orig/fs/proc/base.c
+++ linux-2.6.git/fs/proc/base.c
@@ -83,6 +83,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/fs_struct.h>
 #include <linux/slab.h>
+#include <linux/flex_array.h>
 #ifdef CONFIG_HARDWALL
 #include <asm/hardwall.h>
 #endif
@@ -2171,6 +2172,369 @@ static const struct file_operations proc
 };
 
 /*
+ * dname_to_vma_addr - maps a dentry name into two unsigned longs
+ * which represent vma start and end addresses.
+ */
+static int dname_to_vma_addr(struct dentry *dentry,
+			     unsigned long *start, unsigned long *end)
+{
+	if (sscanf(dentry->d_name.name, "%lx-%lx", start, end) != 2)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int map_files_d_revalidate(struct dentry *dentry, struct nameidata *nd)
+{
+	unsigned long vm_start, vm_end;
+	bool exact_vma_exists = false;
+	struct task_struct *task;
+	const struct cred *cred;
+	struct mm_struct *mm;
+	struct inode *inode;
+
+	if (nd && nd->flags & LOOKUP_RCU)
+		return -ECHILD;
+
+	inode = dentry->d_inode;
+	task = get_proc_task(inode);
+	if (!task)
+		goto out;
+
+	if (!ptrace_may_access(task, PTRACE_MODE_READ))
+		goto out;
+
+	mm = get_task_mm(task);
+	put_task_struct(task);
+	if (!mm)
+		goto out;
+
+	if (!dname_to_vma_addr(dentry, &vm_start, &vm_end)) {
+		down_read(&mm->mmap_sem);
+		exact_vma_exists = !!find_exact_vma(mm, vm_start, vm_end);
+		up_read(&mm->mmap_sem);
+	}
+
+	mmput(mm);
+
+	if (exact_vma_exists) {
+		if (task_dumpable(task)) {
+			rcu_read_lock();
+			cred = __task_cred(task);
+			inode->i_uid = cred->euid;
+			inode->i_gid = cred->egid;
+			rcu_read_unlock();
+		} else {
+			inode->i_uid = 0;
+			inode->i_gid = 0;
+		}
+		security_task_to_inode(task, inode);
+		return 1;
+	}
+out:
+	d_drop(dentry);
+	return 0;
+}
+
+static const struct dentry_operations tid_map_files_dentry_operations = {
+	.d_revalidate	= map_files_d_revalidate,
+	.d_delete	= pid_delete_dentry,
+};
+
+static int proc_map_files_get_link(struct dentry *dentry, struct path *path)
+{
+	unsigned long vm_start, vm_end;
+	struct vm_area_struct *vma;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	int rc = -ENOENT;
+
+	task = get_proc_task(dentry->d_inode);
+	if (!task)
+		goto out;
+
+	mm = get_task_mm(task);
+	put_task_struct(task);
+	if (!mm)
+		goto out;
+
+	rc = dname_to_vma_addr(dentry, &vm_start, &vm_end);
+	if (rc)
+		goto out_mmput;
+
+	down_read(&mm->mmap_sem);
+	vma = find_exact_vma(mm, vm_start, vm_end);
+	if (vma && vma->vm_file) {
+		*path = vma->vm_file->f_path;
+		path_get(path);
+		rc = 0;
+	}
+	up_read(&mm->mmap_sem);
+
+out_mmput:
+	mmput(mm);
+out:
+	return rc;
+}
+
+struct map_files_info {
+	struct file	*file;
+	unsigned long	len;
+	unsigned char	name[4*sizeof(long)+2]; /* max: %lx-%lx\0 */
+};
+
+static struct dentry *
+proc_map_files_instantiate(struct inode *dir, struct dentry *dentry,
+			   struct task_struct *task, const void *ptr)
+{
+	const struct file *file = ptr;
+	struct proc_inode *ei;
+	struct inode *inode;
+
+	if (!file)
+		return ERR_PTR(-ENOENT);
+
+	inode = proc_pid_make_inode(dir->i_sb, task);
+	if (!inode)
+		return ERR_PTR(-ENOENT);
+
+	ei = PROC_I(inode);
+	ei->op.proc_get_link = proc_map_files_get_link;
+
+	inode->i_op = &proc_pid_link_inode_operations;
+	inode->i_size = 64;
+	inode->i_mode = S_IFLNK;
+
+	if (file->f_mode & FMODE_READ)
+		inode->i_mode |= S_IRUSR | S_IXUSR;
+	if (file->f_mode & FMODE_WRITE)
+		inode->i_mode |= S_IWUSR | S_IXUSR;
+
+	d_set_d_op(dentry, &tid_map_files_dentry_operations);
+	d_add(dentry, inode);
+
+	return NULL;
+}
+
+static struct dentry *proc_map_files_lookup(struct inode *dir,
+		struct dentry *dentry, struct nameidata *nd)
+{
+	unsigned long vm_start, vm_end;
+	struct task_struct *task;
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	struct dentry *result;
+
+	result = ERR_PTR(-ENOENT);
+	task = get_proc_task(dir);
+	if (!task)
+		goto out_no_task;
+
+	result = ERR_PTR(-EPERM);
+	if (!ptrace_may_access(task, PTRACE_MODE_READ))
+		goto out_no_mm;
+
+	result = ERR_PTR(-ENOENT);
+	if (dname_to_vma_addr(dentry, &vm_start, &vm_end))
+		goto out_no_mm;
+
+	mm = get_task_mm(task);
+	if (!mm)
+		goto out_no_mm;
+
+	down_read(&mm->mmap_sem);
+	vma = find_exact_vma(mm, vm_start, vm_end);
+	if (!vma)
+		goto out_no_vma;
+
+	result = proc_map_files_instantiate(dir, dentry, task, vma->vm_file);
+
+out_no_vma:
+	up_read(&mm->mmap_sem);
+	mmput(mm);
+out_no_mm:
+	put_task_struct(task);
+out_no_task:
+	return result;
+}
+
+static int proc_map_files_setattr(struct dentry *dentry, struct iattr *attr)
+{
+	struct inode *inode = dentry->d_inode;
+	struct task_struct *task;
+	int ret = -EACCES;
+
+	task = get_proc_task(inode);
+	if (!task)
+		return -ESRCH;
+
+	if (!lock_trace(task)) {
+		ret = proc_setattr(dentry, attr);
+		unlock_trace(task);
+	}
+
+	put_task_struct(task);
+	return ret;
+}
+
+static int proc_map_files_getattr(struct vfsmount *mnt, struct dentry *dentry,
+				  struct kstat *stat)
+{
+	struct inode *inode = dentry->d_inode;
+	struct task_struct *task;
+	int ret = -EACCES;
+
+	task = get_proc_task(inode);
+	if (!task)
+		return -ESRCH;
+
+	if (!lock_trace(task)) {
+		generic_fillattr(inode, stat);
+		unlock_trace(task);
+		ret = 0;
+	}
+
+	put_task_struct(task);
+	return ret;
+}
+
+static const struct inode_operations proc_map_files_inode_operations = {
+	.lookup		= proc_map_files_lookup,
+	.setattr	= proc_map_files_setattr,
+	.getattr	= proc_map_files_getattr,
+};
+
+static int proc_map_files_readdir(struct file *filp, void *dirent, filldir_t filldir)
+{
+	struct dentry *dentry = filp->f_path.dentry;
+	struct inode *inode = dentry->d_inode;
+	struct vm_area_struct *vma;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	ino_t ino;
+	int ret;
+
+	ret = -ENOENT;
+	task = get_proc_task(inode);
+	if (!task)
+		goto out_no_task;
+
+	ret = -EPERM;
+	if (!ptrace_may_access(task, PTRACE_MODE_READ))
+		goto out;
+
+	ret = 0;
+	switch (filp->f_pos) {
+	case 0:
+		ino = inode->i_ino;
+		if (filldir(dirent, ".", 1, 0, ino, DT_DIR) < 0)
+			goto out;
+		filp->f_pos++;
+	case 1:
+		ino = parent_ino(dentry);
+		if (filldir(dirent, "..", 2, 1, ino, DT_DIR) < 0)
+			goto out;
+		filp->f_pos++;
+	default:
+	{
+		unsigned long nr_files, pos, i;
+		struct flex_array *fa = NULL;
+		struct map_files_info info;
+		struct map_files_info *p;
+
+		mm = get_task_mm(task);
+		if (!mm)
+			goto out;
+		down_read(&mm->mmap_sem);
+
+		nr_files = 0;
+
+		/*
+		 * We need two passes here:
+		 *
+		 *  1) Collect vmas of mapped files with mmap_sem taken
+		 *  2) Release mmap_sem and instantiate entries
+		 *
+		 * otherwise we get lockdep complained, since filldir()
+		 * routine might require mmap_sem taken in might_fault().
+		 */
+
+		for (vma = mm->mmap, pos = 2; vma; vma = vma->vm_next) {
+			if (vma->vm_file && ++pos > filp->f_pos)
+				nr_files++;
+		}
+
+		if (nr_files) {
+			ret = -ENOMEM;
+			fa = flex_array_alloc(sizeof(info), nr_files, GFP_KERNEL);
+			if (!fa)
+				goto nomem;
+			if (flex_array_prealloc(fa, 0, nr_files, GFP_KERNEL)) {
+				flex_array_free(fa);
+				fa = NULL;
+				goto nomem;
+			}
+			for (i = 0, vma = mm->mmap, pos = 2; vma; vma = vma->vm_next) {
+				if (!vma->vm_file)
+					continue;
+				if (++pos <= filp->f_pos)
+					continue;
+
+				get_file(vma->vm_file);
+				info.file = vma->vm_file;
+				info.len = snprintf(info.name, sizeof(info.name),
+						    "%lx-%lx", vma->vm_start,
+						    vma->vm_end);
+				if (flex_array_put(fa, i++, &info, GFP_KERNEL))
+					BUG();
+			}
+			ret = 0;
+		}
+nomem:
+		up_read(&mm->mmap_sem);
+
+		if (fa) {
+			for (i = 0; i < nr_files; i++) {
+				p = flex_array_get(fa, i);
+				ret = proc_fill_cache(filp, dirent, filldir,
+						      p->name, p->len,
+						      proc_map_files_instantiate,
+						      task, p->file);
+				if (ret)
+					break;
+				filp->f_pos++;
+				fput(p->file);
+			}
+
+			for (; i < nr_files; i++) {
+				/*
+				 * In case of error don't forget
+				 * to put rest of file refs.
+				 */
+				p = flex_array_get(fa, i);
+				fput(p->file);
+			}
+
+			flex_array_free(fa);
+		}
+
+		mmput(mm);
+	}
+	}
+
+out:
+	put_task_struct(task);
+out_no_task:
+	return ret;
+}
+
+static const struct file_operations proc_map_files_operations = {
+	.read		= generic_read_dir,
+	.readdir	= proc_map_files_readdir,
+	.llseek		= default_llseek,
+};
+
+/*
  * /proc/pid/fd needs a special permission handler so that a process can still
  * access /proc/self/fd after it has executed a setuid().
  */
@@ -2785,6 +3149,7 @@ static const struct inode_operations pro
 static const struct pid_entry tgid_base_stuff[] = {
 	DIR("task",       S_IRUGO|S_IXUGO, proc_task_inode_operations, proc_task_operations),
 	DIR("fd",         S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations),
+	DIR("map_files",  S_IRUSR|S_IXUSR, proc_map_files_inode_operations, proc_map_files_operations),
 	DIR("fdinfo",     S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations),
 	DIR("ns",	  S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations),
 #ifdef CONFIG_NET
Index: linux-2.6.git/include/linux/mm.h
===================================================================
--- linux-2.6.git.orig/include/linux/mm.h
+++ linux-2.6.git/include/linux/mm.h
@@ -1491,6 +1491,18 @@ static inline unsigned long vma_pages(st
 	return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
 }
 
+/* Look up the first VMA which exactly match the interval vm_start ... vm_end */
+static inline struct vm_area_struct *
+find_exact_vma(struct mm_struct *mm, unsigned long vm_start, unsigned long vm_end)
+{
+	struct vm_area_struct *vma = find_vma(mm, vm_start);
+
+	if (vma && (vma->vm_start != vm_start || vma->vm_end != vm_end))
+		vma = NULL;
+
+	return vma;
+}
+
 #ifdef CONFIG_MMU
 pgprot_t vm_get_page_prot(unsigned long vm_flags);
 #else

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-08  5:48                                           ` Cyrill Gorcunov
@ 2011-09-08  5:50                                             ` Cyrill Gorcunov
  2011-09-08  6:04                                               ` Cyrill Gorcunov
  0 siblings, 1 reply; 25+ messages in thread
From: Cyrill Gorcunov @ 2011-09-08  5:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vasiliy Kulikov, Tejun Heo, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Thu, Sep 08, 2011 at 09:48:26AM +0400, Cyrill Gorcunov wrote:
...
> 
> Here an update is coming, please take a look. Note I simply added BUG() in case
> if preallocated array failed since found that it's what other kernel code
> does as well (ie grep for flex_array_put_ptr). Other notes are at v11: mark
> below. Thanks.
> 
> 	Cyrill
> ---
> fs, proc: Introduce the /proc/<pid>/map_files/ directory v11
> 

Crap. Andrew drop this one please.

	Cyrill

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-08  5:50                                             ` Cyrill Gorcunov
@ 2011-09-08  6:04                                               ` Cyrill Gorcunov
  2011-09-08 23:52                                                 ` Andrew Morton
  2011-09-10 13:21                                                 ` Vasiliy Kulikov
  0 siblings, 2 replies; 25+ messages in thread
From: Cyrill Gorcunov @ 2011-09-08  6:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vasiliy Kulikov, Tejun Heo, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Thu, Sep 08, 2011 at 09:50:25AM +0400, Cyrill Gorcunov wrote:
> On Thu, Sep 08, 2011 at 09:48:26AM +0400, Cyrill Gorcunov wrote:
> ...
> > 
> > Here an update is coming, please take a look. Note I simply added BUG() in case
> > if preallocated array failed since found that it's what other kernel code
> > does as well (ie grep for flex_array_put_ptr). Other notes are at v11: mark
> > below. Thanks.
> > 
> > 	Cyrill
> > ---
> > fs, proc: Introduce the /proc/<pid>/map_files/ directory v11
> > 
> 
> Crap. Andrew drop this one please.
> 

This one should pass better. The changes from previous version
are at proc_map_files_readdir().

	Cyrill
---
fs, proc: Introduce the /proc/<pid>/map_files/ directory v11

From: Pavel Emelyanov <xemul@parallels.com>

This one behaves similarly to the /proc/<pid>/fd/ one - it contains symlinks
one for each mapping with file, the name of a symlink is "vma->vm_start-vma->vm_end",
the target is the file. Opening a symlink results in a file that point exactly
to the same inode as them vma's one.

For example the ls -l of some arbitrary /proc/<pid>/map_files/

 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80403000-7f8f80404000 -> /lib64/libc-2.5.so
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f8061e000-7f8f80620000 -> /lib64/libselinux.so.1
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80826000-7f8f80827000 -> /lib64/libacl.so.1.1.0
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a2f000-7f8f80a30000 -> /lib64/librt-2.5.so
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a30000-7f8f80a4c000 -> /lib64/ld-2.5.so

This *helps* checkpointing process in three ways:

1. When dumping a task mappings we do know exact file that is mapped by particular
   region. We do this by opening /proc/$pid/map_files/address symlink the way we do
   with file descriptors.

2. This also helps in determining which anonymous shared mappings are shared with
   each other by comparing the inodes of them.

3. When restoring a set of process in case two of them has a mapping shared, we map
   the memory by the 1st one and then open its /proc/$pid/map_files/address file and
   map it by the 2nd task.

Using /proc/$pid/maps for this is quite inconvenient since it brings repeatable
re-reading and reparsing for this text file which slows down restore procesure
significantly. Also as being pointed in (3) it is a way easier to use top level
shared mapping in children as /proc/$pid/map_files/address when needed.

v2: (spotted by Tejun Heo)
 - /proc/<pid>/mfd changed to /proc/<pid>/map_files
 - find_vma helper is used instead of linear search
 - routines are re-grouped
 - d_revalidate is set now

v3:
 - d_revalidate reworked, now it should drops no longer valid dentries (Tejun Heo)
 - ptrace_may_access added into proc_map_files_lookup (Vasiliy Kulikov)
 - because of filldir (which eventually might need to lock mmap_sem)
   the proc_map_files_readdir() was reworked to call proc_fill_cache()
   with unlocked mmap_sem

v4: (feedback by Tejun Heo and Vasiliy Kulikov)
 - instead of saving data in proc_inode we rather make a dentry name
   to keep both vm_start and vm_end accordingly
 - d_revalidate now honor task credentials

v5: (feedback by Kirill A. Shutemov)
 - don't forget to release mmap_sem on error path

v6:
 - sizeof get used in map_files_info which shrink member a bit on
   x86-32 (by Kirill A. Shutemov)
 - map_name_to_addr returns -EINVAL instead of -1
   which is more appropriate (by Tejun Heo)

v7:
 - add [get/set]attr handlers for
   proc_map_files_inode_operations (by Vasiliy Kulikov)

v8:
 - Kirill A. Shutemov spotted a parasite semicolon
   which ruined the ptrace_check call, fixed.

v9: (feedback by Andrew Morton)
 - find_exact_vma moved into include/linux/mm.h as an inline helper
 - proc_map_files_setattr uses either kmalloc or vmalloc depending
   on how many ojects are to be allocated
 - no more map_name_to_addr but dname_to_vma_addr introduced instead
   and it uses sscanf because in one case the find_exact_vma() is used
   only to confirm existence of vma area the boolean flag is used
 - fancy justification dropped
 - still the proc_map_files_get/setattr leaved untouched
   until additional fd/ patches applied first.

v10: (feedback by Andrew Morton)
 - flex_arrays are used instead of kmalloc/vmalloc calls
 - map_files_d_revalidate use ptrace_may_access for
   security reason (by Vasiliy Kulikov)

v11:
 - should use fput and drop !ret test from a loop code
   (feedback by Andrew Morton)
 - no need for 'used' variable, use existing
   nr_files with file->pos predicate
 - if preallocation fails no need to go further,
   simply release mmap semaphore and jump out

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Tejun Heo <tj@kernel.org>
CC: Vasiliy Kulikov <segoon@openwall.com>
CC: "Kirill A. Shutemov" <kirill@shutemov.name>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Al Viro <viro@ZenIV.linux.org.uk>
CC: Andrew Morton <akpm@linux-foundation.org>
---
 fs/proc/base.c     |  359 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h |   12 +
 2 files changed, 371 insertions(+)

Index: linux-2.6.git/fs/proc/base.c
===================================================================
--- linux-2.6.git.orig/fs/proc/base.c
+++ linux-2.6.git/fs/proc/base.c
@@ -83,6 +83,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/fs_struct.h>
 #include <linux/slab.h>
+#include <linux/flex_array.h>
 #ifdef CONFIG_HARDWALL
 #include <asm/hardwall.h>
 #endif
@@ -2171,6 +2172,363 @@ static const struct file_operations proc
 };
 
 /*
+ * dname_to_vma_addr - maps a dentry name into two unsigned longs
+ * which represent vma start and end addresses.
+ */
+static int dname_to_vma_addr(struct dentry *dentry,
+			     unsigned long *start, unsigned long *end)
+{
+	if (sscanf(dentry->d_name.name, "%lx-%lx", start, end) != 2)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int map_files_d_revalidate(struct dentry *dentry, struct nameidata *nd)
+{
+	unsigned long vm_start, vm_end;
+	bool exact_vma_exists = false;
+	struct task_struct *task;
+	const struct cred *cred;
+	struct mm_struct *mm;
+	struct inode *inode;
+
+	if (nd && nd->flags & LOOKUP_RCU)
+		return -ECHILD;
+
+	inode = dentry->d_inode;
+	task = get_proc_task(inode);
+	if (!task)
+		goto out;
+
+	if (!ptrace_may_access(task, PTRACE_MODE_READ))
+		goto out;
+
+	mm = get_task_mm(task);
+	put_task_struct(task);
+	if (!mm)
+		goto out;
+
+	if (!dname_to_vma_addr(dentry, &vm_start, &vm_end)) {
+		down_read(&mm->mmap_sem);
+		exact_vma_exists = !!find_exact_vma(mm, vm_start, vm_end);
+		up_read(&mm->mmap_sem);
+	}
+
+	mmput(mm);
+
+	if (exact_vma_exists) {
+		if (task_dumpable(task)) {
+			rcu_read_lock();
+			cred = __task_cred(task);
+			inode->i_uid = cred->euid;
+			inode->i_gid = cred->egid;
+			rcu_read_unlock();
+		} else {
+			inode->i_uid = 0;
+			inode->i_gid = 0;
+		}
+		security_task_to_inode(task, inode);
+		return 1;
+	}
+out:
+	d_drop(dentry);
+	return 0;
+}
+
+static const struct dentry_operations tid_map_files_dentry_operations = {
+	.d_revalidate	= map_files_d_revalidate,
+	.d_delete	= pid_delete_dentry,
+};
+
+static int proc_map_files_get_link(struct dentry *dentry, struct path *path)
+{
+	unsigned long vm_start, vm_end;
+	struct vm_area_struct *vma;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	int rc = -ENOENT;
+
+	task = get_proc_task(dentry->d_inode);
+	if (!task)
+		goto out;
+
+	mm = get_task_mm(task);
+	put_task_struct(task);
+	if (!mm)
+		goto out;
+
+	rc = dname_to_vma_addr(dentry, &vm_start, &vm_end);
+	if (rc)
+		goto out_mmput;
+
+	down_read(&mm->mmap_sem);
+	vma = find_exact_vma(mm, vm_start, vm_end);
+	if (vma && vma->vm_file) {
+		*path = vma->vm_file->f_path;
+		path_get(path);
+		rc = 0;
+	}
+	up_read(&mm->mmap_sem);
+
+out_mmput:
+	mmput(mm);
+out:
+	return rc;
+}
+
+struct map_files_info {
+	struct file	*file;
+	unsigned long	len;
+	unsigned char	name[4*sizeof(long)+2]; /* max: %lx-%lx\0 */
+};
+
+static struct dentry *
+proc_map_files_instantiate(struct inode *dir, struct dentry *dentry,
+			   struct task_struct *task, const void *ptr)
+{
+	const struct file *file = ptr;
+	struct proc_inode *ei;
+	struct inode *inode;
+
+	if (!file)
+		return ERR_PTR(-ENOENT);
+
+	inode = proc_pid_make_inode(dir->i_sb, task);
+	if (!inode)
+		return ERR_PTR(-ENOENT);
+
+	ei = PROC_I(inode);
+	ei->op.proc_get_link = proc_map_files_get_link;
+
+	inode->i_op = &proc_pid_link_inode_operations;
+	inode->i_size = 64;
+	inode->i_mode = S_IFLNK;
+
+	if (file->f_mode & FMODE_READ)
+		inode->i_mode |= S_IRUSR | S_IXUSR;
+	if (file->f_mode & FMODE_WRITE)
+		inode->i_mode |= S_IWUSR | S_IXUSR;
+
+	d_set_d_op(dentry, &tid_map_files_dentry_operations);
+	d_add(dentry, inode);
+
+	return NULL;
+}
+
+static struct dentry *proc_map_files_lookup(struct inode *dir,
+		struct dentry *dentry, struct nameidata *nd)
+{
+	unsigned long vm_start, vm_end;
+	struct task_struct *task;
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	struct dentry *result;
+
+	result = ERR_PTR(-ENOENT);
+	task = get_proc_task(dir);
+	if (!task)
+		goto out_no_task;
+
+	result = ERR_PTR(-EPERM);
+	if (!ptrace_may_access(task, PTRACE_MODE_READ))
+		goto out_no_mm;
+
+	result = ERR_PTR(-ENOENT);
+	if (dname_to_vma_addr(dentry, &vm_start, &vm_end))
+		goto out_no_mm;
+
+	mm = get_task_mm(task);
+	if (!mm)
+		goto out_no_mm;
+
+	down_read(&mm->mmap_sem);
+	vma = find_exact_vma(mm, vm_start, vm_end);
+	if (!vma)
+		goto out_no_vma;
+
+	result = proc_map_files_instantiate(dir, dentry, task, vma->vm_file);
+
+out_no_vma:
+	up_read(&mm->mmap_sem);
+	mmput(mm);
+out_no_mm:
+	put_task_struct(task);
+out_no_task:
+	return result;
+}
+
+static int proc_map_files_setattr(struct dentry *dentry, struct iattr *attr)
+{
+	struct inode *inode = dentry->d_inode;
+	struct task_struct *task;
+	int ret = -EACCES;
+
+	task = get_proc_task(inode);
+	if (!task)
+		return -ESRCH;
+
+	if (!lock_trace(task)) {
+		ret = proc_setattr(dentry, attr);
+		unlock_trace(task);
+	}
+
+	put_task_struct(task);
+	return ret;
+}
+
+static int proc_map_files_getattr(struct vfsmount *mnt, struct dentry *dentry,
+				  struct kstat *stat)
+{
+	struct inode *inode = dentry->d_inode;
+	struct task_struct *task;
+	int ret = -EACCES;
+
+	task = get_proc_task(inode);
+	if (!task)
+		return -ESRCH;
+
+	if (!lock_trace(task)) {
+		generic_fillattr(inode, stat);
+		unlock_trace(task);
+		ret = 0;
+	}
+
+	put_task_struct(task);
+	return ret;
+}
+
+static const struct inode_operations proc_map_files_inode_operations = {
+	.lookup		= proc_map_files_lookup,
+	.setattr	= proc_map_files_setattr,
+	.getattr	= proc_map_files_getattr,
+};
+
+static int proc_map_files_readdir(struct file *filp, void *dirent, filldir_t filldir)
+{
+	struct dentry *dentry = filp->f_path.dentry;
+	struct inode *inode = dentry->d_inode;
+	struct vm_area_struct *vma;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	ino_t ino;
+	int ret;
+
+	ret = -ENOENT;
+	task = get_proc_task(inode);
+	if (!task)
+		goto out_no_task;
+
+	ret = -EPERM;
+	if (!ptrace_may_access(task, PTRACE_MODE_READ))
+		goto out;
+
+	ret = 0;
+	switch (filp->f_pos) {
+	case 0:
+		ino = inode->i_ino;
+		if (filldir(dirent, ".", 1, 0, ino, DT_DIR) < 0)
+			goto out;
+		filp->f_pos++;
+	case 1:
+		ino = parent_ino(dentry);
+		if (filldir(dirent, "..", 2, 1, ino, DT_DIR) < 0)
+			goto out;
+		filp->f_pos++;
+	default:
+	{
+		unsigned long nr_files, pos, i;
+		struct flex_array *fa = NULL;
+		struct map_files_info info;
+		struct map_files_info *p;
+
+		mm = get_task_mm(task);
+		if (!mm)
+			goto out;
+		down_read(&mm->mmap_sem);
+
+		nr_files = 0;
+
+		/*
+		 * We need two passes here:
+		 *
+		 *  1) Collect vmas of mapped files with mmap_sem taken
+		 *  2) Release mmap_sem and instantiate entries
+		 *
+		 * otherwise we get lockdep complained, since filldir()
+		 * routine might require mmap_sem taken in might_fault().
+		 */
+
+		for (vma = mm->mmap, pos = 2; vma; vma = vma->vm_next) {
+			if (vma->vm_file && ++pos > filp->f_pos)
+				nr_files++;
+		}
+
+		if (nr_files) {
+			fa = flex_array_alloc(sizeof(info), nr_files, GFP_KERNEL);
+			if (!fa || flex_array_prealloc(fa, 0, nr_files, GFP_KERNEL)) {
+				ret = -ENOMEM;
+				if (fa)
+					flex_array_free(fa);
+				up_read(&mm->mmap_sem);
+				mmput(mm);
+				goto out;
+			}
+			for (i = 0, vma = mm->mmap, pos = 2; vma; vma = vma->vm_next) {
+				if (!vma->vm_file)
+					continue;
+				if (++pos <= filp->f_pos)
+					continue;
+
+				get_file(vma->vm_file);
+				info.file = vma->vm_file;
+				info.len = snprintf(info.name, sizeof(info.name),
+						    "%lx-%lx", vma->vm_start,
+						    vma->vm_end);
+				if (flex_array_put(fa, i++, &info, GFP_KERNEL))
+					BUG();
+			}
+		}
+		up_read(&mm->mmap_sem);
+
+		for (i = 0; i < nr_files; i++) {
+			p = flex_array_get(fa, i);
+			ret = proc_fill_cache(filp, dirent, filldir,
+					      p->name, p->len,
+					      proc_map_files_instantiate,
+					      task, p->file);
+			if (ret)
+				break;
+			filp->f_pos++;
+			fput(p->file);
+		}
+		for (; i < nr_files; i++) {
+			/*
+			 * In case of error don't forget
+			 * to put rest of file refs.
+			 */
+			p = flex_array_get(fa, i);
+			fput(p->file);
+		}
+		if (fa)
+			flex_array_free(fa);
+		mmput(mm);
+	}
+	}
+
+out:
+	put_task_struct(task);
+out_no_task:
+	return ret;
+}
+
+static const struct file_operations proc_map_files_operations = {
+	.read		= generic_read_dir,
+	.readdir	= proc_map_files_readdir,
+	.llseek		= default_llseek,
+};
+
+/*
  * /proc/pid/fd needs a special permission handler so that a process can still
  * access /proc/self/fd after it has executed a setuid().
  */
@@ -2785,6 +3143,7 @@ static const struct inode_operations pro
 static const struct pid_entry tgid_base_stuff[] = {
 	DIR("task",       S_IRUGO|S_IXUGO, proc_task_inode_operations, proc_task_operations),
 	DIR("fd",         S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations),
+	DIR("map_files",  S_IRUSR|S_IXUSR, proc_map_files_inode_operations, proc_map_files_operations),
 	DIR("fdinfo",     S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations),
 	DIR("ns",	  S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations),
 #ifdef CONFIG_NET
Index: linux-2.6.git/include/linux/mm.h
===================================================================
--- linux-2.6.git.orig/include/linux/mm.h
+++ linux-2.6.git/include/linux/mm.h
@@ -1491,6 +1491,18 @@ static inline unsigned long vma_pages(st
 	return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
 }
 
+/* Look up the first VMA which exactly match the interval vm_start ... vm_end */
+static inline struct vm_area_struct *
+find_exact_vma(struct mm_struct *mm, unsigned long vm_start, unsigned long vm_end)
+{
+	struct vm_area_struct *vma = find_vma(mm, vm_start);
+
+	if (vma && (vma->vm_start != vm_start || vma->vm_end != vm_end))
+		vma = NULL;
+
+	return vma;
+}
+
 #ifdef CONFIG_MMU
 pgprot_t vm_get_page_prot(unsigned long vm_flags);
 #else

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-08  6:04                                               ` Cyrill Gorcunov
@ 2011-09-08 23:52                                                 ` Andrew Morton
  2011-09-09  0:24                                                   ` Pavel Emelyanov
  2011-09-09  5:48                                                   ` Cyrill Gorcunov
  2011-09-10 13:21                                                 ` Vasiliy Kulikov
  1 sibling, 2 replies; 25+ messages in thread
From: Andrew Morton @ 2011-09-08 23:52 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Vasiliy Kulikov, Tejun Heo, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Thu, 8 Sep 2011 10:04:05 +0400
Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> fs, proc: Introduce the /proc/<pid>/map_files/ directory v11

Ho hum, I've pretty much run out of excuses to avoid merging this.

except...

We don't really want to bloat fs/proc/base.o by 4k until all the other
things which support c/r are mergeable and we know that the whole
project is actually useful.  When will we be at this stage?

<looks at the warning>

fs/proc/base.c: In function 'proc_map_files_instantiate':
fs/proc/base.c:2348: warning: assignment from incompatible pointer type

err, that code will crash at runtime and it isn't trivial to fix. 
How could this happen?

>
> ...
>
> +				if (fa)
> +					flex_array_free(fa);
>
> ...
>
> +		if (fa)
> +			flex_array_free(fa);

I think I'll do this:

From: Andrew Morton <akpm@linux-foundation.org>

Lots of callers are avoiding passing NULL into flex_array_free().  Move
the check into flex_array_free() in the usual fashion.

Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/proc/base.c                 |    6 ++----
 lib/flex_array.c               |    2 ++
 security/selinux/ss/policydb.c |    9 +++------
 3 files changed, 7 insertions(+), 10 deletions(-)

diff -puN lib/flex_array.c~lib-flex_arrayc-accept-null-arg-to-flex_array_free lib/flex_array.c
--- a/lib/flex_array.c~lib-flex_arrayc-accept-null-arg-to-flex_array_free
+++ a/lib/flex_array.c
@@ -142,6 +142,8 @@ EXPORT_SYMBOL(flex_array_free_parts);
 
 void flex_array_free(struct flex_array *fa)
 {
+	if (!fa)
+		return;
 	flex_array_free_parts(fa);
 	kfree(fa);
 }
diff -puN fs/proc/base.c~lib-flex_arrayc-accept-null-arg-to-flex_array_free fs/proc/base.c
--- a/fs/proc/base.c~lib-flex_arrayc-accept-null-arg-to-flex_array_free
+++ a/fs/proc/base.c
@@ -2514,8 +2514,7 @@ static int proc_map_files_readdir(struct
 			fa = flex_array_alloc(sizeof(info), nr_files, GFP_KERNEL);
 			if (!fa || flex_array_prealloc(fa, 0, nr_files, GFP_KERNEL)) {
 				ret = -ENOMEM;
-				if (fa)
-					flex_array_free(fa);
+				flex_array_free(fa);
 				up_read(&mm->mmap_sem);
 				mmput(mm);
 				goto out;
@@ -2556,8 +2555,7 @@ static int proc_map_files_readdir(struct
 			p = flex_array_get(fa, i);
 			fput(p->file);
 		}
-		if (fa)
-			flex_array_free(fa);
+		flex_array_free(fa);
 		mmput(mm);
 	}
 	}
diff -puN security/selinux/ss/policydb.c~lib-flex_arrayc-accept-null-arg-to-flex_array_free security/selinux/ss/policydb.c
--- a/security/selinux/ss/policydb.c~lib-flex_arrayc-accept-null-arg-to-flex_array_free
+++ a/security/selinux/ss/policydb.c
@@ -769,16 +769,13 @@ void policydb_destroy(struct policydb *p
 		hashtab_destroy(p->symtab[i].table);
 	}
 
-	for (i = 0; i < SYM_NUM; i++) {
-		if (p->sym_val_to_name[i])
-			flex_array_free(p->sym_val_to_name[i]);
-	}
+	for (i = 0; i < SYM_NUM; i++)
+		flex_array_free(p->sym_val_to_name[i]);
 
 	kfree(p->class_val_to_struct);
 	kfree(p->role_val_to_struct);
 	kfree(p->user_val_to_struct);
-	if (p->type_val_to_struct_array)
-		flex_array_free(p->type_val_to_struct_array);
+	flex_array_free(p->type_val_to_struct_array);
 
 	avtab_destroy(&p->te_avtab);
 
_

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-08 23:52                                                 ` Andrew Morton
@ 2011-09-09  0:24                                                   ` Pavel Emelyanov
  2011-09-09  5:48                                                   ` Cyrill Gorcunov
  1 sibling, 0 replies; 25+ messages in thread
From: Pavel Emelyanov @ 2011-09-09  0:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Cyrill Gorcunov, Vasiliy Kulikov, Tejun Heo, Kirill A. Shutemov,
	containers@lists.osdl.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, Nathan Lynch,
	kernel-hardening@lists.openwall.com, Oren Laadan, Daniel Lezcano,
	Glauber Costa, James Bottomley, Alexey Dobriyan, Al Viro

On 09/09/2011 03:52 AM, Andrew Morton wrote:
> On Thu, 8 Sep 2011 10:04:05 +0400
> Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> 
>> fs, proc: Introduce the /proc/<pid>/map_files/ directory v11
> 
> Ho hum, I've pretty much run out of excuses to avoid merging this.
> 
> except...
> 
> We don't really want to bloat fs/proc/base.o by 4k until all the other
> things which support c/r are mergeable and we know that the whole
> project is actually useful.  When will we be at this stage?

Well, I see no other stuff that will be required for us in the nearest
future (well, and in the not-so-near future as well) in the fs/proc/base.o
to checkpoint or restore a task.

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-08 23:52                                                 ` Andrew Morton
  2011-09-09  0:24                                                   ` Pavel Emelyanov
@ 2011-09-09  5:48                                                   ` Cyrill Gorcunov
  2011-09-09  6:00                                                     ` Andrew Morton
  1 sibling, 1 reply; 25+ messages in thread
From: Cyrill Gorcunov @ 2011-09-09  5:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vasiliy Kulikov, Tejun Heo, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Thu, Sep 08, 2011 at 04:52:01PM -0700, Andrew Morton wrote:
> On Thu, 8 Sep 2011 10:04:05 +0400
> Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> 
> > fs, proc: Introduce the /proc/<pid>/map_files/ directory v11
> 
> Ho hum, I've pretty much run out of excuses to avoid merging this.
> 
> except...
> 
> We don't really want to bloat fs/proc/base.o by 4k until all the other
> things which support c/r are mergeable and we know that the whole
> project is actually useful.  When will we be at this stage?

I hope we will bring in a final set in a couple of weeks.

> 
> <looks at the warning>
> 
> fs/proc/base.c: In function 'proc_map_files_instantiate':
> fs/proc/base.c:2348: warning: assignment from incompatible pointer type
> 
> err, that code will crash at runtime and it isn't trivial to fix. 
> How could this happen?
> 

Hmm. I never saw this warning. (Andrew, I'm still unable to fetch
your current -mm tree, is there some place other than kernel.org?
So the patch is done on top of 3.1-rc3). I guess this warrning is
from p = flex_array_get(fa, i); ? (since I don't have any warning
at all).

> >
> > ...
> >
> > +				if (fa)
> > +					flex_array_free(fa);
> >
> > ...
> >
> > +		if (fa)
> > +			flex_array_free(fa);
> 
> I think I'll do this:
> 
> From: Andrew Morton <akpm@linux-foundation.org>
> 
> Lots of callers are avoiding passing NULL into flex_array_free().  Move
> the check into flex_array_free() in the usual fashion.
> 
> Cc: Stephen Smalley <sds@tycho.nsa.gov>
> Cc: James Morris <jmorris@namei.org>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---

Yeah, great. Moreover, flex_array_free calls for kfree which
support NULL argument so it's natural to make this one NULL
capable as well.

	Cyrill

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-09  5:48                                                   ` Cyrill Gorcunov
@ 2011-09-09  6:00                                                     ` Andrew Morton
  2011-09-09  6:22                                                       ` Cyrill Gorcunov
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2011-09-09  6:00 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Vasiliy Kulikov, Tejun Heo, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Fri, 9 Sep 2011 09:48:19 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> > 
> > <looks at the warning>
> > 
> > fs/proc/base.c: In function 'proc_map_files_instantiate':
> > fs/proc/base.c:2348: warning: assignment from incompatible pointer type
> > 
> > err, that code will crash at runtime and it isn't trivial to fix. 
> > How could this happen?
> > 
> 
> Hmm. I never saw this warning. (Andrew, I'm still unable to fetch
> your current -mm tree, is there some place other than kernel.org?

Nope, sorry - we're dead in the water at present.

> So the patch is done on top of 3.1-rc3). I guess this warrning is
> from p = flex_array_get(fa, i); ? (since I don't have any warning
> at all).

The warning is from

	ei->op.proc_get_link = proc_map_files_get_link;

The lhs has type

union proc_op {
	int (*proc_get_link)(struct inode *, struct path *);

and the rhs has type

static int proc_map_files_get_link(struct dentry *dentry, struct path *path)

So we end up passing an inode* to a function which expects a dentry*.

That's in 3.1-rc4.  proc_op.proc_get_link() hasn't changed since 3.0 (at least).

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-09  6:00                                                     ` Andrew Morton
@ 2011-09-09  6:22                                                       ` Cyrill Gorcunov
  0 siblings, 0 replies; 25+ messages in thread
From: Cyrill Gorcunov @ 2011-09-09  6:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vasiliy Kulikov, Tejun Heo, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, kernel-hardening,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Thu, Sep 08, 2011 at 11:00:20PM -0700, Andrew Morton wrote:
...
> > 
> > Hmm. I never saw this warning. (Andrew, I'm still unable to fetch
> > your current -mm tree, is there some place other than kernel.org?
> 
> Nope, sorry - we're dead in the water at present.
> 
> > So the patch is done on top of 3.1-rc3). I guess this warrning is
> > from p = flex_array_get(fa, i); ? (since I don't have any warning
> > at all).
> 
> The warning is from
> 
> 	ei->op.proc_get_link = proc_map_files_get_link;
> 
> The lhs has type
> 
> union proc_op {
> 	int (*proc_get_link)(struct inode *, struct path *);
> 
> and the rhs has type
> 
> static int proc_map_files_get_link(struct dentry *dentry, struct path *path)
> 
> So we end up passing an inode* to a function which expects a dentry*.
> 
> That's in 3.1-rc4.  proc_op.proc_get_link() hasn't changed since 3.0 (at least).

Crap. I know what happened. At first proposal time Tejun (iirc ;) said that
is might be better to separate proc_get_link change. I did so... and of
course I forgot to send it out ;) Ie it's in my queue and I dont see any
warnings for that reason. Sorry for that. Attached below.
---
From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: fs, proc: Make proc_get_link to use dentry instead of inode

This patch prepares the ground for the next "map_files"
patch which needs a name of a link file to analyse.

So instead of squashing this change into one big
patch the separate one is done.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Tejun Heo <tj@kernel.org>
CC: Vasiliy Kulikov <segoon@openwall.com>
CC: "Kirill A. Shutemov" <kirill@shutemov.name>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Al Viro <viro@ZenIV.linux.org.uk>
CC: Andrew Morton <akpm@linux-foundation.org>
---
 fs/proc/base.c          |   20 ++++++++++----------
 include/linux/proc_fs.h |    2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)

Index: linux-2.6.git/fs/proc/base.c
===================================================================
--- linux-2.6.git.orig/fs/proc/base.c
+++ linux-2.6.git/fs/proc/base.c
@@ -165,9 +165,9 @@ static int get_task_root(struct task_str
 	return result;
 }
 
-static int proc_cwd_link(struct inode *inode, struct path *path)
+static int proc_cwd_link(struct dentry *dentry, struct path *path)
 {
-	struct task_struct *task = get_proc_task(inode);
+	struct task_struct *task = get_proc_task(dentry->d_inode);
 	int result = -ENOENT;
 
 	if (task) {
@@ -182,9 +182,9 @@ static int proc_cwd_link(struct inode *i
 	return result;
 }
 
-static int proc_root_link(struct inode *inode, struct path *path)
+static int proc_root_link(struct dentry *dentry, struct path *path)
 {
-	struct task_struct *task = get_proc_task(inode);
+	struct task_struct *task = get_proc_task(dentry->d_inode);
 	int result = -ENOENT;
 
 	if (task) {
@@ -1580,13 +1580,13 @@ static const struct file_operations proc
 	.release	= single_release,
 };
 
-static int proc_exe_link(struct inode *inode, struct path *exe_path)
+static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
 {
 	struct task_struct *task;
 	struct mm_struct *mm;
 	struct file *exe_file;
 
-	task = get_proc_task(inode);
+	task = get_proc_task(dentry->d_inode);
 	if (!task)
 		return -ENOENT;
 	mm = get_task_mm(task);
@@ -1616,7 +1616,7 @@ static void *proc_pid_follow_link(struct
 	if (!proc_fd_access_allowed(inode))
 		goto out;
 
-	error = PROC_I(inode)->op.proc_get_link(inode, &nd->path);
+	error = PROC_I(inode)->op.proc_get_link(dentry, &nd->path);
 out:
 	return ERR_PTR(error);
 }
@@ -1655,7 +1655,7 @@ static int proc_pid_readlink(struct dent
 	if (!proc_fd_access_allowed(inode))
 		goto out;
 
-	error = PROC_I(inode)->op.proc_get_link(inode, &path);
+	error = PROC_I(inode)->op.proc_get_link(dentry, &path);
 	if (error)
 		goto out;
 
@@ -1947,9 +1947,9 @@ static int proc_fd_info(struct inode *in
 	return -ENOENT;
 }
 
-static int proc_fd_link(struct inode *inode, struct path *path)
+static int proc_fd_link(struct dentry *dentry, struct path *path)
 {
-	return proc_fd_info(inode, path, NULL);
+	return proc_fd_info(dentry->d_inode, path, NULL);
 }
 
 static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
Index: linux-2.6.git/include/linux/proc_fs.h
===================================================================
--- linux-2.6.git.orig/include/linux/proc_fs.h
+++ linux-2.6.git/include/linux/proc_fs.h
@@ -253,7 +253,7 @@ extern const struct proc_ns_operations u
 extern const struct proc_ns_operations ipcns_operations;
 
 union proc_op {
-	int (*proc_get_link)(struct inode *, struct path *);
+	int (*proc_get_link)(struct dentry *, struct path *);
 	int (*proc_read)(struct task_struct *task, char *page);
 	int (*proc_show)(struct seq_file *m,
 		struct pid_namespace *ns, struct pid *pid,

	Cyrill

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-08  6:04                                               ` Cyrill Gorcunov
  2011-09-08 23:52                                                 ` Andrew Morton
@ 2011-09-10 13:21                                                 ` Vasiliy Kulikov
  2011-09-10 13:49                                                   ` Cyrill Gorcunov
  1 sibling, 1 reply; 25+ messages in thread
From: Vasiliy Kulikov @ 2011-09-10 13:21 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Andrew Morton, Tejun Heo, Kirill A. Shutemov, containers,
	linux-kernel, linux-fsdevel, Nathan Lynch, Oren Laadan,
	Daniel Lezcano, Glauber Costa, James Bottomley, Alexey Dobriyan,
	Al Viro, Pavel Emelyanov

Hi Cyrill,

On Thu, Sep 08, 2011 at 10:04 +0400, Cyrill Gorcunov wrote:
> +static int map_files_d_revalidate(struct dentry *dentry, struct nameidata *nd)
> +{
> +	unsigned long vm_start, vm_end;
> +	bool exact_vma_exists = false;
> +	struct task_struct *task;
> +	const struct cred *cred;
> +	struct mm_struct *mm;
> +	struct inode *inode;
> +
> +	if (nd && nd->flags & LOOKUP_RCU)
> +		return -ECHILD;
> +
> +	inode = dentry->d_inode;
> +	task = get_proc_task(inode);
> +	if (!task)
> +		goto out;
> +
> +	if (!ptrace_may_access(task, PTRACE_MODE_READ))

        put_task_struct(task) belongs here.

> +		goto out;
> +
> +	mm = get_task_mm(task);
> +	put_task_struct(task);
> +	if (!mm)
> +		goto out;
> +
> +	if (!dname_to_vma_addr(dentry, &vm_start, &vm_end)) {
> +		down_read(&mm->mmap_sem);
> +		exact_vma_exists = !!find_exact_vma(mm, vm_start, vm_end);
> +		up_read(&mm->mmap_sem);
> +	}
> +
> +	mmput(mm);
> +
> +	if (exact_vma_exists) {
> +		if (task_dumpable(task)) {
> +			rcu_read_lock();
> +			cred = __task_cred(task);
> +			inode->i_uid = cred->euid;
> +			inode->i_gid = cred->egid;
> +			rcu_read_unlock();
> +		} else {
> +			inode->i_uid = 0;
> +			inode->i_gid = 0;
> +		}
> +		security_task_to_inode(task, inode);
> +		return 1;
> +	}
> +out:
> +	d_drop(dentry);
> +	return 0;
> +}

Thanks,

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6
  2011-09-10 13:21                                                 ` Vasiliy Kulikov
@ 2011-09-10 13:49                                                   ` Cyrill Gorcunov
  0 siblings, 0 replies; 25+ messages in thread
From: Cyrill Gorcunov @ 2011-09-10 13:49 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: kernel-hardening, Andrew Morton, Tejun Heo, Kirill A. Shutemov,
	containers, linux-kernel, linux-fsdevel, Nathan Lynch,
	Oren Laadan, Daniel Lezcano, Glauber Costa, James Bottomley,
	Alexey Dobriyan, Al Viro, Pavel Emelyanov

On Sat, Sep 10, 2011 at 05:21:01PM +0400, Vasiliy Kulikov wrote:
...
> > +
> > +	if (!ptrace_may_access(task, PTRACE_MODE_READ))
> 
>         put_task_struct(task) belongs here.
> 

Yeah, thanks! I'll update.

	Cyrill

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2011-09-10 13:49 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20110831075814.003575573@openvz.org>
     [not found] ` <20110831080229.100652529@openvz.org>
     [not found]   ` <20110831090612.GA3253@albatros>
     [not found]     ` <20110831112642.GI25465@sun>
     [not found]       ` <20110831140416.GA17626@shutemov.name>
     [not found]         ` <20110831142622.GB30615@sun>
     [not found]           ` <20110831151023.5b7e12da.akpm@linux-foundation.org>
     [not found]             ` <20110901080508.GF30615@sun>
2011-09-02 16:37               ` [kernel-hardening] Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6 Vasiliy Kulikov
2011-09-05 18:53                 ` Vasiliy Kulikov
2011-09-05 19:20                   ` Cyrill Gorcunov
2011-09-05 19:49                     ` Vasiliy Kulikov
2011-09-05 20:36                       ` Cyrill Gorcunov
2011-09-06 10:15                         ` Vasiliy Kulikov
2011-09-06 16:51                           ` Tejun Heo
2011-09-06 17:29                             ` Vasiliy Kulikov
2011-09-06 17:33                               ` Tejun Heo
2011-09-06 18:15                                 ` Cyrill Gorcunov
2011-09-07 11:23                                 ` Vasiliy Kulikov
2011-09-07 21:53                                   ` Cyrill Gorcunov
2011-09-07 22:13                                     ` Andrew Morton
2011-09-07 22:42                                       ` Cyrill Gorcunov
2011-09-07 22:53                                         ` Andrew Morton
2011-09-08  5:48                                           ` Cyrill Gorcunov
2011-09-08  5:50                                             ` Cyrill Gorcunov
2011-09-08  6:04                                               ` Cyrill Gorcunov
2011-09-08 23:52                                                 ` Andrew Morton
2011-09-09  0:24                                                   ` Pavel Emelyanov
2011-09-09  5:48                                                   ` Cyrill Gorcunov
2011-09-09  6:00                                                     ` Andrew Morton
2011-09-09  6:22                                                       ` Cyrill Gorcunov
2011-09-10 13:21                                                 ` Vasiliy Kulikov
2011-09-10 13:49                                                   ` Cyrill Gorcunov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox