[PULL REQUEST] : ima-appraisal patches

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PULL REQUEST] : ima-appraisal patches
@ 2012-04-18 13:04 Mimi Zohar
  2012-04-18 15:02 ` James Morris
  0 siblings, 1 reply; 32+ messages in thread
From: Mimi Zohar @ 2012-04-18 13:04 UTC (permalink / raw)
  To: James Morris
  Cc: linux-security-module, linux-kernel, linux-fsdevel, Al Viro,
	David Safford, Dmitry Kasatkin

Hi James,

As the last IMA-appraisal posting on 3/29 addressed Al's
performance/maintenance concerns of deferring the __fput() and there
hasn't been any additional comments, please consider pulling the
IMA-appraisal patches.

The linux-integrity.git also contains the two prereqs:
   vfs: fix IMA lockdep circular locking dependency  (Acked by Eric)
   vfs: iversion truncate bug fix (currently in linux-next, via Andrew)

The following changes since commit eadc10b3e17f00681f7bfb2ed6e4aee39ad93f03:

  vfs: extend vfs_removexattr locking (2012-04-18 07:06:55 -0400)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git next-ima-appraisal

thanks,

Mimi

Dmitry Kasatkin (3):
      ima: free securityfs violations file
      ima: allocating iint improvements
      ima: digital signature verification support

Mimi Zohar (8):
      vfs: move ima_file_free before releasing the file
      ima: integrity appraisal extension
      ima: add appraise action keywords and default rules
      ima: replace iint spinlock with rwlock/read_lock
      ima: add inode_post_setattr call
      ima: add ima_inode_setxattr/removexattr function and calls
      ima: defer calling __fput()
      ima: add support for different security.ima data types

 Documentation/ABI/testing/ima_policy  |   25 ++-
 Documentation/kernel-parameters.txt   |    8 +
 fs/attr.c                             |    2 +
 fs/file_table.c                       |    7 +-
 include/linux/ima.h                   |   32 +++
 include/linux/integrity.h             |    7 +-
 include/linux/xattr.h                 |    3 +
 security/integrity/evm/evm_main.c     |    3 +
 security/integrity/iint.c             |   64 +++----
 security/integrity/ima/Kconfig        |   15 ++
 security/integrity/ima/Makefile       |    2 +
 security/integrity/ima/ima.h          |   37 ++++-
 security/integrity/ima/ima_api.c      |   56 ++++--
 security/integrity/ima/ima_appraise.c |  344 +++++++++++++++++++++++++++++++++
 security/integrity/ima/ima_crypto.c   |    8 +-
 security/integrity/ima/ima_fs.c       |    1 +
 security/integrity/ima/ima_main.c     |   89 ++++++---
 security/integrity/ima/ima_policy.c   |  181 +++++++++++++-----
 security/integrity/integrity.h        |   11 +-
 security/security.c                   |    6 +
 20 files changed, 754 insertions(+), 147 deletions(-)
 create mode 100644 security/integrity/ima/ima_appraise.c



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PULL REQUEST] : ima-appraisal patches
  2012-04-18 13:04 [PULL REQUEST] : ima-appraisal patches Mimi Zohar
@ 2012-04-18 15:02 ` James Morris
  2012-04-18 18:07   ` Mimi Zohar
  0 siblings, 1 reply; 32+ messages in thread
From: James Morris @ 2012-04-18 15:02 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: linux-security-module, linux-kernel, linux-fsdevel, Al Viro,
	David Safford, Dmitry Kasatkin

On Wed, 18 Apr 2012, Mimi Zohar wrote:

> Hi James,
> 
> As the last IMA-appraisal posting on 3/29 addressed Al's
> performance/maintenance concerns of deferring the __fput() and there
> hasn't been any additional comments, please consider pulling the
> IMA-appraisal patches.

I thought Dmitry was still waiting on comments from Al for this.



- James
-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PULL REQUEST] : ima-appraisal patches
  2012-04-18 15:02 ` James Morris
@ 2012-04-18 18:07   ` Mimi Zohar
  2012-04-18 18:39     ` Al Viro
  0 siblings, 1 reply; 32+ messages in thread
From: Mimi Zohar @ 2012-04-18 18:07 UTC (permalink / raw)
  To: James Morris
  Cc: linux-security-module, linux-kernel, linux-fsdevel, Al Viro,
	David Safford, Dmitry Kasatkin

On Thu, 2012-04-19 at 01:02 +1000, James Morris wrote:
> On Wed, 18 Apr 2012, Mimi Zohar wrote:
> 
> > Hi James,
> > 
> > As the last IMA-appraisal posting on 3/29 addressed Al's
> > performance/maintenance concerns of deferring the __fput() and there
> > hasn't been any additional comments, please consider pulling the
> > IMA-appraisal patches.
> 
> I thought Dmitry was still waiting on comments from Al for this.

Al NAK'ed the original 'ima: defer calling __fput()' patch from 3/22,
but we addressed both of Al's performance and maintaince concerns in the
last posting on 3/29, which was three weeks ago.  We haven't heard
back ...

>From the 'ima: defer calling __fput()' patch description:

ima_file_free(), which is called on __fput(), updates the file data
hash stored as an extended attribute to reflect file changes.  If a
file is closed before it is munmapped, __fput() is called with the
mmap_sem taken.  With IMA-appraisal enabled, this results in an
mmap_sem/i_mutex lockdep.  ima_defer_fput() increments the f_count to
defer the __fput() being called until after the mmap_sem is released.

The number of __fput() calls needing to be deferred is minimal.  Only
those files in policy, that were closed prior to the munmap and were
mmapped write, need to defer the __fput().

With this patch, on a clean F16 install, from boot to login, only
5 out of ~100,000 mmap_sem held fput() calls were deferred.

Changelog v4:
- ima_defer_fput() performance improvements
- move ima_defer_fput() call from remove_vma() to __fput()
- only defer mmapped files opened for write
- remove unnecessary ima_fput_mempool_destroy()

Mimi

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PULL REQUEST] : ima-appraisal patches
  2012-04-18 18:07   ` Mimi Zohar
@ 2012-04-18 18:39     ` Al Viro
  2012-04-18 20:56       ` Mimi Zohar
  2012-04-19 19:57       ` Mimi Zohar
  0 siblings, 2 replies; 32+ messages in thread
From: Al Viro @ 2012-04-18 18:39 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: James Morris, linux-security-module, linux-kernel, linux-fsdevel,
	David Safford, Dmitry Kasatkin

On Wed, Apr 18, 2012 at 02:07:52PM -0400, Mimi Zohar wrote:

> >From the 'ima: defer calling __fput()' patch description:
> 
> ima_file_free(), which is called on __fput(), updates the file data
> hash stored as an extended attribute to reflect file changes.  If a
> file is closed before it is munmapped, __fput() is called with the
> mmap_sem taken.  With IMA-appraisal enabled, this results in an
> mmap_sem/i_mutex lockdep.  ima_defer_fput() increments the f_count to
> defer the __fput() being called until after the mmap_sem is released.
> 
> The number of __fput() calls needing to be deferred is minimal.  Only
> those files in policy, that were closed prior to the munmap and were
> mmapped write, need to defer the __fput().
> 
> With this patch, on a clean F16 install, from boot to login, only
> 5 out of ~100,000 mmap_sem held fput() calls were deferred.

Assuming that it's commit 3cee52ffe8ca925bb1e96f804daa87f7e2e34e46
Author: Mimi Zohar <zohar@linux.vnet.ibm.com>
Date:   Fri Feb 24 06:23:12 2012 -0500

    ima: defer calling __fput()
in your tree, the NAK still stands.  For starters, but you are creating a
different locking rules for IMA-enabled builds and for everything else.
Moreover, this deferral is done only for files opened for write; the
rules are convoluted as hell *and* inviting abuses.  

NAKed at least until you come up with formal proof that there's no other
lock where fput() would be possible and ->i_mutex was not allowed.  This
is not a way to go; that kind of kludges leads to locking code that is
impossible to reason about.

PS: BTW, what the hell is "fput already scheduled" codepath about?
Why is it pr_info() and not an outright BUG_ON()?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PULL REQUEST] : ima-appraisal patches
  2012-04-18 18:39     ` Al Viro
@ 2012-04-18 20:56       ` Mimi Zohar
  2012-04-19 19:57       ` Mimi Zohar
  1 sibling, 0 replies; 32+ messages in thread
From: Mimi Zohar @ 2012-04-18 20:56 UTC (permalink / raw)
  To: Al Viro
  Cc: James Morris, linux-security-module, linux-kernel, linux-fsdevel,
	David Safford, Dmitry Kasatkin

On Wed, 2012-04-18 at 19:39 +0100, Al Viro wrote:
> On Wed, Apr 18, 2012 at 02:07:52PM -0400, Mimi Zohar wrote:
> 
> > >From the 'ima: defer calling __fput()' patch description:
> > 
> > ima_file_free(), which is called on __fput(), updates the file data
> > hash stored as an extended attribute to reflect file changes.  If a
> > file is closed before it is munmapped, __fput() is called with the
> > mmap_sem taken.  With IMA-appraisal enabled, this results in an
> > mmap_sem/i_mutex lockdep.  ima_defer_fput() increments the f_count to
> > defer the __fput() being called until after the mmap_sem is released.
> > 
> > The number of __fput() calls needing to be deferred is minimal.  Only
> > those files in policy, that were closed prior to the munmap and were
> > mmapped write, need to defer the __fput().
> > 
> > With this patch, on a clean F16 install, from boot to login, only
> > 5 out of ~100,000 mmap_sem held fput() calls were deferred.
> 
> Assuming that it's commit 3cee52ffe8ca925bb1e96f804daa87f7e2e34e46
> Author: Mimi Zohar <zohar@linux.vnet.ibm.com>
> Date:   Fri Feb 24 06:23:12 2012 -0500
> 
>     ima: defer calling __fput()
> in your tree, the NAK still stands.  For starters, but you are creating a
> different locking rules for IMA-enabled builds and for everything else.
> Moreover, this deferral is done only for files opened for write; the
> rules are convoluted as hell *and* inviting abuses.  

Yes, that is the updated version.  For performance, we limited deferring
the __fput() to only those files that could possibly change - open for
write, were closed before being munmapped, and that IMA-appraisal
maintains a file data hash as an xattr.  If the main concern is
different locking rules when IMA is enabled, then we could remove the
IMA criteria and rename ima_defer_fput() to something more generic.

As for "*and* inviting abuses", I'm not sure what you mean.

> NAKed at least until you come up with formal proof that there's no other
> lock where fput() would be possible and ->i_mutex was not allowed.  This
> is not a way to go; that kind of kludges leads to locking code that is
> impossible to reason about.

On __fput(), we need to update the security.ima xattr with a hash of the
file data.  The original thread discussion suggested changing the xattr
locking.  The filesystems seem to do their own xattr locking, but in
fs/xattr.c the i_mutex is taken before accessing the inode
setxattr/removexattr ops. 

hm, lockdep isn't complaining about anything else.  Not sure if that
qualifies as formal proof. 

> PS: BTW, what the hell is "fput already scheduled" codepath about?
> Why is it pr_info() and not an outright BUG_ON()?

I'll fix this.

thanks,

Mimi


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PULL REQUEST] : ima-appraisal patches
  2012-04-18 18:39     ` Al Viro
  2012-04-18 20:56       ` Mimi Zohar
@ 2012-04-19 19:57       ` Mimi Zohar
  2012-04-20  0:43         ` [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) Al Viro
  1 sibling, 1 reply; 32+ messages in thread
From: Mimi Zohar @ 2012-04-19 19:57 UTC (permalink / raw)
  To: Al Viro
  Cc: James Morris, linux-security-module, linux-kernel, linux-fsdevel,
	David Safford, Dmitry Kasatkin

On Wed, 2012-04-18 at 19:39 +0100, Al Viro wrote:
> On Wed, Apr 18, 2012 at 02:07:52PM -0400, Mimi Zohar wrote:

> NAKed at least until you come up with formal proof that there's no other
> lock where fput() would be possible and ->i_mutex was not allowed.  

Has the discussion here moved from deferring the __fput() for the
mmap_sem/i_mutex lockdep side case, to taking the i_mutex in __fput() in
general?  Lockdep has not reported any problems, other than for the
mmap_sem/i_mutex scenario.

> This
> is not a way to go; that kind of kludges leads to locking code that is
> impossible to reason about.

Are you referring to defering the __fput() or taking the i_mutex in
__fput() in general?

The i_mutex is currently used to protect file data and metadata (eg.
chown, chmod, xattrs).  After the last file data change, the file
metadata needs to be updated to reflect the file data changes.  As
i_mutex is used for protecting both the file data and file metadata, why
would taking the i_mutex in __fput() be kludgie.

I'd really appreciate any help, suggestions.

thanks,

Mimi

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-19 19:57       ` Mimi Zohar
@ 2012-04-20  0:43         ` Al Viro
  2012-04-20  2:31           ` Linus Torvalds
  2012-04-20 18:54           ` Hugh Dickins
  0 siblings, 2 replies; 32+ messages in thread
From: Al Viro @ 2012-04-20  0:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: James Morris, linux-security-module, linux-kernel, linux-fsdevel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, Linus Torvalds,
	David Miller

On Thu, Apr 19, 2012 at 03:57:28PM -0400, Mimi Zohar wrote:

> Has the discussion here moved from deferring the __fput() for the
> mmap_sem/i_mutex lockdep side case, to taking the i_mutex in __fput() in
> general?  Lockdep has not reported any problems, other than for the
> mmap_sem/i_mutex scenario.
> 
> > This
> > is not a way to go; that kind of kludges leads to locking code that is
> > impossible to reason about.
> 
> Are you referring to defering the __fput() or taking the i_mutex in
> __fput() in general?

I'm referring to having "are we holding ->mmap_sem" logics anywhere in fput().
Note that your "lockdep has not reported..." is a symptom of the same
problem - it's *NOT* enough to show the lack of deadlocks from the change
of locking rules.  And after that change we'll get even worse situation,
since proving the safety will become harder (and even more so if we end
up adding other cases when we need to defer).

Summary for those who'd been spared the earlier fun: IMA stuff wants to grab
->i_mutex on the final fput() in some cases.  That violates the locking
order in at least one way - final fput() may come under ->mmap_sem, in
which case grabbing ->i_mutex would be a Bloody Bad Idea(tm).  "Solution"
proposed: a bit of spectaculary ugly logics checking in final fput() whether
we have ->mmap_sem locked.  If we do, said final fput() becomes non-final
and is done asynchronously.  This is fundamentally flawed, of course,
since "is ->mmap_sem unlocked" is used as "is it safe to have ->i_mutex
grabbed", and it's both unproven and brittle as hell even if it happens
to be true right now (and I wouldn't bet on that, TBH).

If it had been IMA alone, I would've cheerfully told them where to stuff
the whole thing.  Unfortunately, fput() *is* lock-heavy even without them.
Note that it can easily end up triggering such things as final
deactivate_super().  Now, ->mmap_sem is irrelevant here - after all,
any inodes involved won't be mmapped, or deactivate_super() wouldn't
be final.  However, fput() is called e.g. under rtnl_lock() and I'm
not at all sure that something like NFS won't try to grab it from its
->kill_sb().

So the question is, do we have any reasonable way to get to the situation
where the __fput() would only be done without any locks held?  It might
be worth trying. What we CAN'T do:
	* defer all __fput() (i.e. always do final fput() async).  Obviously
no-go, for performance reasons alone.
	* check some predicate about the set of locks held and defer if it's
false.  That's what IMA folks have tried to do; no-go for the reasons described
above.
	* add a new helper (fput_carefully() or something like that) that would
defer final __fput(), leaving fput() with the current behaviour.  And convert
the potentially unsafe callers to it (starting with everything that holds
->mmap_sem).  No-go since it's not maintainable - a change pretty far away
from a function that does (currently safe) fput() can end up requiring the
conversion to fput_carefully().  Too easy to miss, so this will decay and it
won't be easy to verify correctness several years down the road.

However, there's an approach that might be feasible.  Most of the time
the final fput() *is* done without any locks held and there's a very
large subclass of those call sites - those that come via fput_light().  
What we could do, and what might be maintainable is:
	* prohibit fput_light() with locks held.  Right now we are very
close to that (or already there - I haven't finished checking).
	* convert low-hanging fget/fput in syscalls to fget_light/fput_light.
Makes sense anyway.
	* split off fput_nodefer() (identical to what fput() is right now),
make fput_light() call it instead of fput().  Switch path_lookupat() and
path_openat() to fput_nodefer() as well.  Ditto for sys_socketpair() and
sys_accept4().
	* make fput() (now almost never leading to __fput()) defer the sucker
in very rare cases when it ends up dropping the last reference.
At that point we would have __fput() always done without any locks held,
which would remove all restrictions on the locks that can be taken from it.

Comments?

Disclaimer: I have not finished going through the call sites (note that
there are wrappers - e.g. sockfd_put() *and* sockfd_lookup()), so there might
be obstacles.  In particular, I'm still not sure about SCM_RIGHTS datagrams
handling - IIRC, we had an interesting kludge in there, with a kinda-sorta
deferral/batching.  BTW, I wonder about deadlocks around that one -
drivers/vhost/net.c is doing ->recvmsg() while holding vq->mutex and if
an SCM_RIGHTS datagram comes in, we'll get a bunch of fput() done.  Which
can lead to final umount of a network filesystem, etc.  If that thing can
lead to handle_rx() on the same sucker, we have a deadlock...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20  0:43         ` [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) Al Viro
@ 2012-04-20  2:31           ` Linus Torvalds
  2012-04-20  2:54             ` Al Viro
  2012-04-20 18:54           ` Hugh Dickins
  1 sibling, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2012-04-20  2:31 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller

On Thu, Apr 19, 2012 at 5:43 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> However, there's an approach that might be feasible.  Most of the time
> the final fput() *is* done without any locks held and there's a very
> large subclass of those call sites - those that come via fput_light().
> What we could do, and what might be maintainable is:
>        * prohibit fput_light() with locks held.  Right now we are very
> close to that (or already there - I haven't finished checking).
>        * convert low-hanging fget/fput in syscalls to fget_light/fput_light.
> Makes sense anyway.

Many of them would make sense, yes (looking at vfs_fstatat() etc.

But a lot of fput() calls come from close() -> filp_close -> fput().

And the "fput_light()" model *only* works together with fget_light()
as it is now.

So I do think you need some other model. Of course, we can just do
"fput_light(file, 1)" instead - that seems pretty ugly, though. But
just making "fput()" do a defer on the last count sounds actively
*wrong* for things like close(), which may actually have serious
consistency guarantees (ie the process doing the close() may "know"
that it is the last user, and depend on the fact that the close() did
actually delete the inode etc.

                 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20  2:31           ` Linus Torvalds
@ 2012-04-20  2:54             ` Al Viro
  2012-04-20  2:58               ` Linus Torvalds
  2012-04-20  3:15               ` Al Viro
  0 siblings, 2 replies; 32+ messages in thread
From: Al Viro @ 2012-04-20  2:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller

On Thu, Apr 19, 2012 at 07:31:01PM -0700, Linus Torvalds wrote:
> On Thu, Apr 19, 2012 at 5:43 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > However, there's an approach that might be feasible. ?Most of the time
> > the final fput() *is* done without any locks held and there's a very
> > large subclass of those call sites - those that come via fput_light().
> > What we could do, and what might be maintainable is:
> > ? ? ? ?* prohibit fput_light() with locks held. ?Right now we are very
> > close to that (or already there - I haven't finished checking).
> > ? ? ? ?* convert low-hanging fget/fput in syscalls to fget_light/fput_light.
> > Makes sense anyway.
> 
> Many of them would make sense, yes (looking at vfs_fstatat() etc.
> 
> But a lot of fput() calls come from close() -> filp_close -> fput().
> 
> And the "fput_light()" model *only* works together with fget_light()
> as it is now.
> 
> So I do think you need some other model. Of course, we can just do
> "fput_light(file, 1)" instead - that seems pretty ugly, though. But
> just making "fput()" do a defer on the last count sounds actively
> *wrong* for things like close(), which may actually have serious
> consistency guarantees (ie the process doing the close() may "know"
> that it is the last user, and depend on the fact that the close() did
> actually delete the inode etc.

Umm...  I really wonder if we *want* filp_close() under any kind of
locks.  You are right - it should not be deferred.  I haven't finished
checking the callers of that puppy, but if we really do it while holding
any kind of lock, we are asking for trouble.  So I'd rather switch
filp_close() to use of fput_nodefer() if that turns out to be possible.

FWIW, the set of primitives I'm thinking of right now is

__fput(file) - same as now
schedule_fput(file) - takes the only reference to file and schedules __fput()
fput_nodefer(file)
{
        if (atomic_long_dec_and_test(&file->f_count))
                __fput(file);
}
fput(file)
{
	if (unlikely(!fput_atomic(file))
		schedule_fput(file);
}
fput_light(file, need_fput)
{
	if (need_fput)
		fput_nodefer(file);
}
fput_light_defer(file, need_fput) // for callers in some weird ioctls, might
				  // not be needed at all
{
	if (need_fput)
		fput(file);
}

and filp_close() would, if that turns out to be possible, call fput_nodefer()
instead of fput().  If we *do* have places where we need deferral in
filp_close() (and I'm fairly sure that any such place is a deadlock right
now), well, we'll need a variant of filp_close() sans the call of fput...()
and those places would call that, followed by full (deferring) fput().

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20  2:54             ` Al Viro
@ 2012-04-20  2:58               ` Linus Torvalds
  2012-04-20  8:09                 ` Al Viro
  2012-04-20  3:15               ` Al Viro
  1 sibling, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2012-04-20  2:58 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller

On Thu, Apr 19, 2012 at 7:54 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Umm...  I really wonder if we *want* filp_close() under any kind of
> locks.  You are right - it should not be deferred.  I haven't finished
> checking the callers of that puppy, but if we really do it while holding
> any kind of lock, we are asking for trouble.  So I'd rather switch
> filp_close() to use of fput_nodefer() if that turns out to be possible.

Ok, fair enough, looks like a reasonable plan to me.

                  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20  2:58               ` Linus Torvalds
@ 2012-04-20  8:09                 ` Al Viro
  2012-04-20 15:56                   ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Al Viro @ 2012-04-20  8:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller

On Thu, Apr 19, 2012 at 07:58:57PM -0700, Linus Torvalds wrote:
> On Thu, Apr 19, 2012 at 7:54 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Umm... ?I really wonder if we *want* filp_close() under any kind of
> > locks. ?You are right - it should not be deferred. ?I haven't finished
> > checking the callers of that puppy, but if we really do it while holding
> > any kind of lock, we are asking for trouble. ?So I'd rather switch
> > filp_close() to use of fput_nodefer() if that turns out to be possible.
> 
> Ok, fair enough, looks like a reasonable plan to me.

Actually, scratch that; I think I have a better variant
	* switch to fget_light/fput_light where possible; it's not needed
for the rest, but is useful anyway
	* move the guts of filp_close() (everything prior to fput() it does
in the end) into a new helper, turning filp_close() into a couple of calls
(inlined, at that).  Equivalent transformation.
	* add fput_nodefer(); does the same thing fput() does now.  Make
fput() defer the call of __fput().  Add a filp_close_nodefer() - same as
filp_close(), but the second call in it is fput_nodefer(), not fput().  And
switch the callers that are known to be done outside of any locks to
filp_close_nodefer().  Switch accept4()/socketpair() to fput_nodefer()
(for freshly created struct file in case of cleanup on failure exit).

Note that we do *not* need to bother with fput_light() - even if it does
fput(), that fput() won't usually be the final one.  We'd need it to
race with close() or dup2() from another thread for that to happen.  So
we only need to review the callers of filp_close() and there are much
fewer of those.

We also get something else out of that - AFAICS, the kludge in __scm_destroy()
can be killed after that.  We did it to prevent recursion on fput(), right?
Now that recursion will be gone...

I'd probably not bother with trying to be clever in doing the deferral itself -
the point is to make it rare, so it's not a hot path anyway.  We can play
with per-CPU lists, etc., but in this case dumber is better.

Comments?  I'm half-asleep right now, so I might be missing something
obvious...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20  8:09                 ` Al Viro
@ 2012-04-20 15:56                   ` Linus Torvalds
  2012-04-20 16:08                     ` Al Viro
  0 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2012-04-20 15:56 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller

On Fri, Apr 20, 2012 at 1:09 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Note that we do *not* need to bother with fput_light() - even if it does
> fput(), that fput() won't usually be the final one.

Ack. Most of the time the fput_light()->fput will just decrement the use count.

> We also get something else out of that - AFAICS, the kludge in __scm_destroy()
> can be killed after that.  We did it to prevent recursion on fput(), right?
> Now that recursion will be gone...

Hmm.. That points out that we may have a *lot* of these pending final
fput's, though. So the deferral queueing should be fairly light. What
were your particular plans for it?

This actually sounds like a fairly good usage-case for Oleg's new
task_work_add() thing. That would defer the final fput, but at the
same time guarantee that it gets done before returning to user space -
in case there are any issues with synchronous actions. Have you looked
at Oleg's series? You weren't cc'd because it didn't affect you, but
look at lkml for "task_work_add()" to find it.

NOTE! If pure kernel threads do fput() deferral (and maybe they do -
I'm thinking nfsd etc), then the task-work thing might need some extra
thought.

                     Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20 15:56                   ` Linus Torvalds
@ 2012-04-20 16:08                     ` Al Viro
  2012-04-20 16:42                       ` Al Viro
  0 siblings, 1 reply; 32+ messages in thread
From: Al Viro @ 2012-04-20 16:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller

On Fri, Apr 20, 2012 at 08:56:57AM -0700, Linus Torvalds wrote:
> > We also get something else out of that - AFAICS, the kludge in __scm_destroy()
> > can be killed after that. ?We did it to prevent recursion on fput(), right?
> > Now that recursion will be gone...
> 
> Hmm.. That points out that we may have a *lot* of these pending final
> fput's, though. So the deferral queueing should be fairly light. What
> were your particular plans for it?

Doing removal from per-sb list immediately (i.e. before possible
deferral; we skip ones with zero ->f_count when we walk the list
anyway), then in case we decide to defer just move them to per-CPU
list and schedule work on that CPU, with handler that will pull the
corresponding list out and do the rest of __fput() for everything
in that list.  No extra locking, just preempt_disable() around the
"move to per-CPU list" bit.  Or a per-CPU spinlock with worker not
being tied to specific CPU and told which CPU's list to work with.
How does CPU hotplug interact with work scheduled on CPU about to
be taken down, BTW?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20 16:08                     ` Al Viro
@ 2012-04-20 16:42                       ` Al Viro
  2012-04-20 17:21                         ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Al Viro @ 2012-04-20 16:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller

On Fri, Apr 20, 2012 at 05:08:48PM +0100, Al Viro wrote:

> Doing removal from per-sb list immediately (i.e. before possible
> deferral; we skip ones with zero ->f_count when we walk the list
> anyway), then in case we decide to defer just move them to per-CPU
> list and schedule work on that CPU, with handler that will pull the
> corresponding list out and do the rest of __fput() for everything
> in that list.  No extra locking, just preempt_disable() around the
> "move to per-CPU list" bit.  Or a per-CPU spinlock with worker not
> being tied to specific CPU and told which CPU's list to work with.
> How does CPU hotplug interact with work scheduled on CPU about to
> be taken down, BTW?

Actually, I like the per-CPU spinlock variant better; the thing is,
with that scheme we get normal fput() (i.e. non-nodefer variant)
non-blocking.  How about this:

__fput() loses file_sb_list_del() call

fput(file)
{
	if (atomic_long_dec_and_test(...)) {
		unsigned long flags;
		struct foo *p;
		file_sb_list_del(file);
		p = get_cpu_var(deferral_lists);
		spin_lock_irqsave(&p->lock, flags);
		list_move(&file->f_u.fu_list, &p->list);
		spin_unlock_irqrestore(&p->lock, flags);
		schedule_work(&p->work);
		put_cpu_var(p);
	}
}

fput_nodefer(file)
{
	if (atomic_long_dec_and_test(...)) {
		file_sb_list_del(file);
		__fput(file);
	}
}

do_deferred_fput_work(work)
{
	struct foo *p = container_of(work, struct foo, work);
	LIST_HEAD(list);
	spin_lock_irq(&p->lock);
	list_splice_init(&p->list, list);
	spin_unlock_irq(&p->lock);
	while (!list_empty(list)) {
		struct file *file = list_entry(list, struct file, f_u.fu_list);
		list_del_init(&file->f_u.fu_list);
		__fput(file);
	}
}

Voila - now only fput_nodefer() is blocking!  fput() can be used from
any context that way, which should kill e.g. a kludge in fs/aio.c.

Comments?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20 16:42                       ` Al Viro
@ 2012-04-20 17:21                         ` Linus Torvalds
  2012-04-20 18:07                           ` Al Viro
  0 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2012-04-20 17:21 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller

On Fri, Apr 20, 2012 at 9:42 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Actually, I like the per-CPU spinlock variant better; the thing is,
> with that scheme we get normal fput() (i.e. non-nodefer variant)
> non-blocking.  How about this:

What's the advantage of a per-cpu lock?

If you make the work be per-cpu, then you're better with no locking at
all: just disable interrupts (which you do anyway).

And if you want to use a spinlock, don't bother with the percpu side.

The thing I do not like about the schedule_work approach is that it
(a) totally hides the real cost  - which is the scheduling - and (b)
it is so asynchronous that it will happen potentially long after the
task dropped the reference.

And seriously - that is user-visible behavior.

For example, think about this *common* pattern:

  open+mmap+close+unlink+munmap

which would trigger the whole deferred fput, but also triggers the
actual real unlink() at fput time.

Right now, you can have that kind of thing in a program and
immediately unmount the filesystem afterwards (replace "unmount" with
"cannot see silly-renamed files" etc).

The "totally asynchronous deferral" literally *breaks*semantics*.

Sure, it won't be noticeable in 99.99% of all cases, and I doubt you
can trigger much of a test for it. But it's potential real breakage,
and it's going to be hard to ever see. And then when it *does* happen,
it's going to be totally impossible to debug.

It's not just the "last unlink" thing that gets delayed. It things
like file locking. It's "drop_file_write_access()". It's whatever
random thing that file does at "release()". It's a ton of things like
that. Delaying them has user-visible actions.

That's a whole can of complexities and worries outside of the kernel
interface that you are completely ignoring - just because you are
trying to solve the *simple* complexity of locking interaction
entirely within the kernel.

I think that's a bit myopic. We don't even *know* what the problems
with the async approach might be. Your "simple" solution is simple
only inside the kernel.

This is why I suggested you look at Oleg's patches. If we guarantee
that things won't be delayed past re-entering user mode, all those
issues go away.

                     Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20 17:21                         ` Linus Torvalds
@ 2012-04-20 18:07                           ` Al Viro
  0 siblings, 0 replies; 32+ messages in thread
From: Al Viro @ 2012-04-20 18:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller

On Fri, Apr 20, 2012 at 10:21:35AM -0700, Linus Torvalds wrote:
> On Fri, Apr 20, 2012 at 9:42 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Actually, I like the per-CPU spinlock variant better; the thing is,
> > with that scheme we get normal fput() (i.e. non-nodefer variant)
> > non-blocking. ?How about this:
> 
> What's the advantage of a per-cpu lock?
> 
> If you make the work be per-cpu, then you're better with no locking at
> all: just disable interrupts (which you do anyway).

Point taken.

> The thing I do not like about the schedule_work approach is that it
> (a) totally hides the real cost  - which is the scheduling - and (b)
> it is so asynchronous that it will happen potentially long after the
> task dropped the reference.

[snip]

> This is why I suggested you look at Oleg's patches. If we guarantee
> that things won't be delayed past re-entering user mode, all those
> issues go away.

I've looked at them.  One obvious problem is that it tracehook_notify_resume()
is not universally called.  AFAICS, hexagon, m68k, microblaze, score, um
and xtensa never call tracehook_notify_resume().  Out of those, hexagon is
manually checking TIF_NOTIFY_RESUME and does key_replace_session_keyring(),
so the call could be easily added into the same place; the rest of those
guys don't even look at TIF_NOTIFY_RESUME anywhere near their signal*.c
and m68k/um/xtensa don't even have it defined, let alone handled.  So this
stuff depends on some amount of asm glue hacking on several architectures ;-/

Another is that final fput() can, indeed, happen in kernel threads, so
pure switch to task_work_add() won't be enough.  I agree that having this
stuff flushed before we return to userland would be a good thing; the
question is, can we easily tell if there will be such a return to userland?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20  2:54             ` Al Viro
  2012-04-20  2:58               ` Linus Torvalds
@ 2012-04-20  3:15               ` Al Viro
  1 sibling, 0 replies; 32+ messages in thread
From: Al Viro @ 2012-04-20  3:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller

On Fri, Apr 20, 2012 at 03:54:38AM +0100, Al Viro wrote:

> and filp_close() would, if that turns out to be possible, call fput_nodefer()
> instead of fput().  If we *do* have places where we need deferral in
> filp_close() (and I'm fairly sure that any such place is a deadlock right
> now), well, we'll need a variant of filp_close() sans the call of fput...()
> and those places would call that, followed by full (deferring) fput().

BTW, some of those filp_close() are simply wrong - e.g. a big pile in
drivers/media/video/cx25821/*.c should be fput().

And yes, we have at least some under mutex - binder_lock held by
binder_ioctl(), which ends up doing binder_transaction() with its
failure cleanup hitting close_filp().  On an arbitrary struct file.
It *probably* doesn't deadlock on the spot, since binder_release()
itself contains a deferral mechanism (perhaps they'd spotted the
deadlock, perhaps there are other reasons to have it).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20  0:43         ` [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) Al Viro
  2012-04-20  2:31           ` Linus Torvalds
@ 2012-04-20 18:54           ` Hugh Dickins
  2012-04-20 19:04             ` Al Viro
  1 sibling, 1 reply; 32+ messages in thread
From: Hugh Dickins @ 2012-04-20 18:54 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, Linus Torvalds,
	David Miller, Andrew Morton

On Fri, 20 Apr 2012, Al Viro wrote:
> On Thu, Apr 19, 2012 at 03:57:28PM -0400, Mimi Zohar wrote:
> 
> > Has the discussion here moved from deferring the __fput() for the
> > mmap_sem/i_mutex lockdep side case, to taking the i_mutex in __fput() in
> > general?  Lockdep has not reported any problems, other than for the
> > mmap_sem/i_mutex scenario.
> > 
> > > This
> > > is not a way to go; that kind of kludges leads to locking code that is
> > > impossible to reason about.
> > 
> > Are you referring to defering the __fput() or taking the i_mutex in
> > __fput() in general?
> 
> I'm referring to having "are we holding ->mmap_sem" logics anywhere in fput().
> Note that your "lockdep has not reported..." is a symptom of the same
> problem - it's *NOT* enough to show the lack of deadlocks from the change
> of locking rules.  And after that change we'll get even worse situation,
> since proving the safety will become harder (and even more so if we end
> up adding other cases when we need to defer).
> 
> Summary for those who'd been spared the earlier fun: IMA stuff wants to grab
> ->i_mutex on the final fput() in some cases.  That violates the locking
> order in at least one way - final fput() may come under ->mmap_sem, in
> which case grabbing ->i_mutex would be a Bloody Bad Idea(tm).  "Solution"
> proposed: a bit of spectaculary ugly logics checking in final fput() whether
> we have ->mmap_sem locked.  If we do, said final fput() becomes non-final
> and is done asynchronously.  This is fundamentally flawed, of course,
> since "is ->mmap_sem unlocked" is used as "is it safe to have ->i_mutex
> grabbed", and it's both unproven and brittle as hell even if it happens
> to be true right now (and I wouldn't bet on that, TBH).

I can see that the discussion has since moved on quite a way from here.

But it looks to me fairly easy for mm to stop doing fput() under mmap_sem.

That's already the case when exiting (no mmap_sem held), and shouldn't add
observable cost when unmapping (we already work on a chain of vmas to be
freed, and when unmapping that chain will usually just be of one: shouldn't
matter to defer a final pass until after mmap_sem is dropped).  Unless I'm
mistaken, the fput() buried in vma_adjust() can never be a final fput.

Is it worth my trying to implement that?  Or do you see an immediate
gotcha that I'm missing?  Or are you happy enough with your deferred
fput() ideas, that it would be a waste of time to rearrange the mm end?

Hugh

> 
> If it had been IMA alone, I would've cheerfully told them where to stuff
> the whole thing.  Unfortunately, fput() *is* lock-heavy even without them.
> Note that it can easily end up triggering such things as final
> deactivate_super().  Now, ->mmap_sem is irrelevant here - after all,
> any inodes involved won't be mmapped, or deactivate_super() wouldn't
> be final.  However, fput() is called e.g. under rtnl_lock() and I'm
> not at all sure that something like NFS won't try to grab it from its
> ->kill_sb().
> 
> So the question is, do we have any reasonable way to get to the situation
> where the __fput() would only be done without any locks held?  It might
> be worth trying. What we CAN'T do:
> 	* defer all __fput() (i.e. always do final fput() async).  Obviously
> no-go, for performance reasons alone.
> 	* check some predicate about the set of locks held and defer if it's
> false.  That's what IMA folks have tried to do; no-go for the reasons described
> above.
> 	* add a new helper (fput_carefully() or something like that) that would
> defer final __fput(), leaving fput() with the current behaviour.  And convert
> the potentially unsafe callers to it (starting with everything that holds
> ->mmap_sem).  No-go since it's not maintainable - a change pretty far away
> from a function that does (currently safe) fput() can end up requiring the
> conversion to fput_carefully().  Too easy to miss, so this will decay and it
> won't be easy to verify correctness several years down the road.
> 
> However, there's an approach that might be feasible.  Most of the time
> the final fput() *is* done without any locks held and there's a very
> large subclass of those call sites - those that come via fput_light().  
> What we could do, and what might be maintainable is:
> 	* prohibit fput_light() with locks held.  Right now we are very
> close to that (or already there - I haven't finished checking).
> 	* convert low-hanging fget/fput in syscalls to fget_light/fput_light.
> Makes sense anyway.
> 	* split off fput_nodefer() (identical to what fput() is right now),
> make fput_light() call it instead of fput().  Switch path_lookupat() and
> path_openat() to fput_nodefer() as well.  Ditto for sys_socketpair() and
> sys_accept4().
> 	* make fput() (now almost never leading to __fput()) defer the sucker
> in very rare cases when it ends up dropping the last reference.
> At that point we would have __fput() always done without any locks held,
> which would remove all restrictions on the locks that can be taken from it.
> 
> Comments?
> 
> Disclaimer: I have not finished going through the call sites (note that
> there are wrappers - e.g. sockfd_put() *and* sockfd_lookup()), so there might
> be obstacles.  In particular, I'm still not sure about SCM_RIGHTS datagrams
> handling - IIRC, we had an interesting kludge in there, with a kinda-sorta
> deferral/batching.  BTW, I wonder about deadlocks around that one -
> drivers/vhost/net.c is doing ->recvmsg() while holding vq->mutex and if
> an SCM_RIGHTS datagram comes in, we'll get a bunch of fput() done.  Which
> can lead to final umount of a network filesystem, etc.  If that thing can
> lead to handle_rx() on the same sucker, we have a deadlock...
> --

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20 18:54           ` Hugh Dickins
@ 2012-04-20 19:04             ` Al Viro
  2012-04-20 19:18               ` Linus Torvalds
  2012-04-20 19:37               ` Al Viro
  0 siblings, 2 replies; 32+ messages in thread
From: Al Viro @ 2012-04-20 19:04 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, Linus Torvalds,
	David Miller, Andrew Morton

On Fri, Apr 20, 2012 at 11:54:01AM -0700, Hugh Dickins wrote:
> I can see that the discussion has since moved on quite a way from here.
> 
> But it looks to me fairly easy for mm to stop doing fput() under mmap_sem.
> 
> That's already the case when exiting (no mmap_sem held), and shouldn't add
> observable cost when unmapping (we already work on a chain of vmas to be
> freed, and when unmapping that chain will usually just be of one: shouldn't
> matter to defer a final pass until after mmap_sem is dropped).  Unless I'm
> mistaken, the fput() buried in vma_adjust() can never be a final fput.
> 
> Is it worth my trying to implement that?  Or do you see an immediate
> gotcha that I'm missing?  Or are you happy enough with your deferred
> fput() ideas, that it would be a waste of time to rearrange the mm end?

Deferring the final pass after dropping ->mmap_sem is going to be
interesting; what would protect ->vm_next on those suckers?  Locking
rules are already bloody complicated in mm/*; this will only add more
subtle fun.  Note that right now both the dissolution of ->vm_next
list and freeing of VMAs happen under ->mmap_sem; changing that might
cost us a lot of headache...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20 19:04             ` Al Viro
@ 2012-04-20 19:18               ` Linus Torvalds
  2012-04-20 19:32                 ` Hugh Dickins
  2012-04-20 19:58                 ` Al Viro
  2012-04-20 19:37               ` Al Viro
  1 sibling, 2 replies; 32+ messages in thread
From: Linus Torvalds @ 2012-04-20 19:18 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, linux-fsdevel, James Morris, linux-security-module,
	linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar,
	David Miller, Andrew Morton

On Fri, Apr 20, 2012 at 12:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Deferring the final pass after dropping ->mmap_sem is going to be
> interesting; what would protect ->vm_next on those suckers?

Just remove them from the active list, and keep them linked to each
other using vm_next.

After all, the only case we care about is the case where the vma gets
removed entirely, so we just put them on their own list.

In fact, that's what we already do for other reasons. See
detach_vmas_to_be_unmapped().

So vm_next is actually entirely *private* by this time, and needs no
locking at all.

As far as I can tell, we could just make do_munmap() return that
private list, and then do the fput's and freeing of the list outside
the mmap_sem lock.

That actually looks pretty simple. There are a fair number of callers,
which looks to be the main annoyance. But fixing it up with some
pattern of "do finish_munmap after drooping the mmap_sem" doesn't look
*complicated*, just slightly annoying.

The *bigger* annoyance is actually "do_mmap()", which does a
do_munmap() as part of it, so it needs the same cleanup too.

There might be other cases that do munmap as part of their operation
(and that have the mmap_sem held outside of the caller), but
do_munmap() and do_mmap() seem to be the two obvious ones.

Many of the callers seem to do the mmap_sem() right around the call,
though (see binfmt_elf.c for example), so it really would be a rather
local fixup.

                   Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20 19:18               ` Linus Torvalds
@ 2012-04-20 19:32                 ` Hugh Dickins
  2012-04-20 19:58                 ` Al Viro
  1 sibling, 0 replies; 32+ messages in thread
From: Hugh Dickins @ 2012-04-20 19:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, linux-fsdevel, James Morris, linux-security-module,
	linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar,
	David Miller, Andrew Morton

On Fri, 20 Apr 2012, Linus Torvalds wrote:
> On Fri, Apr 20, 2012 at 12:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Deferring the final pass after dropping ->mmap_sem is going to be
> > interesting; what would protect ->vm_next on those suckers?
> 
> Just remove them from the active list, and keep them linked to each
> other using vm_next.
> 
> After all, the only case we care about is the case where the vma gets
> removed entirely, so we just put them on their own list.
> 
> In fact, that's what we already do for other reasons. See
> detach_vmas_to_be_unmapped().
> 
> So vm_next is actually entirely *private* by this time, and needs no
> locking at all.
> 
> As far as I can tell, we could just make do_munmap() return that
> private list, and then do the fput's and freeing of the list outside
> the mmap_sem lock.
> 
> That actually looks pretty simple. There are a fair number of callers,
> which looks to be the main annoyance. But fixing it up with some
> pattern of "do finish_munmap after drooping the mmap_sem" doesn't look
> *complicated*, just slightly annoying.
> 
> The *bigger* annoyance is actually "do_mmap()", which does a
> do_munmap() as part of it, so it needs the same cleanup too.
> 
> There might be other cases that do munmap as part of their operation
> (and that have the mmap_sem held outside of the caller), but
> do_munmap() and do_mmap() seem to be the two obvious ones.
> 
> Many of the callers seem to do the mmap_sem() right around the call,
> though (see binfmt_elf.c for example), so it really would be a rather
> local fixup.

Yes, that's exactly how I was thinking of it.

Hugh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20 19:18               ` Linus Torvalds
  2012-04-20 19:32                 ` Hugh Dickins
@ 2012-04-20 19:58                 ` Al Viro
  2012-04-20 21:12                   ` Linus Torvalds
  1 sibling, 1 reply; 32+ messages in thread
From: Al Viro @ 2012-04-20 19:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, linux-fsdevel, James Morris, linux-security-module,
	linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar,
	David Miller, Andrew Morton

On Fri, Apr 20, 2012 at 12:18:43PM -0700, Linus Torvalds wrote:

> The *bigger* annoyance is actually "do_mmap()", which does a
> do_munmap() as part of it, so it needs the same cleanup too.

So does a bunch of other places.  Let me dig out the call graph circa
3.3.0...  Here is the relevant part:
do_munmap() -
        <- pfm_do_munmap() <- pfm_remove_smpl_mapping() which grabs mmap_sem excl
        <- 64_munmap(2) which grabs mmap_sem excl
        <- kvm_arch_commit_memory_region() which grabs mmap_sem excl
        <- i810_unmap_buffer() which grabs mmap_sem excl
        <- aio_free_ring() which grabs mmap_sem excl
        <- elf_map() which grabs mmap_sem excl
        <- [flat] load_flat_file() --- BUG HERE
        <- shmdt(2) which grabs mmap_sem excl
        <- brk(2) which grabs mmap_sem excl
        <- mmap_region() [see below]
        <- munmap(2) which grabs mmap_sem excl
        <- do_brk() [see below]
        <- move_vma()
                <- mremap_to() <- do_mremap() [see below]
                <- do_mremap() [see below]
        <- mremap_to() <- do_mremap() [see below]
        <- do_mremap() [see below]
do_brk() -
        <- brk(2) which grabs mmap_sem excl
        <- [ia32_aout] set_brk() which grabs mmap_sem excl
        <- [ia32_aout] load_aout_binary() which grabs mmap_sem excl
        <- [ia32_aout] load_aout_library() which grabs mmap_sem excl
        <- [aout] set_brk() which grabs mmap_sem excl
        <- [aout] load_aout_binary() which grabs mmap_sem excl
        <- [aout] load_aout_library() which grabs mmap_sem excl
        <- [elf] set_brk() which grabs mmap_sem excl
        <- [elf] load_elf_interp() which grabs mmap_sem excl
        <- [elf] load_elf_library() which grabs mmap_sem excl
mmap_region() -
        <- remap_file_pages(2) which grabs mmap_sem excl
        <- do_mmap_pgoff() [see below]
        <- [tile] arch_setup_additional_pages() which grabs mmap_sem excl
                (a bit too late, BTW, but not for this one)
do_mmap_pgoff() -
        <- do_mmap() [see below]
        <- mmap_pgoff(2) which grabs mmap_sem excl
do_mmap() -
        <- shmat(2) which grabs mmap_sem excl
        <- aio_setup_ring() which grabs mmap_sem excl [NB: only because ctx->mm == current->mm]
        <- kvm_arch_prepare_memory_region() which grabs mmap_sem excl
        <- drm_mapbufs() which grabs mmap_sem excl
        <- exynos_drm_gem_mmap_ioctl() which grabs mmap_sem excl
        <- i810_map_buffer() which grabs mmap_sem excl [NB: racy changes of ->f_op]
        <- i915_gem_mmap_ioctl() which grabs mmap_sem excl
        <- [tile] single_step_once() which grabs mmap_sem excl
        <- [elf] elf_map() which grabs mmap_sem excl
        <- [elf] load_elf_binary() which grabs mmap_sem excl
        <- [elf_fdpic] load_elf_fdpic_binary() which grabs mmap_sem excl
        <- [elf_fdpic] elf_fdpic_map_file_constdisp_on_uclinux() which grabs mmap_sem excl
        <- [elf_fdpic] elf_fdpic_map_file_by_direct_mmap() which grabs mmap_sem excl
        <- [aout] load_aout_binary() which grabs mmap_sem excl
        <- [aout] load_aout_library() which grabs mmap_sem excl
        <- [ia32_aout] load_aout_binary() which grabs mmap_sem excl
        <- [ia32_aout] load_aout_library() which grabs mmap_sem excl
        <- [flat] load_flat_file() which grabs mmap_sem excl
        <- [som] map_som_binary() which grabs mmap_sem excl
do_mremap() -
        <- mremap(2) which grabs mmap_sem excl

(bug mentioned re load_flat_file() is still there, but it's irrelevant
for our purposes - no ->mmap_sem held by caller of do_munmap()).  That's
a metric arseload of sites to propagate that thing to...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20 19:58                 ` Al Viro
@ 2012-04-20 21:12                   ` Linus Torvalds
  2012-04-20 22:13                     ` Al Viro
  0 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2012-04-20 21:12 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, linux-fsdevel, James Morris, linux-security-module,
	linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar,
	David Miller, Andrew Morton

On Fri, Apr 20, 2012 at 12:58 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> So does a bunch of other places.  Let me dig out the call graph circa
> 3.3.0...  Here is the relevant part:

Yes, but a lot of those would actually be helped by a helper function
that does all of:
 - grab mmap_sem
 - call do_m[un]map()
 - release mmap_sem

and that would actually clean them up even in the current case.

And then we could do the cleanup in just the helper function.

Not all, no. But a preparatory patch that just creates the helper
functions for doing brk/mmap/munmap would get rid of a fairly big
chunk of them.

You can visualize how many of them do that by just doing

    git grep -5 do_m[un]*map

and then high-lghting '_write(' (to visually show the
down_write/up_write pairs that surround most of them) by searching for
it.

Are they all like that? No. But most of the ones outside of mm/ do fit
that simple pattern and should probably be fixed up just to have them
not contain VM locking details in them *anyway*.

                    Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20 21:12                   ` Linus Torvalds
@ 2012-04-20 22:13                     ` Al Viro
  2012-04-20 22:35                       ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Al Viro @ 2012-04-20 22:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, linux-fsdevel, James Morris, linux-security-module,
	linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar,
	David Miller, Andrew Morton

On Fri, Apr 20, 2012 at 02:12:55PM -0700, Linus Torvalds wrote:

> Are they all like that? No. But most of the ones outside of mm/ do fit
> that simple pattern and should probably be fixed up just to have them
> not contain VM locking details in them *anyway*.

Kinda-sorta.  I agree that such helpers make sense, but you are too
optimistic about the number of such places.  And clusterfuck around
mremap() is fairly deep, so propagating all way back to up_write()
wont' be fun.

sys_shmdt() is misusing do_munmap(), AFAICS.  And there we call it many
times in a loop, unmapping only a subset, so it's not like we could
blindly pick a string of vmas and handle them all separately afterwards.
It could be handled if we passed an initially empty list, collecting
vmas in it as we go, but... ouch

BTW, looks like I've just found a bug in oprofile - take a look at
sys_munmap() and you'll see that it's almost exactly your proposed
helper.  The only difference is that it starts with calling
	profile_munmap(addr);
(mind you, *before* grabbing ->mmap_sem), which is to say
        blocking_notifier_call_chain(&munmap_notifier, 0, (void *)addr);
i.e. a really fancy way to arrange a call of munmap_notify() from
drivers/oprofile/buffer_sync.c.  Which does the following:
{
        unsigned long addr = (unsigned long)data;
        struct mm_struct *mm = current->mm;
        struct vm_area_struct *mpnt;

        down_read(&mm->mmap_sem);

        mpnt = find_vma(mm, addr);
        if (mpnt && mpnt->vm_file && (mpnt->vm_flags & VM_EXEC)) {
                up_read(&mm->mmap_sem);
                /* To avoid latency problems, we only process the current CPU,
                 * hoping that most samples for the task are on this CPU
                 */
                sync_buffer(raw_smp_processor_id());
                return 0;
        }

        up_read(&mm->mmap_sem);
        return 0;
}

Leaving aside the convoluted way it's done, it's obviously both racy and
incomplete.  The worst part is, sync_buffer() *can't* be called with
any ->mmap_sem held; it goes through a bunch of task_struct, grabbing
->mmap_sem shared.  Fun...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20 22:13                     ` Al Viro
@ 2012-04-20 22:35                       ` Linus Torvalds
  2012-04-27  7:35                         ` Kasatkin, Dmitry
  0 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2012-04-20 22:35 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, linux-fsdevel, James Morris, linux-security-module,
	linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar,
	David Miller, Andrew Morton

On Fri, Apr 20, 2012 at 3:13 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Kinda-sorta.  I agree that such helpers make sense, but you are too
> optimistic about the number of such places.  And clusterfuck around
> mremap() is fairly deep, so propagating all way back to up_write()
> wont' be fun.

Fair enough.

I'll do the helpers and see how much they get rid of, just because
looking at all the callers, those helpers seem to be obviously the
right thing anyway. So even if we don't do anything else, we can
improve things regardless.

For do_brk(), for example, it looks like do_brk() itself should
actually be entirely static to mm/mmap.c, because every single caller
from the outside actually wants the self-locking version.

So plan right now: do "vm_xyzzy()" helper functions that do
"do_xyzzy()" and take the lock (and do not take the "mm" argument,
because it had better always be the current one - keep the calling
convention as simple as possible).

                   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20 22:35                       ` Linus Torvalds
@ 2012-04-27  7:35                         ` Kasatkin, Dmitry
  2012-04-27 17:34                           ` Al Viro
  0 siblings, 1 reply; 32+ messages in thread
From: Kasatkin, Dmitry @ 2012-04-27  7:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Hugh Dickins, linux-fsdevel, James Morris,
	linux-security-module, linux-kernel, David Safford, Mimi Zohar,
	David Miller, Andrew Morton

On Sat, Apr 21, 2012 at 1:35 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Apr 20, 2012 at 3:13 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Kinda-sorta.  I agree that such helpers make sense, but you are too
> > optimistic about the number of such places.  And clusterfuck around
> > mremap() is fairly deep, so propagating all way back to up_write()
> > wont' be fun.
>
> Fair enough.
>
> I'll do the helpers and see how much they get rid of, just because
> looking at all the callers, those helpers seem to be obviously the
> right thing anyway. So even if we don't do anything else, we can
> improve things regardless.
>
> For do_brk(), for example, it looks like do_brk() itself should
> actually be entirely static to mm/mmap.c, because every single caller
> from the outside actually wants the self-locking version.
>
> So plan right now: do "vm_xyzzy()" helper functions that do
> "do_xyzzy()" and take the lock (and do not take the "mm" argument,
> because it had better always be the current one - keep the calling
> convention as simple as possible).
>
>                   Linus

Moi Linus,

Taking the list of files to fput out of do_munmap() and doing it after
unlocking mmap sem
was one of possible solutions when we discussed it with Mimi.
But we were looking for the solution with less modifications of existing kernel.

In fact IMA is already doing some work before taking mmap sem.
---
	ret = ima_file_mmap(filep, prot);
	if (ret < 0)
		return ret;
	down_write(&current->mm->mmap_sem);
--

But have you seen the proposed patch for __fput()?
[PATCH v4 10/12] ima: defer calling __fput()

It defers only of course the last AND mmap_sem is locked AND open for write.

	if (current->mm && rwsem_is_locked(&current->mm->mmap_sem)) {
		if (ima_defer_fput(file) == 0)
			return;
	}

Just 5 out of ~100,000 mmap_sem held fput() calls were deferred.

t.
- Dmitry

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-27  7:35                         ` Kasatkin, Dmitry
@ 2012-04-27 17:34                           ` Al Viro
  2012-04-27 18:52                             ` Kasatkin, Dmitry
  2012-04-30 14:32                             ` Mimi Zohar
  0 siblings, 2 replies; 32+ messages in thread
From: Al Viro @ 2012-04-27 17:34 UTC (permalink / raw)
  To: Kasatkin, Dmitry
  Cc: Linus Torvalds, Hugh Dickins, linux-fsdevel, James Morris,
	linux-security-module, linux-kernel, David Safford, Mimi Zohar,
	David Miller, Andrew Morton

On Fri, Apr 27, 2012 at 10:35:25AM +0300, Kasatkin, Dmitry wrote:

> But have you seen the proposed patch for __fput()?
> [PATCH v4 10/12] ima: defer calling __fput()
> 
> It defers only of course the last AND mmap_sem is locked AND open for write.
> 
> 	if (current->mm && rwsem_is_locked(&current->mm->mmap_sem)) {
> 		if (ima_defer_fput(file) == 0)
> 			return;
> 	}
> 
> Just 5 out of ~100,000 mmap_sem held fput() calls were deferred.

Let me get it straight.
	a) You still ignore all the problems with that described in the
posting right in the beginning of this thread.
	b) You ignore the problems with semantics changes from user-visible
delays of fput() past the return from syscall (described in Linus' posting
upthread - they apply to this "solution" as well).
	c) You seem to consider the fact that this path will be exercised
very rarely, thus making any races on it damn hard to reproduce and debug
as a good thing.  

And as for the sentiment expressed in the beginning of your posting (that
smaller patch size is worth more than clean locking rules, maintainability
of resulting kernel, etc.)...  I'm sorry, but you guys need to decide
what IMA is.  If it's a first-class part of the kernel, you have your
priorities backwards...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-27 17:34                           ` Al Viro
@ 2012-04-27 18:52                             ` Kasatkin, Dmitry
  2012-04-27 19:15                               ` Kasatkin, Dmitry
  2012-04-30 14:32                             ` Mimi Zohar
  1 sibling, 1 reply; 32+ messages in thread
From: Kasatkin, Dmitry @ 2012-04-27 18:52 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Hugh Dickins, linux-fsdevel, James Morris,
	linux-security-module, linux-kernel, David Safford, Mimi Zohar,
	David Miller, Andrew Morton

On Fri, Apr 27, 2012 at 8:34 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Apr 27, 2012 at 10:35:25AM +0300, Kasatkin, Dmitry wrote:
>
>> But have you seen the proposed patch for __fput()?
>> [PATCH v4 10/12] ima: defer calling __fput()
>>
>> It defers only of course the last AND mmap_sem is locked AND open for write.
>>
>>       if (current->mm && rwsem_is_locked(&current->mm->mmap_sem)) {
>>               if (ima_defer_fput(file) == 0)
>>                       return;
>>       }
>>
>> Just 5 out of ~100,000 mmap_sem held fput() calls were deferred.
>
> Let me get it straight.
>        a) You still ignore all the problems with that described in the
> posting right in the beginning of this thread.
>        b) You ignore the problems with semantics changes from user-visible
> delays of fput() past the return from syscall (described in Linus' posting
> upthread - they apply to this "solution" as well).
>        c) You seem to consider the fact that this path will be exercised
> very rarely, thus making any races on it damn hard to reproduce and debug
> as a good thing.
>
> And as for the sentiment expressed in the beginning of your posting (that
> smaller patch size is worth more than clean locking rules, maintainability
> of resulting kernel, etc.)...  I'm sorry, but you guys need to decide
> what IMA is.  If it's a first-class part of the kernel, you have your
> priorities backwards...

Hello,

I do not ignore anything.
I said that we were thinking about solution to get the list of file to
fput them after mmap unlock.
And I do understand the issues discussed.
I just wanted to know more opinions on proposed patch.

- Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-27 18:52                             ` Kasatkin, Dmitry
@ 2012-04-27 19:15                               ` Kasatkin, Dmitry
  0 siblings, 0 replies; 32+ messages in thread
From: Kasatkin, Dmitry @ 2012-04-27 19:15 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Hugh Dickins, linux-fsdevel, James Morris,
	linux-security-module, linux-kernel, David Safford, Mimi Zohar,
	David Miller, Andrew Morton

On Fri, Apr 27, 2012 at 9:52 PM, Kasatkin, Dmitry
<dmitry.kasatkin@intel.com> wrote:
> On Fri, Apr 27, 2012 at 8:34 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> On Fri, Apr 27, 2012 at 10:35:25AM +0300, Kasatkin, Dmitry wrote:
>>
>>> But have you seen the proposed patch for __fput()?
>>> [PATCH v4 10/12] ima: defer calling __fput()
>>>
>>> It defers only of course the last AND mmap_sem is locked AND open for write.
>>>
>>>       if (current->mm && rwsem_is_locked(&current->mm->mmap_sem)) {
>>>               if (ima_defer_fput(file) == 0)
>>>                       return;
>>>       }
>>>
>>> Just 5 out of ~100,000 mmap_sem held fput() calls were deferred.
>>
>> Let me get it straight.
>>        a) You still ignore all the problems with that described in the
>> posting right in the beginning of this thread.
>>        b) You ignore the problems with semantics changes from user-visible
>> delays of fput() past the return from syscall (described in Linus' posting
>> upthread - they apply to this "solution" as well).
>>        c) You seem to consider the fact that this path will be exercised
>> very rarely, thus making any races on it damn hard to reproduce and debug
>> as a good thing.
>>
>> And as for the sentiment expressed in the beginning of your posting (that
>> smaller patch size is worth more than clean locking rules, maintainability
>> of resulting kernel, etc.)...  I'm sorry, but you guys need to decide
>> what IMA is.  If it's a first-class part of the kernel, you have your
>> priorities backwards...
>
> Hello,
>
> I do not ignore anything.
> I said that we were thinking about solution to get the list of file to
> fput them after mmap unlock.
> And I do understand the issues discussed.
> I just wanted to know more opinions on proposed patch.
>
> - Dmitry

For sure we are happy to see the discussion about the problem. Thanks
for your attention to it.
And our target to solve it in best possible way.
So let's agree the solution and we will be happy to implement the patch.

- Dmitry

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-27 17:34                           ` Al Viro
  2012-04-27 18:52                             ` Kasatkin, Dmitry
@ 2012-04-30 14:32                             ` Mimi Zohar
  2012-05-03  4:23                               ` James Morris
  1 sibling, 1 reply; 32+ messages in thread
From: Mimi Zohar @ 2012-04-30 14:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Jonathan Corbet <corbet@lwn.net>, Kasatkin, Dmitry,
	Linus Torvalds, Hugh Dickins, linux-fsdevel, James Morris,
	linux-security-module, linux-kernel, David Safford, David Miller,
	Andrew Morton

On Fri, 2012-04-27 at 18:34 +0100, Al Viro wrote:
> On Fri, Apr 27, 2012 at 10:35:25AM +0300, Kasatkin, Dmitry wrote:
> 
> > But have you seen the proposed patch for __fput()?
> > [PATCH v4 10/12] ima: defer calling __fput()
> > 
> > It defers only of course the last AND mmap_sem is locked AND open for write.
> > 
> > 	if (current->mm && rwsem_is_locked(&current->mm->mmap_sem)) {
> > 		if (ima_defer_fput(file) == 0)
> > 			return;
> > 	}
> > 
> > Just 5 out of ~100,000 mmap_sem held fput() calls were deferred.
> 
> Let me get it straight.
> 	a) You still ignore all the problems with that described in the
> posting right in the beginning of this thread.
> 	b) You ignore the problems with semantics changes from user-visible
> delays of fput() past the return from syscall (described in Linus' posting
> upthread - they apply to this "solution" as well).
> 	c) You seem to consider the fact that this path will be exercised
> very rarely, thus making any races on it damn hard to reproduce and debug
> as a good thing.  
> 
> And as for the sentiment expressed in the beginning of your posting (that
> smaller patch size is worth more than clean locking rules, maintainability
> of resulting kernel, etc.)...  I'm sorry, but you guys need to decide
> what IMA is.  If it's a first-class part of the kernel, you have your
> priorities backwards...

Al, with all this time spent on the different components of the
integrity subsystem, making it a first class citizen is our main
concern.  We definitely appreciate all of your work, previous and
current work on fput, to help make this happen and will be happy to help
in any way possbile.

Jon, thank you for summarizing the discussion - article
http://lwn.net/Articles/494158/.  As Jake Edge's previous LWN article
http://lwn.net/Articles/488906/ said, we've been working on upstreaming
the different integrity components for quite a while.  Although
IMA-appraisal isn't the last component, it is a major one and were
hoping that it would be upstreamed in the near future.  Is 3.5 still a
possibility?

(It was just pointed out to me that the discussion has moved to
linux-arch.  I'm pretty sure that other people on the LSM mailing list
are following this discussion.  In the future, please CC the LSM
mailing.)

thanks,

Mimi

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-30 14:32                             ` Mimi Zohar
@ 2012-05-03  4:23                               ` James Morris
  0 siblings, 0 replies; 32+ messages in thread
From: James Morris @ 2012-05-03  4:23 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Al Viro, Jonathan Corbet <corbet@lwn.net>, Kasatkin, Dmitry,
	Linus Torvalds, Hugh Dickins, linux-fsdevel,
	linux-security-module, linux-kernel, David Safford, David Miller,
	Andrew Morton

On Mon, 30 Apr 2012, Mimi Zohar wrote:

> Jon, thank you for summarizing the discussion - article
> http://lwn.net/Articles/494158/.  As Jake Edge's previous LWN article
> http://lwn.net/Articles/488906/ said, we've been working on upstreaming
> the different integrity components for quite a while.  Although
> IMA-appraisal isn't the last component, it is a major one and were
> hoping that it would be upstreamed in the near future.  Is 3.5 still a
> possibility?

Not sure why you're asking Jon, but the technical issues raised by Al need 
to be resolved before the code can be considered for upstreaming.


- James
-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
  2012-04-20 19:04             ` Al Viro
  2012-04-20 19:18               ` Linus Torvalds
@ 2012-04-20 19:37               ` Al Viro
  1 sibling, 0 replies; 32+ messages in thread
From: Al Viro @ 2012-04-20 19:37 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel,
	David Safford, Dmitry Kasatkin, Mimi Zohar, Linus Torvalds,
	David Miller, Andrew Morton

On Fri, Apr 20, 2012 at 08:04:18PM +0100, Al Viro wrote:

> Deferring the final pass after dropping ->mmap_sem is going to be
> interesting; what would protect ->vm_next on those suckers?  Locking
> rules are already bloody complicated in mm/*; this will only add more
> subtle fun.  Note that right now both the dissolution of ->vm_next
> list and freeing of VMAs happen under ->mmap_sem; changing that might
> cost us a lot of headache...

BTW, even leaving that aside, we have at least one definite deadlock
(mainline, not on ->mmap_sem - see the mess in drivers/video/msm/mdp.c)
and a possibility of other fun (e.g. any network filesystem that
ends up triggering rtnl_lock() during umount is going to deadlock if
you combine it with sch_atm.c mess where we do fget()/check/fput()
in netlink message handling, close() from another thread racing with
it and descriptor being the only thing that keeps that network fs
alive past lazy umount or namespace dissolution).  And then there's
SCM_RIGHTS vs. drivers/vhost/net.c, which might or might not be
deadlockable - I'm not familiar enough with that driver to tell
right now.

_If_ it had been ->mmap_sem alone, the things would be much simpler.
There wouldn't be a problem, to start with - IMA stuff is not in the
mainline and that's the only thing that really wants ->i_mutex in
fput(); the few existing places that grab it in ->release() are racy
anyway and need to be fixed with proper locking.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2012-05-03  4:23 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-04-18 13:04 [PULL REQUEST] : ima-appraisal patches Mimi Zohar
2012-04-18 15:02 ` James Morris
2012-04-18 18:07   ` Mimi Zohar
2012-04-18 18:39     ` Al Viro
2012-04-18 20:56       ` Mimi Zohar
2012-04-19 19:57       ` Mimi Zohar
2012-04-20  0:43         ` [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) Al Viro
2012-04-20  2:31           ` Linus Torvalds
2012-04-20  2:54             ` Al Viro
2012-04-20  2:58               ` Linus Torvalds
2012-04-20  8:09                 ` Al Viro
2012-04-20 15:56                   ` Linus Torvalds
2012-04-20 16:08                     ` Al Viro
2012-04-20 16:42                       ` Al Viro
2012-04-20 17:21                         ` Linus Torvalds
2012-04-20 18:07                           ` Al Viro
2012-04-20  3:15               ` Al Viro
2012-04-20 18:54           ` Hugh Dickins
2012-04-20 19:04             ` Al Viro
2012-04-20 19:18               ` Linus Torvalds
2012-04-20 19:32                 ` Hugh Dickins
2012-04-20 19:58                 ` Al Viro
2012-04-20 21:12                   ` Linus Torvalds
2012-04-20 22:13                     ` Al Viro
2012-04-20 22:35                       ` Linus Torvalds
2012-04-27  7:35                         ` Kasatkin, Dmitry
2012-04-27 17:34                           ` Al Viro
2012-04-27 18:52                             ` Kasatkin, Dmitry
2012-04-27 19:15                               ` Kasatkin, Dmitry
2012-04-30 14:32                             ` Mimi Zohar
2012-05-03  4:23                               ` James Morris
2012-04-20 19:37               ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).