* [PULL REQUEST] : ima-appraisal patches @ 2012-04-18 13:04 Mimi Zohar 2012-04-18 15:02 ` James Morris 0 siblings, 1 reply; 32+ messages in thread From: Mimi Zohar @ 2012-04-18 13:04 UTC (permalink / raw) To: James Morris Cc: linux-security-module, linux-kernel, linux-fsdevel, Al Viro, David Safford, Dmitry Kasatkin Hi James, As the last IMA-appraisal posting on 3/29 addressed Al's performance/maintenance concerns of deferring the __fput() and there hasn't been any additional comments, please consider pulling the IMA-appraisal patches. The linux-integrity.git also contains the two prereqs: vfs: fix IMA lockdep circular locking dependency (Acked by Eric) vfs: iversion truncate bug fix (currently in linux-next, via Andrew) The following changes since commit eadc10b3e17f00681f7bfb2ed6e4aee39ad93f03: vfs: extend vfs_removexattr locking (2012-04-18 07:06:55 -0400) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git next-ima-appraisal thanks, Mimi Dmitry Kasatkin (3): ima: free securityfs violations file ima: allocating iint improvements ima: digital signature verification support Mimi Zohar (8): vfs: move ima_file_free before releasing the file ima: integrity appraisal extension ima: add appraise action keywords and default rules ima: replace iint spinlock with rwlock/read_lock ima: add inode_post_setattr call ima: add ima_inode_setxattr/removexattr function and calls ima: defer calling __fput() ima: add support for different security.ima data types Documentation/ABI/testing/ima_policy | 25 ++- Documentation/kernel-parameters.txt | 8 + fs/attr.c | 2 + fs/file_table.c | 7 +- include/linux/ima.h | 32 +++ include/linux/integrity.h | 7 +- include/linux/xattr.h | 3 + security/integrity/evm/evm_main.c | 3 + security/integrity/iint.c | 64 +++---- security/integrity/ima/Kconfig | 15 ++ security/integrity/ima/Makefile | 2 + security/integrity/ima/ima.h | 37 ++++- security/integrity/ima/ima_api.c | 56 ++++-- security/integrity/ima/ima_appraise.c | 344 +++++++++++++++++++++++++++++++++ security/integrity/ima/ima_crypto.c | 8 +- security/integrity/ima/ima_fs.c | 1 + security/integrity/ima/ima_main.c | 89 ++++++--- security/integrity/ima/ima_policy.c | 181 +++++++++++++----- security/integrity/integrity.h | 11 +- security/security.c | 6 + 20 files changed, 754 insertions(+), 147 deletions(-) create mode 100644 security/integrity/ima/ima_appraise.c ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PULL REQUEST] : ima-appraisal patches 2012-04-18 13:04 [PULL REQUEST] : ima-appraisal patches Mimi Zohar @ 2012-04-18 15:02 ` James Morris 2012-04-18 18:07 ` Mimi Zohar 0 siblings, 1 reply; 32+ messages in thread From: James Morris @ 2012-04-18 15:02 UTC (permalink / raw) To: Mimi Zohar Cc: linux-security-module, linux-kernel, linux-fsdevel, Al Viro, David Safford, Dmitry Kasatkin On Wed, 18 Apr 2012, Mimi Zohar wrote: > Hi James, > > As the last IMA-appraisal posting on 3/29 addressed Al's > performance/maintenance concerns of deferring the __fput() and there > hasn't been any additional comments, please consider pulling the > IMA-appraisal patches. I thought Dmitry was still waiting on comments from Al for this. - James -- James Morris <jmorris@namei.org> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PULL REQUEST] : ima-appraisal patches 2012-04-18 15:02 ` James Morris @ 2012-04-18 18:07 ` Mimi Zohar 2012-04-18 18:39 ` Al Viro 0 siblings, 1 reply; 32+ messages in thread From: Mimi Zohar @ 2012-04-18 18:07 UTC (permalink / raw) To: James Morris Cc: linux-security-module, linux-kernel, linux-fsdevel, Al Viro, David Safford, Dmitry Kasatkin On Thu, 2012-04-19 at 01:02 +1000, James Morris wrote: > On Wed, 18 Apr 2012, Mimi Zohar wrote: > > > Hi James, > > > > As the last IMA-appraisal posting on 3/29 addressed Al's > > performance/maintenance concerns of deferring the __fput() and there > > hasn't been any additional comments, please consider pulling the > > IMA-appraisal patches. > > I thought Dmitry was still waiting on comments from Al for this. Al NAK'ed the original 'ima: defer calling __fput()' patch from 3/22, but we addressed both of Al's performance and maintaince concerns in the last posting on 3/29, which was three weeks ago. We haven't heard back ... >From the 'ima: defer calling __fput()' patch description: ima_file_free(), which is called on __fput(), updates the file data hash stored as an extended attribute to reflect file changes. If a file is closed before it is munmapped, __fput() is called with the mmap_sem taken. With IMA-appraisal enabled, this results in an mmap_sem/i_mutex lockdep. ima_defer_fput() increments the f_count to defer the __fput() being called until after the mmap_sem is released. The number of __fput() calls needing to be deferred is minimal. Only those files in policy, that were closed prior to the munmap and were mmapped write, need to defer the __fput(). With this patch, on a clean F16 install, from boot to login, only 5 out of ~100,000 mmap_sem held fput() calls were deferred. Changelog v4: - ima_defer_fput() performance improvements - move ima_defer_fput() call from remove_vma() to __fput() - only defer mmapped files opened for write - remove unnecessary ima_fput_mempool_destroy() Mimi ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PULL REQUEST] : ima-appraisal patches 2012-04-18 18:07 ` Mimi Zohar @ 2012-04-18 18:39 ` Al Viro 2012-04-18 20:56 ` Mimi Zohar 2012-04-19 19:57 ` Mimi Zohar 0 siblings, 2 replies; 32+ messages in thread From: Al Viro @ 2012-04-18 18:39 UTC (permalink / raw) To: Mimi Zohar Cc: James Morris, linux-security-module, linux-kernel, linux-fsdevel, David Safford, Dmitry Kasatkin On Wed, Apr 18, 2012 at 02:07:52PM -0400, Mimi Zohar wrote: > >From the 'ima: defer calling __fput()' patch description: > > ima_file_free(), which is called on __fput(), updates the file data > hash stored as an extended attribute to reflect file changes. If a > file is closed before it is munmapped, __fput() is called with the > mmap_sem taken. With IMA-appraisal enabled, this results in an > mmap_sem/i_mutex lockdep. ima_defer_fput() increments the f_count to > defer the __fput() being called until after the mmap_sem is released. > > The number of __fput() calls needing to be deferred is minimal. Only > those files in policy, that were closed prior to the munmap and were > mmapped write, need to defer the __fput(). > > With this patch, on a clean F16 install, from boot to login, only > 5 out of ~100,000 mmap_sem held fput() calls were deferred. Assuming that it's commit 3cee52ffe8ca925bb1e96f804daa87f7e2e34e46 Author: Mimi Zohar <zohar@linux.vnet.ibm.com> Date: Fri Feb 24 06:23:12 2012 -0500 ima: defer calling __fput() in your tree, the NAK still stands. For starters, but you are creating a different locking rules for IMA-enabled builds and for everything else. Moreover, this deferral is done only for files opened for write; the rules are convoluted as hell *and* inviting abuses. NAKed at least until you come up with formal proof that there's no other lock where fput() would be possible and ->i_mutex was not allowed. This is not a way to go; that kind of kludges leads to locking code that is impossible to reason about. PS: BTW, what the hell is "fput already scheduled" codepath about? Why is it pr_info() and not an outright BUG_ON()? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PULL REQUEST] : ima-appraisal patches 2012-04-18 18:39 ` Al Viro @ 2012-04-18 20:56 ` Mimi Zohar 2012-04-19 19:57 ` Mimi Zohar 1 sibling, 0 replies; 32+ messages in thread From: Mimi Zohar @ 2012-04-18 20:56 UTC (permalink / raw) To: Al Viro Cc: James Morris, linux-security-module, linux-kernel, linux-fsdevel, David Safford, Dmitry Kasatkin On Wed, 2012-04-18 at 19:39 +0100, Al Viro wrote: > On Wed, Apr 18, 2012 at 02:07:52PM -0400, Mimi Zohar wrote: > > > >From the 'ima: defer calling __fput()' patch description: > > > > ima_file_free(), which is called on __fput(), updates the file data > > hash stored as an extended attribute to reflect file changes. If a > > file is closed before it is munmapped, __fput() is called with the > > mmap_sem taken. With IMA-appraisal enabled, this results in an > > mmap_sem/i_mutex lockdep. ima_defer_fput() increments the f_count to > > defer the __fput() being called until after the mmap_sem is released. > > > > The number of __fput() calls needing to be deferred is minimal. Only > > those files in policy, that were closed prior to the munmap and were > > mmapped write, need to defer the __fput(). > > > > With this patch, on a clean F16 install, from boot to login, only > > 5 out of ~100,000 mmap_sem held fput() calls were deferred. > > Assuming that it's commit 3cee52ffe8ca925bb1e96f804daa87f7e2e34e46 > Author: Mimi Zohar <zohar@linux.vnet.ibm.com> > Date: Fri Feb 24 06:23:12 2012 -0500 > > ima: defer calling __fput() > in your tree, the NAK still stands. For starters, but you are creating a > different locking rules for IMA-enabled builds and for everything else. > Moreover, this deferral is done only for files opened for write; the > rules are convoluted as hell *and* inviting abuses. Yes, that is the updated version. For performance, we limited deferring the __fput() to only those files that could possibly change - open for write, were closed before being munmapped, and that IMA-appraisal maintains a file data hash as an xattr. If the main concern is different locking rules when IMA is enabled, then we could remove the IMA criteria and rename ima_defer_fput() to something more generic. As for "*and* inviting abuses", I'm not sure what you mean. > NAKed at least until you come up with formal proof that there's no other > lock where fput() would be possible and ->i_mutex was not allowed. This > is not a way to go; that kind of kludges leads to locking code that is > impossible to reason about. On __fput(), we need to update the security.ima xattr with a hash of the file data. The original thread discussion suggested changing the xattr locking. The filesystems seem to do their own xattr locking, but in fs/xattr.c the i_mutex is taken before accessing the inode setxattr/removexattr ops. hm, lockdep isn't complaining about anything else. Not sure if that qualifies as formal proof. > PS: BTW, what the hell is "fput already scheduled" codepath about? > Why is it pr_info() and not an outright BUG_ON()? I'll fix this. thanks, Mimi ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PULL REQUEST] : ima-appraisal patches 2012-04-18 18:39 ` Al Viro 2012-04-18 20:56 ` Mimi Zohar @ 2012-04-19 19:57 ` Mimi Zohar 2012-04-20 0:43 ` [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) Al Viro 1 sibling, 1 reply; 32+ messages in thread From: Mimi Zohar @ 2012-04-19 19:57 UTC (permalink / raw) To: Al Viro Cc: James Morris, linux-security-module, linux-kernel, linux-fsdevel, David Safford, Dmitry Kasatkin On Wed, 2012-04-18 at 19:39 +0100, Al Viro wrote: > On Wed, Apr 18, 2012 at 02:07:52PM -0400, Mimi Zohar wrote: > NAKed at least until you come up with formal proof that there's no other > lock where fput() would be possible and ->i_mutex was not allowed. Has the discussion here moved from deferring the __fput() for the mmap_sem/i_mutex lockdep side case, to taking the i_mutex in __fput() in general? Lockdep has not reported any problems, other than for the mmap_sem/i_mutex scenario. > This > is not a way to go; that kind of kludges leads to locking code that is > impossible to reason about. Are you referring to defering the __fput() or taking the i_mutex in __fput() in general? The i_mutex is currently used to protect file data and metadata (eg. chown, chmod, xattrs). After the last file data change, the file metadata needs to be updated to reflect the file data changes. As i_mutex is used for protecting both the file data and file metadata, why would taking the i_mutex in __fput() be kludgie. I'd really appreciate any help, suggestions. thanks, Mimi ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-19 19:57 ` Mimi Zohar @ 2012-04-20 0:43 ` Al Viro 2012-04-20 2:31 ` Linus Torvalds 2012-04-20 18:54 ` Hugh Dickins 0 siblings, 2 replies; 32+ messages in thread From: Al Viro @ 2012-04-20 0:43 UTC (permalink / raw) To: linux-fsdevel Cc: James Morris, linux-security-module, linux-kernel, linux-fsdevel, David Safford, Dmitry Kasatkin, Mimi Zohar, Linus Torvalds, David Miller On Thu, Apr 19, 2012 at 03:57:28PM -0400, Mimi Zohar wrote: > Has the discussion here moved from deferring the __fput() for the > mmap_sem/i_mutex lockdep side case, to taking the i_mutex in __fput() in > general? Lockdep has not reported any problems, other than for the > mmap_sem/i_mutex scenario. > > > This > > is not a way to go; that kind of kludges leads to locking code that is > > impossible to reason about. > > Are you referring to defering the __fput() or taking the i_mutex in > __fput() in general? I'm referring to having "are we holding ->mmap_sem" logics anywhere in fput(). Note that your "lockdep has not reported..." is a symptom of the same problem - it's *NOT* enough to show the lack of deadlocks from the change of locking rules. And after that change we'll get even worse situation, since proving the safety will become harder (and even more so if we end up adding other cases when we need to defer). Summary for those who'd been spared the earlier fun: IMA stuff wants to grab ->i_mutex on the final fput() in some cases. That violates the locking order in at least one way - final fput() may come under ->mmap_sem, in which case grabbing ->i_mutex would be a Bloody Bad Idea(tm). "Solution" proposed: a bit of spectaculary ugly logics checking in final fput() whether we have ->mmap_sem locked. If we do, said final fput() becomes non-final and is done asynchronously. This is fundamentally flawed, of course, since "is ->mmap_sem unlocked" is used as "is it safe to have ->i_mutex grabbed", and it's both unproven and brittle as hell even if it happens to be true right now (and I wouldn't bet on that, TBH). If it had been IMA alone, I would've cheerfully told them where to stuff the whole thing. Unfortunately, fput() *is* lock-heavy even without them. Note that it can easily end up triggering such things as final deactivate_super(). Now, ->mmap_sem is irrelevant here - after all, any inodes involved won't be mmapped, or deactivate_super() wouldn't be final. However, fput() is called e.g. under rtnl_lock() and I'm not at all sure that something like NFS won't try to grab it from its ->kill_sb(). So the question is, do we have any reasonable way to get to the situation where the __fput() would only be done without any locks held? It might be worth trying. What we CAN'T do: * defer all __fput() (i.e. always do final fput() async). Obviously no-go, for performance reasons alone. * check some predicate about the set of locks held and defer if it's false. That's what IMA folks have tried to do; no-go for the reasons described above. * add a new helper (fput_carefully() or something like that) that would defer final __fput(), leaving fput() with the current behaviour. And convert the potentially unsafe callers to it (starting with everything that holds ->mmap_sem). No-go since it's not maintainable - a change pretty far away from a function that does (currently safe) fput() can end up requiring the conversion to fput_carefully(). Too easy to miss, so this will decay and it won't be easy to verify correctness several years down the road. However, there's an approach that might be feasible. Most of the time the final fput() *is* done without any locks held and there's a very large subclass of those call sites - those that come via fput_light(). What we could do, and what might be maintainable is: * prohibit fput_light() with locks held. Right now we are very close to that (or already there - I haven't finished checking). * convert low-hanging fget/fput in syscalls to fget_light/fput_light. Makes sense anyway. * split off fput_nodefer() (identical to what fput() is right now), make fput_light() call it instead of fput(). Switch path_lookupat() and path_openat() to fput_nodefer() as well. Ditto for sys_socketpair() and sys_accept4(). * make fput() (now almost never leading to __fput()) defer the sucker in very rare cases when it ends up dropping the last reference. At that point we would have __fput() always done without any locks held, which would remove all restrictions on the locks that can be taken from it. Comments? Disclaimer: I have not finished going through the call sites (note that there are wrappers - e.g. sockfd_put() *and* sockfd_lookup()), so there might be obstacles. In particular, I'm still not sure about SCM_RIGHTS datagrams handling - IIRC, we had an interesting kludge in there, with a kinda-sorta deferral/batching. BTW, I wonder about deadlocks around that one - drivers/vhost/net.c is doing ->recvmsg() while holding vq->mutex and if an SCM_RIGHTS datagram comes in, we'll get a bunch of fput() done. Which can lead to final umount of a network filesystem, etc. If that thing can lead to handle_rx() on the same sucker, we have a deadlock... ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 0:43 ` [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) Al Viro @ 2012-04-20 2:31 ` Linus Torvalds 2012-04-20 2:54 ` Al Viro 2012-04-20 18:54 ` Hugh Dickins 1 sibling, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2012-04-20 2:31 UTC (permalink / raw) To: Al Viro Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller On Thu, Apr 19, 2012 at 5:43 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > However, there's an approach that might be feasible. Most of the time > the final fput() *is* done without any locks held and there's a very > large subclass of those call sites - those that come via fput_light(). > What we could do, and what might be maintainable is: > * prohibit fput_light() with locks held. Right now we are very > close to that (or already there - I haven't finished checking). > * convert low-hanging fget/fput in syscalls to fget_light/fput_light. > Makes sense anyway. Many of them would make sense, yes (looking at vfs_fstatat() etc. But a lot of fput() calls come from close() -> filp_close -> fput(). And the "fput_light()" model *only* works together with fget_light() as it is now. So I do think you need some other model. Of course, we can just do "fput_light(file, 1)" instead - that seems pretty ugly, though. But just making "fput()" do a defer on the last count sounds actively *wrong* for things like close(), which may actually have serious consistency guarantees (ie the process doing the close() may "know" that it is the last user, and depend on the fact that the close() did actually delete the inode etc. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 2:31 ` Linus Torvalds @ 2012-04-20 2:54 ` Al Viro 2012-04-20 2:58 ` Linus Torvalds 2012-04-20 3:15 ` Al Viro 0 siblings, 2 replies; 32+ messages in thread From: Al Viro @ 2012-04-20 2:54 UTC (permalink / raw) To: Linus Torvalds Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller On Thu, Apr 19, 2012 at 07:31:01PM -0700, Linus Torvalds wrote: > On Thu, Apr 19, 2012 at 5:43 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > However, there's an approach that might be feasible. ?Most of the time > > the final fput() *is* done without any locks held and there's a very > > large subclass of those call sites - those that come via fput_light(). > > What we could do, and what might be maintainable is: > > ? ? ? ?* prohibit fput_light() with locks held. ?Right now we are very > > close to that (or already there - I haven't finished checking). > > ? ? ? ?* convert low-hanging fget/fput in syscalls to fget_light/fput_light. > > Makes sense anyway. > > Many of them would make sense, yes (looking at vfs_fstatat() etc. > > But a lot of fput() calls come from close() -> filp_close -> fput(). > > And the "fput_light()" model *only* works together with fget_light() > as it is now. > > So I do think you need some other model. Of course, we can just do > "fput_light(file, 1)" instead - that seems pretty ugly, though. But > just making "fput()" do a defer on the last count sounds actively > *wrong* for things like close(), which may actually have serious > consistency guarantees (ie the process doing the close() may "know" > that it is the last user, and depend on the fact that the close() did > actually delete the inode etc. Umm... I really wonder if we *want* filp_close() under any kind of locks. You are right - it should not be deferred. I haven't finished checking the callers of that puppy, but if we really do it while holding any kind of lock, we are asking for trouble. So I'd rather switch filp_close() to use of fput_nodefer() if that turns out to be possible. FWIW, the set of primitives I'm thinking of right now is __fput(file) - same as now schedule_fput(file) - takes the only reference to file and schedules __fput() fput_nodefer(file) { if (atomic_long_dec_and_test(&file->f_count)) __fput(file); } fput(file) { if (unlikely(!fput_atomic(file)) schedule_fput(file); } fput_light(file, need_fput) { if (need_fput) fput_nodefer(file); } fput_light_defer(file, need_fput) // for callers in some weird ioctls, might // not be needed at all { if (need_fput) fput(file); } and filp_close() would, if that turns out to be possible, call fput_nodefer() instead of fput(). If we *do* have places where we need deferral in filp_close() (and I'm fairly sure that any such place is a deadlock right now), well, we'll need a variant of filp_close() sans the call of fput...() and those places would call that, followed by full (deferring) fput(). ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 2:54 ` Al Viro @ 2012-04-20 2:58 ` Linus Torvalds 2012-04-20 8:09 ` Al Viro 2012-04-20 3:15 ` Al Viro 1 sibling, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2012-04-20 2:58 UTC (permalink / raw) To: Al Viro Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller On Thu, Apr 19, 2012 at 7:54 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Umm... I really wonder if we *want* filp_close() under any kind of > locks. You are right - it should not be deferred. I haven't finished > checking the callers of that puppy, but if we really do it while holding > any kind of lock, we are asking for trouble. So I'd rather switch > filp_close() to use of fput_nodefer() if that turns out to be possible. Ok, fair enough, looks like a reasonable plan to me. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-security-module" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 2:58 ` Linus Torvalds @ 2012-04-20 8:09 ` Al Viro 2012-04-20 15:56 ` Linus Torvalds 0 siblings, 1 reply; 32+ messages in thread From: Al Viro @ 2012-04-20 8:09 UTC (permalink / raw) To: Linus Torvalds Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller On Thu, Apr 19, 2012 at 07:58:57PM -0700, Linus Torvalds wrote: > On Thu, Apr 19, 2012 at 7:54 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Umm... ?I really wonder if we *want* filp_close() under any kind of > > locks. ?You are right - it should not be deferred. ?I haven't finished > > checking the callers of that puppy, but if we really do it while holding > > any kind of lock, we are asking for trouble. ?So I'd rather switch > > filp_close() to use of fput_nodefer() if that turns out to be possible. > > Ok, fair enough, looks like a reasonable plan to me. Actually, scratch that; I think I have a better variant * switch to fget_light/fput_light where possible; it's not needed for the rest, but is useful anyway * move the guts of filp_close() (everything prior to fput() it does in the end) into a new helper, turning filp_close() into a couple of calls (inlined, at that). Equivalent transformation. * add fput_nodefer(); does the same thing fput() does now. Make fput() defer the call of __fput(). Add a filp_close_nodefer() - same as filp_close(), but the second call in it is fput_nodefer(), not fput(). And switch the callers that are known to be done outside of any locks to filp_close_nodefer(). Switch accept4()/socketpair() to fput_nodefer() (for freshly created struct file in case of cleanup on failure exit). Note that we do *not* need to bother with fput_light() - even if it does fput(), that fput() won't usually be the final one. We'd need it to race with close() or dup2() from another thread for that to happen. So we only need to review the callers of filp_close() and there are much fewer of those. We also get something else out of that - AFAICS, the kludge in __scm_destroy() can be killed after that. We did it to prevent recursion on fput(), right? Now that recursion will be gone... I'd probably not bother with trying to be clever in doing the deferral itself - the point is to make it rare, so it's not a hot path anyway. We can play with per-CPU lists, etc., but in this case dumber is better. Comments? I'm half-asleep right now, so I might be missing something obvious... ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 8:09 ` Al Viro @ 2012-04-20 15:56 ` Linus Torvalds 2012-04-20 16:08 ` Al Viro 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2012-04-20 15:56 UTC (permalink / raw) To: Al Viro Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller On Fri, Apr 20, 2012 at 1:09 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Note that we do *not* need to bother with fput_light() - even if it does > fput(), that fput() won't usually be the final one. Ack. Most of the time the fput_light()->fput will just decrement the use count. > We also get something else out of that - AFAICS, the kludge in __scm_destroy() > can be killed after that. We did it to prevent recursion on fput(), right? > Now that recursion will be gone... Hmm.. That points out that we may have a *lot* of these pending final fput's, though. So the deferral queueing should be fairly light. What were your particular plans for it? This actually sounds like a fairly good usage-case for Oleg's new task_work_add() thing. That would defer the final fput, but at the same time guarantee that it gets done before returning to user space - in case there are any issues with synchronous actions. Have you looked at Oleg's series? You weren't cc'd because it didn't affect you, but look at lkml for "task_work_add()" to find it. NOTE! If pure kernel threads do fput() deferral (and maybe they do - I'm thinking nfsd etc), then the task-work thing might need some extra thought. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 15:56 ` Linus Torvalds @ 2012-04-20 16:08 ` Al Viro 2012-04-20 16:42 ` Al Viro 0 siblings, 1 reply; 32+ messages in thread From: Al Viro @ 2012-04-20 16:08 UTC (permalink / raw) To: Linus Torvalds Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller On Fri, Apr 20, 2012 at 08:56:57AM -0700, Linus Torvalds wrote: > > We also get something else out of that - AFAICS, the kludge in __scm_destroy() > > can be killed after that. ?We did it to prevent recursion on fput(), right? > > Now that recursion will be gone... > > Hmm.. That points out that we may have a *lot* of these pending final > fput's, though. So the deferral queueing should be fairly light. What > were your particular plans for it? Doing removal from per-sb list immediately (i.e. before possible deferral; we skip ones with zero ->f_count when we walk the list anyway), then in case we decide to defer just move them to per-CPU list and schedule work on that CPU, with handler that will pull the corresponding list out and do the rest of __fput() for everything in that list. No extra locking, just preempt_disable() around the "move to per-CPU list" bit. Or a per-CPU spinlock with worker not being tied to specific CPU and told which CPU's list to work with. How does CPU hotplug interact with work scheduled on CPU about to be taken down, BTW? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 16:08 ` Al Viro @ 2012-04-20 16:42 ` Al Viro 2012-04-20 17:21 ` Linus Torvalds 0 siblings, 1 reply; 32+ messages in thread From: Al Viro @ 2012-04-20 16:42 UTC (permalink / raw) To: Linus Torvalds Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller On Fri, Apr 20, 2012 at 05:08:48PM +0100, Al Viro wrote: > Doing removal from per-sb list immediately (i.e. before possible > deferral; we skip ones with zero ->f_count when we walk the list > anyway), then in case we decide to defer just move them to per-CPU > list and schedule work on that CPU, with handler that will pull the > corresponding list out and do the rest of __fput() for everything > in that list. No extra locking, just preempt_disable() around the > "move to per-CPU list" bit. Or a per-CPU spinlock with worker not > being tied to specific CPU and told which CPU's list to work with. > How does CPU hotplug interact with work scheduled on CPU about to > be taken down, BTW? Actually, I like the per-CPU spinlock variant better; the thing is, with that scheme we get normal fput() (i.e. non-nodefer variant) non-blocking. How about this: __fput() loses file_sb_list_del() call fput(file) { if (atomic_long_dec_and_test(...)) { unsigned long flags; struct foo *p; file_sb_list_del(file); p = get_cpu_var(deferral_lists); spin_lock_irqsave(&p->lock, flags); list_move(&file->f_u.fu_list, &p->list); spin_unlock_irqrestore(&p->lock, flags); schedule_work(&p->work); put_cpu_var(p); } } fput_nodefer(file) { if (atomic_long_dec_and_test(...)) { file_sb_list_del(file); __fput(file); } } do_deferred_fput_work(work) { struct foo *p = container_of(work, struct foo, work); LIST_HEAD(list); spin_lock_irq(&p->lock); list_splice_init(&p->list, list); spin_unlock_irq(&p->lock); while (!list_empty(list)) { struct file *file = list_entry(list, struct file, f_u.fu_list); list_del_init(&file->f_u.fu_list); __fput(file); } } Voila - now only fput_nodefer() is blocking! fput() can be used from any context that way, which should kill e.g. a kludge in fs/aio.c. Comments? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 16:42 ` Al Viro @ 2012-04-20 17:21 ` Linus Torvalds 2012-04-20 18:07 ` Al Viro 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2012-04-20 17:21 UTC (permalink / raw) To: Al Viro Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller On Fri, Apr 20, 2012 at 9:42 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Actually, I like the per-CPU spinlock variant better; the thing is, > with that scheme we get normal fput() (i.e. non-nodefer variant) > non-blocking. How about this: What's the advantage of a per-cpu lock? If you make the work be per-cpu, then you're better with no locking at all: just disable interrupts (which you do anyway). And if you want to use a spinlock, don't bother with the percpu side. The thing I do not like about the schedule_work approach is that it (a) totally hides the real cost - which is the scheduling - and (b) it is so asynchronous that it will happen potentially long after the task dropped the reference. And seriously - that is user-visible behavior. For example, think about this *common* pattern: open+mmap+close+unlink+munmap which would trigger the whole deferred fput, but also triggers the actual real unlink() at fput time. Right now, you can have that kind of thing in a program and immediately unmount the filesystem afterwards (replace "unmount" with "cannot see silly-renamed files" etc). The "totally asynchronous deferral" literally *breaks*semantics*. Sure, it won't be noticeable in 99.99% of all cases, and I doubt you can trigger much of a test for it. But it's potential real breakage, and it's going to be hard to ever see. And then when it *does* happen, it's going to be totally impossible to debug. It's not just the "last unlink" thing that gets delayed. It things like file locking. It's "drop_file_write_access()". It's whatever random thing that file does at "release()". It's a ton of things like that. Delaying them has user-visible actions. That's a whole can of complexities and worries outside of the kernel interface that you are completely ignoring - just because you are trying to solve the *simple* complexity of locking interaction entirely within the kernel. I think that's a bit myopic. We don't even *know* what the problems with the async approach might be. Your "simple" solution is simple only inside the kernel. This is why I suggested you look at Oleg's patches. If we guarantee that things won't be delayed past re-entering user mode, all those issues go away. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-security-module" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 17:21 ` Linus Torvalds @ 2012-04-20 18:07 ` Al Viro 0 siblings, 0 replies; 32+ messages in thread From: Al Viro @ 2012-04-20 18:07 UTC (permalink / raw) To: Linus Torvalds Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller On Fri, Apr 20, 2012 at 10:21:35AM -0700, Linus Torvalds wrote: > On Fri, Apr 20, 2012 at 9:42 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Actually, I like the per-CPU spinlock variant better; the thing is, > > with that scheme we get normal fput() (i.e. non-nodefer variant) > > non-blocking. ?How about this: > > What's the advantage of a per-cpu lock? > > If you make the work be per-cpu, then you're better with no locking at > all: just disable interrupts (which you do anyway). Point taken. > The thing I do not like about the schedule_work approach is that it > (a) totally hides the real cost - which is the scheduling - and (b) > it is so asynchronous that it will happen potentially long after the > task dropped the reference. [snip] > This is why I suggested you look at Oleg's patches. If we guarantee > that things won't be delayed past re-entering user mode, all those > issues go away. I've looked at them. One obvious problem is that it tracehook_notify_resume() is not universally called. AFAICS, hexagon, m68k, microblaze, score, um and xtensa never call tracehook_notify_resume(). Out of those, hexagon is manually checking TIF_NOTIFY_RESUME and does key_replace_session_keyring(), so the call could be easily added into the same place; the rest of those guys don't even look at TIF_NOTIFY_RESUME anywhere near their signal*.c and m68k/um/xtensa don't even have it defined, let alone handled. So this stuff depends on some amount of asm glue hacking on several architectures ;-/ Another is that final fput() can, indeed, happen in kernel threads, so pure switch to task_work_add() won't be enough. I agree that having this stuff flushed before we return to userland would be a good thing; the question is, can we easily tell if there will be such a return to userland? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 2:54 ` Al Viro 2012-04-20 2:58 ` Linus Torvalds @ 2012-04-20 3:15 ` Al Viro 1 sibling, 0 replies; 32+ messages in thread From: Al Viro @ 2012-04-20 3:15 UTC (permalink / raw) To: Linus Torvalds Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller On Fri, Apr 20, 2012 at 03:54:38AM +0100, Al Viro wrote: > and filp_close() would, if that turns out to be possible, call fput_nodefer() > instead of fput(). If we *do* have places where we need deferral in > filp_close() (and I'm fairly sure that any such place is a deadlock right > now), well, we'll need a variant of filp_close() sans the call of fput...() > and those places would call that, followed by full (deferring) fput(). BTW, some of those filp_close() are simply wrong - e.g. a big pile in drivers/media/video/cx25821/*.c should be fput(). And yes, we have at least some under mutex - binder_lock held by binder_ioctl(), which ends up doing binder_transaction() with its failure cleanup hitting close_filp(). On an arbitrary struct file. It *probably* doesn't deadlock on the spot, since binder_release() itself contains a deferral mechanism (perhaps they'd spotted the deadlock, perhaps there are other reasons to have it). ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 0:43 ` [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) Al Viro 2012-04-20 2:31 ` Linus Torvalds @ 2012-04-20 18:54 ` Hugh Dickins 2012-04-20 19:04 ` Al Viro 1 sibling, 1 reply; 32+ messages in thread From: Hugh Dickins @ 2012-04-20 18:54 UTC (permalink / raw) To: Al Viro Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, Linus Torvalds, David Miller, Andrew Morton On Fri, 20 Apr 2012, Al Viro wrote: > On Thu, Apr 19, 2012 at 03:57:28PM -0400, Mimi Zohar wrote: > > > Has the discussion here moved from deferring the __fput() for the > > mmap_sem/i_mutex lockdep side case, to taking the i_mutex in __fput() in > > general? Lockdep has not reported any problems, other than for the > > mmap_sem/i_mutex scenario. > > > > > This > > > is not a way to go; that kind of kludges leads to locking code that is > > > impossible to reason about. > > > > Are you referring to defering the __fput() or taking the i_mutex in > > __fput() in general? > > I'm referring to having "are we holding ->mmap_sem" logics anywhere in fput(). > Note that your "lockdep has not reported..." is a symptom of the same > problem - it's *NOT* enough to show the lack of deadlocks from the change > of locking rules. And after that change we'll get even worse situation, > since proving the safety will become harder (and even more so if we end > up adding other cases when we need to defer). > > Summary for those who'd been spared the earlier fun: IMA stuff wants to grab > ->i_mutex on the final fput() in some cases. That violates the locking > order in at least one way - final fput() may come under ->mmap_sem, in > which case grabbing ->i_mutex would be a Bloody Bad Idea(tm). "Solution" > proposed: a bit of spectaculary ugly logics checking in final fput() whether > we have ->mmap_sem locked. If we do, said final fput() becomes non-final > and is done asynchronously. This is fundamentally flawed, of course, > since "is ->mmap_sem unlocked" is used as "is it safe to have ->i_mutex > grabbed", and it's both unproven and brittle as hell even if it happens > to be true right now (and I wouldn't bet on that, TBH). I can see that the discussion has since moved on quite a way from here. But it looks to me fairly easy for mm to stop doing fput() under mmap_sem. That's already the case when exiting (no mmap_sem held), and shouldn't add observable cost when unmapping (we already work on a chain of vmas to be freed, and when unmapping that chain will usually just be of one: shouldn't matter to defer a final pass until after mmap_sem is dropped). Unless I'm mistaken, the fput() buried in vma_adjust() can never be a final fput. Is it worth my trying to implement that? Or do you see an immediate gotcha that I'm missing? Or are you happy enough with your deferred fput() ideas, that it would be a waste of time to rearrange the mm end? Hugh > > If it had been IMA alone, I would've cheerfully told them where to stuff > the whole thing. Unfortunately, fput() *is* lock-heavy even without them. > Note that it can easily end up triggering such things as final > deactivate_super(). Now, ->mmap_sem is irrelevant here - after all, > any inodes involved won't be mmapped, or deactivate_super() wouldn't > be final. However, fput() is called e.g. under rtnl_lock() and I'm > not at all sure that something like NFS won't try to grab it from its > ->kill_sb(). > > So the question is, do we have any reasonable way to get to the situation > where the __fput() would only be done without any locks held? It might > be worth trying. What we CAN'T do: > * defer all __fput() (i.e. always do final fput() async). Obviously > no-go, for performance reasons alone. > * check some predicate about the set of locks held and defer if it's > false. That's what IMA folks have tried to do; no-go for the reasons described > above. > * add a new helper (fput_carefully() or something like that) that would > defer final __fput(), leaving fput() with the current behaviour. And convert > the potentially unsafe callers to it (starting with everything that holds > ->mmap_sem). No-go since it's not maintainable - a change pretty far away > from a function that does (currently safe) fput() can end up requiring the > conversion to fput_carefully(). Too easy to miss, so this will decay and it > won't be easy to verify correctness several years down the road. > > However, there's an approach that might be feasible. Most of the time > the final fput() *is* done without any locks held and there's a very > large subclass of those call sites - those that come via fput_light(). > What we could do, and what might be maintainable is: > * prohibit fput_light() with locks held. Right now we are very > close to that (or already there - I haven't finished checking). > * convert low-hanging fget/fput in syscalls to fget_light/fput_light. > Makes sense anyway. > * split off fput_nodefer() (identical to what fput() is right now), > make fput_light() call it instead of fput(). Switch path_lookupat() and > path_openat() to fput_nodefer() as well. Ditto for sys_socketpair() and > sys_accept4(). > * make fput() (now almost never leading to __fput()) defer the sucker > in very rare cases when it ends up dropping the last reference. > At that point we would have __fput() always done without any locks held, > which would remove all restrictions on the locks that can be taken from it. > > Comments? > > Disclaimer: I have not finished going through the call sites (note that > there are wrappers - e.g. sockfd_put() *and* sockfd_lookup()), so there might > be obstacles. In particular, I'm still not sure about SCM_RIGHTS datagrams > handling - IIRC, we had an interesting kludge in there, with a kinda-sorta > deferral/batching. BTW, I wonder about deadlocks around that one - > drivers/vhost/net.c is doing ->recvmsg() while holding vq->mutex and if > an SCM_RIGHTS datagram comes in, we'll get a bunch of fput() done. Which > can lead to final umount of a network filesystem, etc. If that thing can > lead to handle_rx() on the same sucker, we have a deadlock... > -- ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 18:54 ` Hugh Dickins @ 2012-04-20 19:04 ` Al Viro 2012-04-20 19:18 ` Linus Torvalds 2012-04-20 19:37 ` Al Viro 0 siblings, 2 replies; 32+ messages in thread From: Al Viro @ 2012-04-20 19:04 UTC (permalink / raw) To: Hugh Dickins Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, Linus Torvalds, David Miller, Andrew Morton On Fri, Apr 20, 2012 at 11:54:01AM -0700, Hugh Dickins wrote: > I can see that the discussion has since moved on quite a way from here. > > But it looks to me fairly easy for mm to stop doing fput() under mmap_sem. > > That's already the case when exiting (no mmap_sem held), and shouldn't add > observable cost when unmapping (we already work on a chain of vmas to be > freed, and when unmapping that chain will usually just be of one: shouldn't > matter to defer a final pass until after mmap_sem is dropped). Unless I'm > mistaken, the fput() buried in vma_adjust() can never be a final fput. > > Is it worth my trying to implement that? Or do you see an immediate > gotcha that I'm missing? Or are you happy enough with your deferred > fput() ideas, that it would be a waste of time to rearrange the mm end? Deferring the final pass after dropping ->mmap_sem is going to be interesting; what would protect ->vm_next on those suckers? Locking rules are already bloody complicated in mm/*; this will only add more subtle fun. Note that right now both the dissolution of ->vm_next list and freeing of VMAs happen under ->mmap_sem; changing that might cost us a lot of headache... ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 19:04 ` Al Viro @ 2012-04-20 19:18 ` Linus Torvalds 2012-04-20 19:32 ` Hugh Dickins 2012-04-20 19:58 ` Al Viro 2012-04-20 19:37 ` Al Viro 1 sibling, 2 replies; 32+ messages in thread From: Linus Torvalds @ 2012-04-20 19:18 UTC (permalink / raw) To: Al Viro Cc: Hugh Dickins, linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller, Andrew Morton On Fri, Apr 20, 2012 at 12:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Deferring the final pass after dropping ->mmap_sem is going to be > interesting; what would protect ->vm_next on those suckers? Just remove them from the active list, and keep them linked to each other using vm_next. After all, the only case we care about is the case where the vma gets removed entirely, so we just put them on their own list. In fact, that's what we already do for other reasons. See detach_vmas_to_be_unmapped(). So vm_next is actually entirely *private* by this time, and needs no locking at all. As far as I can tell, we could just make do_munmap() return that private list, and then do the fput's and freeing of the list outside the mmap_sem lock. That actually looks pretty simple. There are a fair number of callers, which looks to be the main annoyance. But fixing it up with some pattern of "do finish_munmap after drooping the mmap_sem" doesn't look *complicated*, just slightly annoying. The *bigger* annoyance is actually "do_mmap()", which does a do_munmap() as part of it, so it needs the same cleanup too. There might be other cases that do munmap as part of their operation (and that have the mmap_sem held outside of the caller), but do_munmap() and do_mmap() seem to be the two obvious ones. Many of the callers seem to do the mmap_sem() right around the call, though (see binfmt_elf.c for example), so it really would be a rather local fixup. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 19:18 ` Linus Torvalds @ 2012-04-20 19:32 ` Hugh Dickins 2012-04-20 19:58 ` Al Viro 1 sibling, 0 replies; 32+ messages in thread From: Hugh Dickins @ 2012-04-20 19:32 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller, Andrew Morton On Fri, 20 Apr 2012, Linus Torvalds wrote: > On Fri, Apr 20, 2012 at 12:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Deferring the final pass after dropping ->mmap_sem is going to be > > interesting; what would protect ->vm_next on those suckers? > > Just remove them from the active list, and keep them linked to each > other using vm_next. > > After all, the only case we care about is the case where the vma gets > removed entirely, so we just put them on their own list. > > In fact, that's what we already do for other reasons. See > detach_vmas_to_be_unmapped(). > > So vm_next is actually entirely *private* by this time, and needs no > locking at all. > > As far as I can tell, we could just make do_munmap() return that > private list, and then do the fput's and freeing of the list outside > the mmap_sem lock. > > That actually looks pretty simple. There are a fair number of callers, > which looks to be the main annoyance. But fixing it up with some > pattern of "do finish_munmap after drooping the mmap_sem" doesn't look > *complicated*, just slightly annoying. > > The *bigger* annoyance is actually "do_mmap()", which does a > do_munmap() as part of it, so it needs the same cleanup too. > > There might be other cases that do munmap as part of their operation > (and that have the mmap_sem held outside of the caller), but > do_munmap() and do_mmap() seem to be the two obvious ones. > > Many of the callers seem to do the mmap_sem() right around the call, > though (see binfmt_elf.c for example), so it really would be a rather > local fixup. Yes, that's exactly how I was thinking of it. Hugh ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 19:18 ` Linus Torvalds 2012-04-20 19:32 ` Hugh Dickins @ 2012-04-20 19:58 ` Al Viro 2012-04-20 21:12 ` Linus Torvalds 1 sibling, 1 reply; 32+ messages in thread From: Al Viro @ 2012-04-20 19:58 UTC (permalink / raw) To: Linus Torvalds Cc: Hugh Dickins, linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller, Andrew Morton On Fri, Apr 20, 2012 at 12:18:43PM -0700, Linus Torvalds wrote: > The *bigger* annoyance is actually "do_mmap()", which does a > do_munmap() as part of it, so it needs the same cleanup too. So does a bunch of other places. Let me dig out the call graph circa 3.3.0... Here is the relevant part: do_munmap() - <- pfm_do_munmap() <- pfm_remove_smpl_mapping() which grabs mmap_sem excl <- 64_munmap(2) which grabs mmap_sem excl <- kvm_arch_commit_memory_region() which grabs mmap_sem excl <- i810_unmap_buffer() which grabs mmap_sem excl <- aio_free_ring() which grabs mmap_sem excl <- elf_map() which grabs mmap_sem excl <- [flat] load_flat_file() --- BUG HERE <- shmdt(2) which grabs mmap_sem excl <- brk(2) which grabs mmap_sem excl <- mmap_region() [see below] <- munmap(2) which grabs mmap_sem excl <- do_brk() [see below] <- move_vma() <- mremap_to() <- do_mremap() [see below] <- do_mremap() [see below] <- mremap_to() <- do_mremap() [see below] <- do_mremap() [see below] do_brk() - <- brk(2) which grabs mmap_sem excl <- [ia32_aout] set_brk() which grabs mmap_sem excl <- [ia32_aout] load_aout_binary() which grabs mmap_sem excl <- [ia32_aout] load_aout_library() which grabs mmap_sem excl <- [aout] set_brk() which grabs mmap_sem excl <- [aout] load_aout_binary() which grabs mmap_sem excl <- [aout] load_aout_library() which grabs mmap_sem excl <- [elf] set_brk() which grabs mmap_sem excl <- [elf] load_elf_interp() which grabs mmap_sem excl <- [elf] load_elf_library() which grabs mmap_sem excl mmap_region() - <- remap_file_pages(2) which grabs mmap_sem excl <- do_mmap_pgoff() [see below] <- [tile] arch_setup_additional_pages() which grabs mmap_sem excl (a bit too late, BTW, but not for this one) do_mmap_pgoff() - <- do_mmap() [see below] <- mmap_pgoff(2) which grabs mmap_sem excl do_mmap() - <- shmat(2) which grabs mmap_sem excl <- aio_setup_ring() which grabs mmap_sem excl [NB: only because ctx->mm == current->mm] <- kvm_arch_prepare_memory_region() which grabs mmap_sem excl <- drm_mapbufs() which grabs mmap_sem excl <- exynos_drm_gem_mmap_ioctl() which grabs mmap_sem excl <- i810_map_buffer() which grabs mmap_sem excl [NB: racy changes of ->f_op] <- i915_gem_mmap_ioctl() which grabs mmap_sem excl <- [tile] single_step_once() which grabs mmap_sem excl <- [elf] elf_map() which grabs mmap_sem excl <- [elf] load_elf_binary() which grabs mmap_sem excl <- [elf_fdpic] load_elf_fdpic_binary() which grabs mmap_sem excl <- [elf_fdpic] elf_fdpic_map_file_constdisp_on_uclinux() which grabs mmap_sem excl <- [elf_fdpic] elf_fdpic_map_file_by_direct_mmap() which grabs mmap_sem excl <- [aout] load_aout_binary() which grabs mmap_sem excl <- [aout] load_aout_library() which grabs mmap_sem excl <- [ia32_aout] load_aout_binary() which grabs mmap_sem excl <- [ia32_aout] load_aout_library() which grabs mmap_sem excl <- [flat] load_flat_file() which grabs mmap_sem excl <- [som] map_som_binary() which grabs mmap_sem excl do_mremap() - <- mremap(2) which grabs mmap_sem excl (bug mentioned re load_flat_file() is still there, but it's irrelevant for our purposes - no ->mmap_sem held by caller of do_munmap()). That's a metric arseload of sites to propagate that thing to... ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 19:58 ` Al Viro @ 2012-04-20 21:12 ` Linus Torvalds 2012-04-20 22:13 ` Al Viro 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2012-04-20 21:12 UTC (permalink / raw) To: Al Viro Cc: Hugh Dickins, linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller, Andrew Morton On Fri, Apr 20, 2012 at 12:58 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > So does a bunch of other places. Let me dig out the call graph circa > 3.3.0... Here is the relevant part: Yes, but a lot of those would actually be helped by a helper function that does all of: - grab mmap_sem - call do_m[un]map() - release mmap_sem and that would actually clean them up even in the current case. And then we could do the cleanup in just the helper function. Not all, no. But a preparatory patch that just creates the helper functions for doing brk/mmap/munmap would get rid of a fairly big chunk of them. You can visualize how many of them do that by just doing git grep -5 do_m[un]*map and then high-lghting '_write(' (to visually show the down_write/up_write pairs that surround most of them) by searching for it. Are they all like that? No. But most of the ones outside of mm/ do fit that simple pattern and should probably be fixed up just to have them not contain VM locking details in them *anyway*. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-security-module" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 21:12 ` Linus Torvalds @ 2012-04-20 22:13 ` Al Viro 2012-04-20 22:35 ` Linus Torvalds 0 siblings, 1 reply; 32+ messages in thread From: Al Viro @ 2012-04-20 22:13 UTC (permalink / raw) To: Linus Torvalds Cc: Hugh Dickins, linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller, Andrew Morton On Fri, Apr 20, 2012 at 02:12:55PM -0700, Linus Torvalds wrote: > Are they all like that? No. But most of the ones outside of mm/ do fit > that simple pattern and should probably be fixed up just to have them > not contain VM locking details in them *anyway*. Kinda-sorta. I agree that such helpers make sense, but you are too optimistic about the number of such places. And clusterfuck around mremap() is fairly deep, so propagating all way back to up_write() wont' be fun. sys_shmdt() is misusing do_munmap(), AFAICS. And there we call it many times in a loop, unmapping only a subset, so it's not like we could blindly pick a string of vmas and handle them all separately afterwards. It could be handled if we passed an initially empty list, collecting vmas in it as we go, but... ouch BTW, looks like I've just found a bug in oprofile - take a look at sys_munmap() and you'll see that it's almost exactly your proposed helper. The only difference is that it starts with calling profile_munmap(addr); (mind you, *before* grabbing ->mmap_sem), which is to say blocking_notifier_call_chain(&munmap_notifier, 0, (void *)addr); i.e. a really fancy way to arrange a call of munmap_notify() from drivers/oprofile/buffer_sync.c. Which does the following: { unsigned long addr = (unsigned long)data; struct mm_struct *mm = current->mm; struct vm_area_struct *mpnt; down_read(&mm->mmap_sem); mpnt = find_vma(mm, addr); if (mpnt && mpnt->vm_file && (mpnt->vm_flags & VM_EXEC)) { up_read(&mm->mmap_sem); /* To avoid latency problems, we only process the current CPU, * hoping that most samples for the task are on this CPU */ sync_buffer(raw_smp_processor_id()); return 0; } up_read(&mm->mmap_sem); return 0; } Leaving aside the convoluted way it's done, it's obviously both racy and incomplete. The worst part is, sync_buffer() *can't* be called with any ->mmap_sem held; it goes through a bunch of task_struct, grabbing ->mmap_sem shared. Fun... ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 22:13 ` Al Viro @ 2012-04-20 22:35 ` Linus Torvalds 2012-04-27 7:35 ` Kasatkin, Dmitry 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2012-04-20 22:35 UTC (permalink / raw) To: Al Viro Cc: Hugh Dickins, linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, David Miller, Andrew Morton On Fri, Apr 20, 2012 at 3:13 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Kinda-sorta. I agree that such helpers make sense, but you are too > optimistic about the number of such places. And clusterfuck around > mremap() is fairly deep, so propagating all way back to up_write() > wont' be fun. Fair enough. I'll do the helpers and see how much they get rid of, just because looking at all the callers, those helpers seem to be obviously the right thing anyway. So even if we don't do anything else, we can improve things regardless. For do_brk(), for example, it looks like do_brk() itself should actually be entirely static to mm/mmap.c, because every single caller from the outside actually wants the self-locking version. So plan right now: do "vm_xyzzy()" helper functions that do "do_xyzzy()" and take the lock (and do not take the "mm" argument, because it had better always be the current one - keep the calling convention as simple as possible). Linus -- To unsubscribe from this list: send the line "unsubscribe linux-security-module" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 22:35 ` Linus Torvalds @ 2012-04-27 7:35 ` Kasatkin, Dmitry 2012-04-27 17:34 ` Al Viro 0 siblings, 1 reply; 32+ messages in thread From: Kasatkin, Dmitry @ 2012-04-27 7:35 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Hugh Dickins, linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Mimi Zohar, David Miller, Andrew Morton On Sat, Apr 21, 2012 at 1:35 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Fri, Apr 20, 2012 at 3:13 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Kinda-sorta. I agree that such helpers make sense, but you are too > > optimistic about the number of such places. And clusterfuck around > > mremap() is fairly deep, so propagating all way back to up_write() > > wont' be fun. > > Fair enough. > > I'll do the helpers and see how much they get rid of, just because > looking at all the callers, those helpers seem to be obviously the > right thing anyway. So even if we don't do anything else, we can > improve things regardless. > > For do_brk(), for example, it looks like do_brk() itself should > actually be entirely static to mm/mmap.c, because every single caller > from the outside actually wants the self-locking version. > > So plan right now: do "vm_xyzzy()" helper functions that do > "do_xyzzy()" and take the lock (and do not take the "mm" argument, > because it had better always be the current one - keep the calling > convention as simple as possible). > > Linus Moi Linus, Taking the list of files to fput out of do_munmap() and doing it after unlocking mmap sem was one of possible solutions when we discussed it with Mimi. But we were looking for the solution with less modifications of existing kernel. In fact IMA is already doing some work before taking mmap sem. --- ret = ima_file_mmap(filep, prot); if (ret < 0) return ret; down_write(¤t->mm->mmap_sem); -- But have you seen the proposed patch for __fput()? [PATCH v4 10/12] ima: defer calling __fput() It defers only of course the last AND mmap_sem is locked AND open for write. if (current->mm && rwsem_is_locked(¤t->mm->mmap_sem)) { if (ima_defer_fput(file) == 0) return; } Just 5 out of ~100,000 mmap_sem held fput() calls were deferred. t. - Dmitry ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-27 7:35 ` Kasatkin, Dmitry @ 2012-04-27 17:34 ` Al Viro 2012-04-27 18:52 ` Kasatkin, Dmitry 2012-04-30 14:32 ` Mimi Zohar 0 siblings, 2 replies; 32+ messages in thread From: Al Viro @ 2012-04-27 17:34 UTC (permalink / raw) To: Kasatkin, Dmitry Cc: Linus Torvalds, Hugh Dickins, linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Mimi Zohar, David Miller, Andrew Morton On Fri, Apr 27, 2012 at 10:35:25AM +0300, Kasatkin, Dmitry wrote: > But have you seen the proposed patch for __fput()? > [PATCH v4 10/12] ima: defer calling __fput() > > It defers only of course the last AND mmap_sem is locked AND open for write. > > if (current->mm && rwsem_is_locked(¤t->mm->mmap_sem)) { > if (ima_defer_fput(file) == 0) > return; > } > > Just 5 out of ~100,000 mmap_sem held fput() calls were deferred. Let me get it straight. a) You still ignore all the problems with that described in the posting right in the beginning of this thread. b) You ignore the problems with semantics changes from user-visible delays of fput() past the return from syscall (described in Linus' posting upthread - they apply to this "solution" as well). c) You seem to consider the fact that this path will be exercised very rarely, thus making any races on it damn hard to reproduce and debug as a good thing. And as for the sentiment expressed in the beginning of your posting (that smaller patch size is worth more than clean locking rules, maintainability of resulting kernel, etc.)... I'm sorry, but you guys need to decide what IMA is. If it's a first-class part of the kernel, you have your priorities backwards... ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-27 17:34 ` Al Viro @ 2012-04-27 18:52 ` Kasatkin, Dmitry 2012-04-27 19:15 ` Kasatkin, Dmitry 2012-04-30 14:32 ` Mimi Zohar 1 sibling, 1 reply; 32+ messages in thread From: Kasatkin, Dmitry @ 2012-04-27 18:52 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Hugh Dickins, linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Mimi Zohar, David Miller, Andrew Morton On Fri, Apr 27, 2012 at 8:34 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Fri, Apr 27, 2012 at 10:35:25AM +0300, Kasatkin, Dmitry wrote: > >> But have you seen the proposed patch for __fput()? >> [PATCH v4 10/12] ima: defer calling __fput() >> >> It defers only of course the last AND mmap_sem is locked AND open for write. >> >> if (current->mm && rwsem_is_locked(¤t->mm->mmap_sem)) { >> if (ima_defer_fput(file) == 0) >> return; >> } >> >> Just 5 out of ~100,000 mmap_sem held fput() calls were deferred. > > Let me get it straight. > a) You still ignore all the problems with that described in the > posting right in the beginning of this thread. > b) You ignore the problems with semantics changes from user-visible > delays of fput() past the return from syscall (described in Linus' posting > upthread - they apply to this "solution" as well). > c) You seem to consider the fact that this path will be exercised > very rarely, thus making any races on it damn hard to reproduce and debug > as a good thing. > > And as for the sentiment expressed in the beginning of your posting (that > smaller patch size is worth more than clean locking rules, maintainability > of resulting kernel, etc.)... I'm sorry, but you guys need to decide > what IMA is. If it's a first-class part of the kernel, you have your > priorities backwards... Hello, I do not ignore anything. I said that we were thinking about solution to get the list of file to fput them after mmap unlock. And I do understand the issues discussed. I just wanted to know more opinions on proposed patch. - Dmitry -- To unsubscribe from this list: send the line "unsubscribe linux-security-module" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-27 18:52 ` Kasatkin, Dmitry @ 2012-04-27 19:15 ` Kasatkin, Dmitry 0 siblings, 0 replies; 32+ messages in thread From: Kasatkin, Dmitry @ 2012-04-27 19:15 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Hugh Dickins, linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Mimi Zohar, David Miller, Andrew Morton On Fri, Apr 27, 2012 at 9:52 PM, Kasatkin, Dmitry <dmitry.kasatkin@intel.com> wrote: > On Fri, Apr 27, 2012 at 8:34 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: >> On Fri, Apr 27, 2012 at 10:35:25AM +0300, Kasatkin, Dmitry wrote: >> >>> But have you seen the proposed patch for __fput()? >>> [PATCH v4 10/12] ima: defer calling __fput() >>> >>> It defers only of course the last AND mmap_sem is locked AND open for write. >>> >>> if (current->mm && rwsem_is_locked(¤t->mm->mmap_sem)) { >>> if (ima_defer_fput(file) == 0) >>> return; >>> } >>> >>> Just 5 out of ~100,000 mmap_sem held fput() calls were deferred. >> >> Let me get it straight. >> a) You still ignore all the problems with that described in the >> posting right in the beginning of this thread. >> b) You ignore the problems with semantics changes from user-visible >> delays of fput() past the return from syscall (described in Linus' posting >> upthread - they apply to this "solution" as well). >> c) You seem to consider the fact that this path will be exercised >> very rarely, thus making any races on it damn hard to reproduce and debug >> as a good thing. >> >> And as for the sentiment expressed in the beginning of your posting (that >> smaller patch size is worth more than clean locking rules, maintainability >> of resulting kernel, etc.)... I'm sorry, but you guys need to decide >> what IMA is. If it's a first-class part of the kernel, you have your >> priorities backwards... > > Hello, > > I do not ignore anything. > I said that we were thinking about solution to get the list of file to > fput them after mmap unlock. > And I do understand the issues discussed. > I just wanted to know more opinions on proposed patch. > > - Dmitry For sure we are happy to see the discussion about the problem. Thanks for your attention to it. And our target to solve it in best possible way. So let's agree the solution and we will be happy to implement the patch. - Dmitry ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-27 17:34 ` Al Viro 2012-04-27 18:52 ` Kasatkin, Dmitry @ 2012-04-30 14:32 ` Mimi Zohar 2012-05-03 4:23 ` James Morris 1 sibling, 1 reply; 32+ messages in thread From: Mimi Zohar @ 2012-04-30 14:32 UTC (permalink / raw) To: Al Viro Cc: Jonathan Corbet <corbet@lwn.net>, Kasatkin, Dmitry, Linus Torvalds, Hugh Dickins, linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, David Miller, Andrew Morton On Fri, 2012-04-27 at 18:34 +0100, Al Viro wrote: > On Fri, Apr 27, 2012 at 10:35:25AM +0300, Kasatkin, Dmitry wrote: > > > But have you seen the proposed patch for __fput()? > > [PATCH v4 10/12] ima: defer calling __fput() > > > > It defers only of course the last AND mmap_sem is locked AND open for write. > > > > if (current->mm && rwsem_is_locked(¤t->mm->mmap_sem)) { > > if (ima_defer_fput(file) == 0) > > return; > > } > > > > Just 5 out of ~100,000 mmap_sem held fput() calls were deferred. > > Let me get it straight. > a) You still ignore all the problems with that described in the > posting right in the beginning of this thread. > b) You ignore the problems with semantics changes from user-visible > delays of fput() past the return from syscall (described in Linus' posting > upthread - they apply to this "solution" as well). > c) You seem to consider the fact that this path will be exercised > very rarely, thus making any races on it damn hard to reproduce and debug > as a good thing. > > And as for the sentiment expressed in the beginning of your posting (that > smaller patch size is worth more than clean locking rules, maintainability > of resulting kernel, etc.)... I'm sorry, but you guys need to decide > what IMA is. If it's a first-class part of the kernel, you have your > priorities backwards... Al, with all this time spent on the different components of the integrity subsystem, making it a first class citizen is our main concern. We definitely appreciate all of your work, previous and current work on fput, to help make this happen and will be happy to help in any way possbile. Jon, thank you for summarizing the discussion - article http://lwn.net/Articles/494158/. As Jake Edge's previous LWN article http://lwn.net/Articles/488906/ said, we've been working on upstreaming the different integrity components for quite a while. Although IMA-appraisal isn't the last component, it is a major one and were hoping that it would be upstreamed in the near future. Is 3.5 still a possibility? (It was just pointed out to me that the discussion has moved to linux-arch. I'm pretty sure that other people on the LSM mailing list are following this discussion. In the future, please CC the LSM mailing.) thanks, Mimi ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-30 14:32 ` Mimi Zohar @ 2012-05-03 4:23 ` James Morris 0 siblings, 0 replies; 32+ messages in thread From: James Morris @ 2012-05-03 4:23 UTC (permalink / raw) To: Mimi Zohar Cc: Al Viro, Jonathan Corbet <corbet@lwn.net>, Kasatkin, Dmitry, Linus Torvalds, Hugh Dickins, linux-fsdevel, linux-security-module, linux-kernel, David Safford, David Miller, Andrew Morton On Mon, 30 Apr 2012, Mimi Zohar wrote: > Jon, thank you for summarizing the discussion - article > http://lwn.net/Articles/494158/. As Jake Edge's previous LWN article > http://lwn.net/Articles/488906/ said, we've been working on upstreaming > the different integrity components for quite a while. Although > IMA-appraisal isn't the last component, it is a major one and were > hoping that it would be upstreamed in the near future. Is 3.5 still a > possibility? Not sure why you're asking Jon, but the technical issues raised by Al need to be resolved before the code can be considered for upstreaming. - James -- James Morris <jmorris@namei.org> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) 2012-04-20 19:04 ` Al Viro 2012-04-20 19:18 ` Linus Torvalds @ 2012-04-20 19:37 ` Al Viro 1 sibling, 0 replies; 32+ messages in thread From: Al Viro @ 2012-04-20 19:37 UTC (permalink / raw) To: Hugh Dickins Cc: linux-fsdevel, James Morris, linux-security-module, linux-kernel, David Safford, Dmitry Kasatkin, Mimi Zohar, Linus Torvalds, David Miller, Andrew Morton On Fri, Apr 20, 2012 at 08:04:18PM +0100, Al Viro wrote: > Deferring the final pass after dropping ->mmap_sem is going to be > interesting; what would protect ->vm_next on those suckers? Locking > rules are already bloody complicated in mm/*; this will only add more > subtle fun. Note that right now both the dissolution of ->vm_next > list and freeing of VMAs happen under ->mmap_sem; changing that might > cost us a lot of headache... BTW, even leaving that aside, we have at least one definite deadlock (mainline, not on ->mmap_sem - see the mess in drivers/video/msm/mdp.c) and a possibility of other fun (e.g. any network filesystem that ends up triggering rtnl_lock() during umount is going to deadlock if you combine it with sch_atm.c mess where we do fget()/check/fput() in netlink message handling, close() from another thread racing with it and descriptor being the only thing that keeps that network fs alive past lazy umount or namespace dissolution). And then there's SCM_RIGHTS vs. drivers/vhost/net.c, which might or might not be deadlockable - I'm not familiar enough with that driver to tell right now. _If_ it had been ->mmap_sem alone, the things would be much simpler. There wouldn't be a problem, to start with - IMA stuff is not in the mainline and that's the only thing that really wants ->i_mutex in fput(); the few existing places that grab it in ->release() are racy anyway and need to be fixed with proper locking. ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2012-05-03 4:23 UTC | newest] Thread overview: 32+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-04-18 13:04 [PULL REQUEST] : ima-appraisal patches Mimi Zohar 2012-04-18 15:02 ` James Morris 2012-04-18 18:07 ` Mimi Zohar 2012-04-18 18:39 ` Al Viro 2012-04-18 20:56 ` Mimi Zohar 2012-04-19 19:57 ` Mimi Zohar 2012-04-20 0:43 ` [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) Al Viro 2012-04-20 2:31 ` Linus Torvalds 2012-04-20 2:54 ` Al Viro 2012-04-20 2:58 ` Linus Torvalds 2012-04-20 8:09 ` Al Viro 2012-04-20 15:56 ` Linus Torvalds 2012-04-20 16:08 ` Al Viro 2012-04-20 16:42 ` Al Viro 2012-04-20 17:21 ` Linus Torvalds 2012-04-20 18:07 ` Al Viro 2012-04-20 3:15 ` Al Viro 2012-04-20 18:54 ` Hugh Dickins 2012-04-20 19:04 ` Al Viro 2012-04-20 19:18 ` Linus Torvalds 2012-04-20 19:32 ` Hugh Dickins 2012-04-20 19:58 ` Al Viro 2012-04-20 21:12 ` Linus Torvalds 2012-04-20 22:13 ` Al Viro 2012-04-20 22:35 ` Linus Torvalds 2012-04-27 7:35 ` Kasatkin, Dmitry 2012-04-27 17:34 ` Al Viro 2012-04-27 18:52 ` Kasatkin, Dmitry 2012-04-27 19:15 ` Kasatkin, Dmitry 2012-04-30 14:32 ` Mimi Zohar 2012-05-03 4:23 ` James Morris 2012-04-20 19:37 ` Al Viro
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).