public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: linux-fsdevel@vger.kernel.org
Cc: James Morris <jmorris@namei.org>,
	linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	David Safford <safford@linux.vnet.ibm.com>,
	Dmitry Kasatkin <dmitry.kasatkin@intel.com>,
	Mimi Zohar <zohar@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	David Miller <davem@davemloft.net>
Subject: [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches)
Date: Fri, 20 Apr 2012 01:43:03 +0100	[thread overview]
Message-ID: <20120420004303.GB6871@ZenIV.linux.org.uk> (raw)
In-Reply-To: <1334865448.2429.35.camel@falcor>

On Thu, Apr 19, 2012 at 03:57:28PM -0400, Mimi Zohar wrote:

> Has the discussion here moved from deferring the __fput() for the
> mmap_sem/i_mutex lockdep side case, to taking the i_mutex in __fput() in
> general?  Lockdep has not reported any problems, other than for the
> mmap_sem/i_mutex scenario.
> 
> > This
> > is not a way to go; that kind of kludges leads to locking code that is
> > impossible to reason about.
> 
> Are you referring to defering the __fput() or taking the i_mutex in
> __fput() in general?

I'm referring to having "are we holding ->mmap_sem" logics anywhere in fput().
Note that your "lockdep has not reported..." is a symptom of the same
problem - it's *NOT* enough to show the lack of deadlocks from the change
of locking rules.  And after that change we'll get even worse situation,
since proving the safety will become harder (and even more so if we end
up adding other cases when we need to defer).

Summary for those who'd been spared the earlier fun: IMA stuff wants to grab
->i_mutex on the final fput() in some cases.  That violates the locking
order in at least one way - final fput() may come under ->mmap_sem, in
which case grabbing ->i_mutex would be a Bloody Bad Idea(tm).  "Solution"
proposed: a bit of spectaculary ugly logics checking in final fput() whether
we have ->mmap_sem locked.  If we do, said final fput() becomes non-final
and is done asynchronously.  This is fundamentally flawed, of course,
since "is ->mmap_sem unlocked" is used as "is it safe to have ->i_mutex
grabbed", and it's both unproven and brittle as hell even if it happens
to be true right now (and I wouldn't bet on that, TBH).

If it had been IMA alone, I would've cheerfully told them where to stuff
the whole thing.  Unfortunately, fput() *is* lock-heavy even without them.
Note that it can easily end up triggering such things as final
deactivate_super().  Now, ->mmap_sem is irrelevant here - after all,
any inodes involved won't be mmapped, or deactivate_super() wouldn't
be final.  However, fput() is called e.g. under rtnl_lock() and I'm
not at all sure that something like NFS won't try to grab it from its
->kill_sb().

So the question is, do we have any reasonable way to get to the situation
where the __fput() would only be done without any locks held?  It might
be worth trying. What we CAN'T do:
	* defer all __fput() (i.e. always do final fput() async).  Obviously
no-go, for performance reasons alone.
	* check some predicate about the set of locks held and defer if it's
false.  That's what IMA folks have tried to do; no-go for the reasons described
above.
	* add a new helper (fput_carefully() or something like that) that would
defer final __fput(), leaving fput() with the current behaviour.  And convert
the potentially unsafe callers to it (starting with everything that holds
->mmap_sem).  No-go since it's not maintainable - a change pretty far away
from a function that does (currently safe) fput() can end up requiring the
conversion to fput_carefully().  Too easy to miss, so this will decay and it
won't be easy to verify correctness several years down the road.

However, there's an approach that might be feasible.  Most of the time
the final fput() *is* done without any locks held and there's a very
large subclass of those call sites - those that come via fput_light().  
What we could do, and what might be maintainable is:
	* prohibit fput_light() with locks held.  Right now we are very
close to that (or already there - I haven't finished checking).
	* convert low-hanging fget/fput in syscalls to fget_light/fput_light.
Makes sense anyway.
	* split off fput_nodefer() (identical to what fput() is right now),
make fput_light() call it instead of fput().  Switch path_lookupat() and
path_openat() to fput_nodefer() as well.  Ditto for sys_socketpair() and
sys_accept4().
	* make fput() (now almost never leading to __fput()) defer the sucker
in very rare cases when it ends up dropping the last reference.
At that point we would have __fput() always done without any locks held,
which would remove all restrictions on the locks that can be taken from it.

Comments?

Disclaimer: I have not finished going through the call sites (note that
there are wrappers - e.g. sockfd_put() *and* sockfd_lookup()), so there might
be obstacles.  In particular, I'm still not sure about SCM_RIGHTS datagrams
handling - IIRC, we had an interesting kludge in there, with a kinda-sorta
deferral/batching.  BTW, I wonder about deadlocks around that one -
drivers/vhost/net.c is doing ->recvmsg() while holding vq->mutex and if
an SCM_RIGHTS datagram comes in, we'll get a bunch of fput() done.  Which
can lead to final umount of a network filesystem, etc.  If that thing can
lead to handle_rx() on the same sucker, we have a deadlock...

  reply	other threads:[~2012-04-20  0:43 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-18 13:04 [PULL REQUEST] : ima-appraisal patches Mimi Zohar
2012-04-18 15:02 ` James Morris
2012-04-18 18:07   ` Mimi Zohar
2012-04-18 18:39     ` Al Viro
2012-04-18 20:56       ` Mimi Zohar
2012-04-19 19:57       ` Mimi Zohar
2012-04-20  0:43         ` Al Viro [this message]
2012-04-20  2:31           ` [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) Linus Torvalds
2012-04-20  2:54             ` Al Viro
2012-04-20  2:58               ` Linus Torvalds
2012-04-20  8:09                 ` Al Viro
2012-04-20 15:56                   ` Linus Torvalds
2012-04-20 16:08                     ` Al Viro
2012-04-20 16:42                       ` Al Viro
2012-04-20 17:21                         ` Linus Torvalds
2012-04-20 18:07                           ` Al Viro
2012-04-23 18:01                             ` [RFC] TIF_NOTIFY_RESUME, arch/*/*/*signal*.c and all such Al Viro
2012-04-23 18:37                               ` Oleg Nesterov
2012-04-24  7:26                               ` Al Viro
2012-04-25  3:06                                 ` Al Viro
2012-04-25 12:37                                   ` Oleg Nesterov
2012-04-25 12:50                                     ` Al Viro
2012-04-25 13:03                                       ` Oleg Nesterov
2012-04-25 13:32                                         ` Oleg Nesterov
2012-04-25 13:32                                         ` Al Viro
2012-04-25 14:52                                           ` Oleg Nesterov
2012-04-25 15:46                                             ` Oleg Nesterov
2012-04-25 16:10                                               ` Al Viro
2012-04-25 17:02                                                 ` Oleg Nesterov
2012-04-25 17:51                                                   ` Al Viro
2012-04-26  7:15                                                     ` Martin Schwidefsky
2012-04-26  7:25                                                       ` David Miller
2012-04-26 13:52                                                       ` Oleg Nesterov
2012-04-26 14:31                                                         ` Martin Schwidefsky
2012-04-26 13:22                                                     ` Oleg Nesterov
2012-04-26 18:37                                 ` Oleg Nesterov
2012-04-26 23:19                                   ` Al Viro
2012-04-27 17:24                                     ` Oleg Nesterov
2012-04-27 17:54                                       ` Oleg Nesterov
2012-05-02 10:37                                         ` Matt Fleming
2012-05-02 14:14                                           ` Al Viro
2012-04-27 18:45                                       ` Al Viro
2012-04-27 19:14                                         ` Geert Uytterhoeven
2012-04-27 19:34                                           ` Al Viro
2012-04-29 22:51                                             ` Al Viro
2012-04-30  6:39                                               ` Greg Ungerer
2012-04-27 19:42                                         ` Al Viro
2012-04-27 20:20                                         ` Roland McGrath
2012-04-27 21:12                                           ` Al Viro
2012-04-27 21:27                                             ` Roland McGrath
2012-04-27 23:15                                               ` Al Viro
2012-04-27 23:32                                                 ` Al Viro
2012-04-29  4:12                                                   ` Al Viro
2012-04-30  8:06                                                     ` Martin Schwidefsky
2012-04-27 23:50                                                 ` Al Viro
2012-04-28 18:51                                                   ` [PATCH] arch/tile: avoid calling do_signal() after fork from a kernel thread Chris Metcalf
2012-04-28 20:55                                                     ` Al Viro
2012-04-28 21:46                                                       ` Chris Metcalf
2012-04-29  0:55                                                         ` Al Viro
2012-04-28 18:51                                                           ` [PATCH v2] arch/tile: fix up some issues in calling do_work_pending() Chris Metcalf
2012-04-29  3:49                                                           ` [PATCH] arch/tile: avoid calling do_signal() after fork from a kernel thread Chris Metcalf
2012-04-28  2:42                                                 ` [RFC] TIF_NOTIFY_RESUME, arch/*/*/*signal*.c and all such Al Viro
2012-04-28  3:32                                                   ` Al Viro
2012-04-28  3:36                                                     ` Al Viro
2012-04-29 16:33                                                     ` Oleg Nesterov
2012-04-29 16:18                                                   ` Oleg Nesterov
2012-04-29 18:05                                                     ` Al Viro
2012-05-01  4:31                                                       ` Al Viro
2012-05-01  5:06                                                         ` Mike Frysinger
2012-05-01  5:52                                                           ` Al Viro
2012-05-02 17:24                                                             ` Al Viro
2012-05-02 18:30                                                       ` Oleg Nesterov
2012-04-29 16:41                                         ` Oleg Nesterov
2012-04-29 18:09                                           ` Al Viro
2012-04-29 18:25                                             ` Oleg Nesterov
2012-04-20  3:15               ` [RFC] situation with fput() locking (was Re: [PULL REQUEST] : ima-appraisal patches) Al Viro
2012-04-20 18:54           ` Hugh Dickins
2012-04-20 19:04             ` Al Viro
2012-04-20 19:18               ` Linus Torvalds
2012-04-20 19:32                 ` Hugh Dickins
2012-04-20 19:58                 ` Al Viro
2012-04-20 21:12                   ` Linus Torvalds
2012-04-20 22:13                     ` Al Viro
2012-04-20 22:35                       ` Linus Torvalds
2012-04-27  7:35                         ` Kasatkin, Dmitry
2012-04-27 17:34                           ` Al Viro
2012-04-27 18:52                             ` Kasatkin, Dmitry
2012-04-27 19:15                               ` Kasatkin, Dmitry
2012-04-30 14:32                             ` Mimi Zohar
2012-05-03  4:23                               ` James Morris
2012-04-20 19:37               ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120420004303.GB6871@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=davem@davemloft.net \
    --cc=dmitry.kasatkin@intel.com \
    --cc=jmorris@namei.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=safford@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    --cc=zohar@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox