From: Mimi Zohar <zohar@linux.vnet.ibm.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Christoph Hellwig <hch@lst.de>,
linux-xfs@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Al Viro <viro@ZenIV.linux.org.uk>
Subject: Re: xfs: commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead" change causes hang
Date: Sun, 08 Jan 2017 14:16:43 -0500 [thread overview]
Message-ID: <1483903003.2956.25.camel@linux.vnet.ibm.com> (raw)
In-Reply-To: <1483901848.2542.27.camel@HansenPartnership.com>
On Sun, 2017-01-08 at 10:57 -0800, James Bottomley wrote:
> On Sun, 2017-01-08 at 19:18 +0100, Christoph Hellwig wrote:
> > On Sun, Jan 08, 2017 at 09:59:25AM -0800, James Bottomley wrote:
> > > Hey, that's not really true: the inode lock (i_rwsem) is used in
> > > all sorts of generic places, including generic_file_write_iter().
> > > That's, I think, why ima is using it to try to prevent writes
> > > while it measures the file.
> >
> > But all these are _below_ file_operations. The only place where take
> > them in the VFS is for namespace locking, e.g. before calling into
> > inode_operations (to generalize a little).
>
> Definitely agree we need an abstraction with defined semantics.
>
> > > > So the answer here is that ima needs to stop playing with
> > > > i_rwsem.
> > >
> > > Isn't there a happy medium? most sensible filesystems will allow
> > > shared reading (unless they want to tank performance) so we can
> > > rely on the fact that even if a fs does use i_rwsem internally on
> > > the read path, it will have to be shared.
> >
> > At least for direct I/O that doesn't always have to be true.
>
> I'm unsure about the DIO case, so lets try defining the semantics and
> see if they're implementable for DIO, otherwise simply exclude it.
>
> > > So simply replacing the inode_lock() in ima
> > > with inode_lock_shared() should do what ima wants and not interact
> > > badly even if the underlying FS uses i_rwsem. If there's ever a FS
> > > that takes it exclusively in the read path, ima can simply
> > > blacklist
> > > it.
> >
> > IFF we actually allow recursive readers for rw_semaphores this would
> > work around the issue (but I'm not sure about that fact, at least
> > in the past we didn't). It won't fix IMA for all the file systems
> > use other synchronization for reads, e.g. the cluster locks in ocfs2
> > or gfs2. It won't fix NFS which will exhibit exacly the same issue
> > as Mimi reported.
> >
> > Last but not least it won't solve the problem that IMA has never been
> > designed and does neither document the requires it has from a file
> > system, nor is there any systematic testing for it. It will keep on
> > breaking because it has all kinds of weird implicit assumptions never
> > written down or verified, and the test coverage for it is basically
> > non-existing.
>
> OK, so how about we define it. I think we need two vfs calls:
>
> inode_block_local_writes(inode)
> inode_unblock_local_writes(inode)
>
> With semantics that between these two, all write attempts to the file
> backed by the inode on this system block but reads of the underlying
> file are allowed (I added local so we don't have to implement for
> remote filesystems). inode_block_local_writes() will block until all
> local writes to the file have finished, so you're guaranteed the file
> only allows reads when it succeeds.
>
> As for implementation in the vfs, I suspect an outstanding write count
> in the inode might be the better way?
As a reference point, what you're suggesting is similar to the current
locks that prevent writing to an executable, while it is being executed
(eg. bprm).
Mimi
prev parent reply other threads:[~2017-01-08 19:16 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-08 14:48 xfs: commit 6552321831dc "xfs: remove i_iolock and use i_rwsem in the VFS inode instead" change causes hang Mimi Zohar
2017-01-08 14:52 ` Christoph Hellwig
2017-01-08 15:03 ` Mimi Zohar
2017-01-08 15:14 ` Christoph Hellwig
2017-01-08 15:31 ` Mimi Zohar
2017-01-08 15:37 ` Christoph Hellwig
2017-01-08 16:38 ` Mimi Zohar
2017-01-08 16:43 ` Christoph Hellwig
2017-01-08 17:59 ` James Bottomley
2017-01-08 18:18 ` Christoph Hellwig
2017-01-08 18:57 ` James Bottomley
2017-01-08 19:09 ` Christoph Hellwig
2017-01-08 19:26 ` Al Viro
2017-01-08 20:10 ` Mimi Zohar
2017-01-08 19:39 ` Mimi Zohar
2017-01-09 19:44 ` Jeff Layton
2017-01-10 2:54 ` Mimi Zohar
2017-01-10 16:22 ` Jeff Layton
2017-01-08 19:16 ` Mimi Zohar [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1483903003.2956.25.camel@linux.vnet.ibm.com \
--to=zohar@linux.vnet.ibm.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=david@fromorbit.com \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).