From: Mimi Zohar <zohar@linux.vnet.ibm.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
LSM List <linux-security-module@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Christoph Hellwig <hch@infradead.org>,
"Theodore Ts'o" <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-integrity@vger.kernel.org,
Sascha Hauer <s.hauer@pengutronix.de>,
Jeff Layton <jlayton@redhat.com>
Subject: Re: [RFC PATCH 3/3] fs: detect that the i_rwsem has already been taken exclusively
Date: Mon, 02 Oct 2017 08:09:55 -0400
Message-ID: <1506946195.5691.268.camel@linux.vnet.ibm.com>
In-Reply-To: <20171002043554.GI15067@dastard>

On Mon, 2017-10-02 at 15:35 +1100, Dave Chinner wrote:
> On Sun, Oct 01, 2017 at 07:42:42PM -0400, Mimi Zohar wrote:
> > On Mon, 2017-10-02 at 09:34 +1100, Dave Chinner wrote:
> > > On Sun, Oct 01, 2017 at 11:41:48AM -0700, Linus Torvalds wrote:
> > > > On Sun, Oct 1, 2017 at 5:08 AM, Mimi Zohar <zohar@linux.vnet.ibm.com> wrote:
> > > > >
> > > > > Right, re-introducing the iint->mutex and a new i_generation field in
> > > > > the iint struct with a separate set of locks should work. It will be
> > > > > reset if the file metadata changes (eg. setxattr, chown, chmod).
> > > >
> > > > Note that the "inner lock" could possibly be omitted if the
> > > > invalidation can be just a single atomic instruction.
> > > >
> > > > So particularly if invalidation could be just an atomic_inc() on the
> > > > generation count, there might not need to be any inner lock at all.
> > > >
> > > > You'd have to serialize the actual measurement with the "read
> > > > generation count", but that should be as simple as just doing a
> > > > smp_rmb() between the "read generation count" and "do measurement on
> > > > file contents".
> > >
> > > We already have a change counter on the inode, which is modified on
> > > any data or metadata write (i_version) under filesystem locks. The
> > > i_version counter has well defined semantics - it's required by
> > > NFSv4 to increment on any metadata or data change - so we should be
> > > able to rely on its behaviour to implement IMA as well. Filesystems
> > > that support i_version are marked with [SB|MS]_I_VERSION in the
> > > superblock (IS_I_VERSION(inode)) so it should be easy to tell if IMA
> > > can be supported on a specific filesystem (btrfs, ext4, fuse and xfs
> > > ATM).
> >
> > Recently I received a patch to replace i_version with mtime/atime.
>
> mtime is not guaranteed to change on data writes - the resolution of
> the filesystem timestamps may mean mtime only changes once a second
> regardless of the number of writes performed to that file. That's
> why NFS can't use it as a change attribute, and hence we have
> i_version....
>
> > Now, even more recently, I received a patch that claims that
> > i_version is just a performance improvement.
>
> Did you ask them to explain/quantify the performance improvement?
Using i_version is a performance improvement as opposed to always
calculating the file hash and writing the xattr. The patch is
intended for filesystems that don't support i_version (eg. ubifs).
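To make the trade-off concrete, here is a rough userspace sketch of the
decision described above: skip the re-hash only when the filesystem keeps a
change counter and it still matches the sampled value, otherwise assume the
file has changed. The toy_* structures and function are hypothetical
stand-ins for illustration, not the kernel's inode or iint definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of an inode that matter here. */
struct toy_inode {
	bool     has_i_version;  /* filesystem maintains a change counter */
	uint64_t i_version;      /* bumped on every data or metadata change */
};

/* Hypothetical stand-in for the cached integrity state (the "iint"). */
struct toy_iint {
	bool     measured;       /* a hash was calculated at some point */
	uint64_t version;        /* i_version sampled at that time */
};

/* Returns true when the file hash has to be recalculated. */
static bool toy_needs_remeasure(const struct toy_inode *inode,
				const struct toy_iint *iint)
{
	assert(inode && iint);

	if (!iint->measured)
		return true;
	if (!inode->has_i_version)
		return true;     /* no counter: assume the file changed */
	return inode->i_version != iint->version;
}
```

With a counter available, an unchanged file costs one integer comparison.
Without it, every check pays for a full hash calculation and xattr write.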
> e.g. Using i_version on XFS slows down performance on small
> writes by 2-3% because all data writes log a version change
> rather than only logging a change when mtime updates.
> We take that penalty because NFS requires specific change attribute
> behaviour, otherwise we wouldn't have implemented it at all in
> XFS...
>
> > For file systems that
> > don't support i_version, assume that the file has changed.
> >
> > For file systems that don't support i_version, instead of assuming
> > that the file has changed, we can at least use i_generation.
>
> I'm not sure what you mean here - the struct inode already has an
> i_generation variable. It's a lifecycle indicator used to
> discriminate between alloc/free cycles on the same inode number.
> i.e. It only changes at inode allocation time, not whenever the data
> in the inode changes...
Sigh, my error.
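For reference, the difference between the two counters can be shown with a
small userspace model. This is only a sketch: the toy_* names are invented,
and incrementing i_generation on reuse is just the simplest stand-in for
whatever value a real filesystem picks at allocation time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two counters discussed above. */
struct toy_inode {
	uint64_t i_ino;         /* inode number, may be reused after free */
	uint32_t i_generation;  /* changes only when the inode is (re)allocated */
	uint64_t i_version;     /* changes on every data or metadata change */
};

/* Inode number is reused for a new file: new lifecycle, new generation. */
static void toy_reallocate(struct toy_inode *inode)
{
	assert(inode);
	inode->i_generation++;
}

/* Any write, chmod, chown, setxattr, ...: only the version moves. */
static void toy_modify(struct toy_inode *inode)
{
	assert(inode);
	inode->i_version++;
}
```

So i_generation stays constant across any number of writes to the same
file, which is why it cannot serve as a change indicator.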
>
> > With Linus' suggested changes, I think this will work nicely.
> >
> > > The IMA code should be able to sample that at measurement time and
> > > either fail or be retried if i_version changes during measurement.
> > > We can then simply make the IMA xattr write conditional on the
> > > i_version value being unchanged from the sample the IMA code passes
> > > into the filesystem once the filesystem holds all the locks it needs
> > > to write the xattr...
> >
> > > I note that IMA already grabs the i_version in
> > > ima_collect_measurement(), so this shouldn't be too hard to do.
> > > Perhaps we don't need any new locks or counters at all, maybe just
> > > the ability to feed a version cookie to the set_xattr method?
> >
> > The security.ima xattr is normally written out in
> > ima_check_last_writer(), not in ima_collect_measurement().
>
> Which, if IIUC, does this to measure and update the xattr:
>
> ima_check_last_writer
>   -> ima_update_xattr
>     -> ima_collect_measurement
>     -> ima_fix_xattr
>
> > ima_collect_measurement() calculates the file hash for storing in the
> > measurement list (IMA-measurement), verifying the hash/signature (IMA-
> > appraisal) already stored in the xattr, and auditing (IMA-audit).
>
> Yup, and it samples the i_version before it calculates the hash and
> stores it in the iint, which then gets passed to ima_fix_xattr().
> Looks like all that is needed is to pass the i_version back to the
> filesystem through the xattr call....
>
> IOWs, sample the i_version early while we hold the inode lock and
> check the writer count, then if it is the last writer drop the inode
> lock and call ima_update_xattr(). The sampled i_version then tells
> us if the file has changed before we write the updated xattr...
>
> > The only time that ima_collect_measurement() writes the file xattr is
> > in "fix" mode. Writing the xattr will need to be deferred until after
> > the iint->mutex is released.
>
> ima_collect_measurement() doesn't write an xattr at all - it just
> reads the file data and calculates the hash.
There's another call to ima_fix_xattr() from ima_appraise_measurement().
> > There should be no open writers in ima_check_last_writer(), so the
> > file shouldn't be changing.
>
> If that code is not holding the inode i_rwsem across
> ima_update_xattr(), then the writer check is racy as hell. We're
> trying to get rid of the need for this code to hold the inode lock
> to stabilise the writer count for the entire operation, and it looks
> to me like everything is there to use the i_version to ensure the
> IMA code doesn't need to hold the inode lock across
> ima_collect_measurement() and ima_fix_xattr()...
Ok
Mimi
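
The scheme discussed in this thread (sample the change counter, measure
without holding the inode lock, then write the xattr only if the counter is
unchanged) can be sketched in userspace roughly as follows. Everything named
toy_* is hypothetical, the FNV-1a hash merely stands in for the real file
hash, and the acquire load is used here as an approximation of the "read
generation count, then smp_rmb()" ordering suggested earlier. In the kernel
the final compare-and-store would run with the filesystem's own locks held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a file with a change counter and a stored hash. */
struct toy_file {
	_Atomic uint64_t version;    /* stand-in for i_version */
	const unsigned char *data;
	size_t len;
	uint32_t stored_hash;        /* stand-in for the security.ima xattr */
	bool has_stored_hash;
};

/* Writer side: every change bumps the counter (the "invalidation"). */
static void toy_write(struct toy_file *f, const unsigned char *data, size_t len)
{
	f->data = data;
	f->len = len;
	atomic_fetch_add_explicit(&f->version, 1, memory_order_release);
}

/* 32-bit FNV-1a, used only as a placeholder for the real file hash. */
static uint32_t toy_hash(const unsigned char *p, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Step 1: sample the counter first, then read and hash the contents. */
static uint32_t toy_measure(struct toy_file *f, uint64_t *cookie)
{
	assert(f && cookie);
	*cookie = atomic_load_explicit(&f->version, memory_order_acquire);
	return toy_hash(f->data, f->len);
}

/* Step 2: store the hash only if nothing changed since the sample. */
static bool toy_set_hash_if_unchanged(struct toy_file *f, uint32_t hash,
				      uint64_t cookie)
{
	if (atomic_load_explicit(&f->version, memory_order_acquire) != cookie)
		return false;        /* stale: caller retries or gives up */
	f->stored_hash = hash;
	f->has_stored_hash = true;
	return true;
}
```

A caller would loop on toy_measure() and toy_set_hash_if_unchanged() until
the store succeeds, or fail the operation after a bounded number of retries.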