All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Dave Jones <davej@redhat.com>,
	xfs@oss.sgi.com, Linux Kernel <linux-kernel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: fs/attr.c:notify_change locking warning.
Date: Wed, 16 Oct 2013 08:36:18 +1100	[thread overview]
Message-ID: <20131015213618.GU4446@dastard> (raw)
In-Reply-To: <20131015201905.GA7509@infradead.org>

On Tue, Oct 15, 2013 at 01:19:05PM -0700, Christoph Hellwig wrote:
> On Sat, Oct 05, 2013 at 01:19:18PM +1000, Dave Chinner wrote:
> > Yup, we don't hold the i_mutex *at all* through the fast path for
> > direct IO writes. Having to grab the i_mutex on every IO just for
> > the extremely unlikely case we need to remove a suid bit on the file
> > would add a significant serialisation point into the direct Io model
> > that XFS uses, and is the difference between 50,000 and 2+ million
> > direct IO IOPS to a single file.
> > 
> > I'm unwilling to sacrifice the concurrency of direct IO writes just
> > to shut up ths warning, especially as the actual modifications that
> > are made to remove SUID bits are correctly serialised within XFS
> > once notify_change() calls ->setattr(). If it really matters, I'll
> > just open code file_remove_suid() into XFS like ocfs2 does just so
> > we don't get that warning being emitted by trinity.
> 
> But the i_lock doesn't synchronize against the VFS modifying various
> struct inode fields.

Sure, but file_remove_suid() doesn't actually modify any VFS inode
structures until we process the flags and the modifications within
->setattr, which in XFS are all done under the XFS_ILOCK_EXCL via
xfs_setattr_mode(). i.e. both the VFS and XFS inodes S*ID bits are
removed only under XFS_ILOCK_EXCL....

Hence I see no point in adding extra serialisation via the i_mutex
to this path when we can just do something like:

	killsuid = should_remove_suid(file->f_path.dentry);
	if (killsuid) {
		struct iattr	newattr;

		newattr.ia_valid = ATTR_FORCE | killsuid;
		error = xfs_setattr_nonsize(ip, &newattr, 0);
		if (error)
			return error;
	}

and not require the i_mutex at all...

Indeed, this is exactly what do_truncate() does - the check outside
the i_mutex, then calls notify_change() with the i_mutex held. IOWs,
the i_mutex does nothing to serialise concurrent attempts to check
and remove S*ID bits....

> The right fix is to take i_mutex just in case
> we actually need to remove the suid bit.  The patch below should fix it,
> although I need to write a testcase that actually exercises it first.
> 
> Dave (J.): if you have time to try the patch below please go ahead,
> if not I'll make sure to write an isolated test ASAP to verify it and
> will then submit the change.
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 4c749ab..e879f96 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -590,8 +590,22 @@ restart:
>  	 * If we're writing the file then make sure to clear the setuid and
>  	 * setgid bits if the process is not being run by root.  This keeps
>  	 * people from modifying setuid and setgid binaries.
> +	 *
> +	 * Note that file_remove_suid must be called with the i_mutex held,
> +	 * so we have to go through some hoops here to make sure we hold it.
>  	 */
> -	return file_remove_suid(file);
> +	if (!IS_NOSEC(inode) && should_remove_suid(file->f_path.dentry)) {
> +		if (*iolock == XFS_IOLOCK_SHARED) {
> +			mutex_lock(&inode->i_mutex);
> +			error = file_remove_suid(file);
> +			mutex_unlock(&inode->i_mutex);

Lock inversion - i_mutex is always outside i_iolock. i.e. this will
deadlock if someone else calls xfs_rw_ilock(XFS_ILOCK_EXCL) at the
same time because we already hold the i_iolock in shared mode. It's
the same case that this function already handles for the EOF zeroing
relocking.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	xfs@oss.sgi.com
Subject: Re: fs/attr.c:notify_change locking warning.
Date: Wed, 16 Oct 2013 08:36:18 +1100	[thread overview]
Message-ID: <20131015213618.GU4446@dastard> (raw)
In-Reply-To: <20131015201905.GA7509@infradead.org>

On Tue, Oct 15, 2013 at 01:19:05PM -0700, Christoph Hellwig wrote:
> On Sat, Oct 05, 2013 at 01:19:18PM +1000, Dave Chinner wrote:
> > Yup, we don't hold the i_mutex *at all* through the fast path for
> > direct IO writes. Having to grab the i_mutex on every IO just for
> > the extremely unlikely case we need to remove a suid bit on the file
> > would add a significant serialisation point into the direct Io model
> > that XFS uses, and is the difference between 50,000 and 2+ million
> > direct IO IOPS to a single file.
> > 
> > I'm unwilling to sacrifice the concurrency of direct IO writes just
> > to shut up ths warning, especially as the actual modifications that
> > are made to remove SUID bits are correctly serialised within XFS
> > once notify_change() calls ->setattr(). If it really matters, I'll
> > just open code file_remove_suid() into XFS like ocfs2 does just so
> > we don't get that warning being emitted by trinity.
> 
> But the i_lock doesn't synchronize against the VFS modifying various
> struct inode fields.

Sure, but file_remove_suid() doesn't actually modify any VFS inode
structures until we process the flags and the modifications within
->setattr, which in XFS are all done under the XFS_ILOCK_EXCL via
xfs_setattr_mode(). i.e. both the VFS and XFS inodes S*ID bits are
removed only under XFS_ILOCK_EXCL....

Hence I see no point in adding extra serialisation via the i_mutex
to this path when we can just do something like:

	killsuid = should_remove_suid(file->f_path.dentry);
	if (killsuid) {
		struct iattr	newattr;

		newattr.ia_valid = ATTR_FORCE | killsuid;
		error = xfs_setattr_nonsize(ip, &newattr, 0);
		if (error)
			return error;
	}

and not require the i_mutex at all...

Indeed, this is exactly what do_truncate() does - the check outside
the i_mutex, then calls notify_change() with the i_mutex held. IOWs,
the i_mutex does nothing to serialise concurrent attempts to check
and remove S*ID bits....

> The right fix is to take i_mutex just in case
> we actually need to remove the suid bit.  The patch below should fix it,
> although I need to write a testcase that actually exercises it first.
> 
> Dave (J.): if you have time to try the patch below please go ahead,
> if not I'll make sure to write an isolated test ASAP to verify it and
> will then submit the change.
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 4c749ab..e879f96 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -590,8 +590,22 @@ restart:
>  	 * If we're writing the file then make sure to clear the setuid and
>  	 * setgid bits if the process is not being run by root.  This keeps
>  	 * people from modifying setuid and setgid binaries.
> +	 *
> +	 * Note that file_remove_suid must be called with the i_mutex held,
> +	 * so we have to go through some hoops here to make sure we hold it.
>  	 */
> -	return file_remove_suid(file);
> +	if (!IS_NOSEC(inode) && should_remove_suid(file->f_path.dentry)) {
> +		if (*iolock == XFS_IOLOCK_SHARED) {
> +			mutex_lock(&inode->i_mutex);
> +			error = file_remove_suid(file);
> +			mutex_unlock(&inode->i_mutex);

Lock inversion - i_mutex is always outside i_iolock. i.e. this will
deadlock if someone else calls xfs_rw_ilock(XFS_ILOCK_EXCL) at the
same time because we already hold the i_iolock in shared mode. It's
the same case that this function already handles for the EOF zeroing
relocking.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2013-10-15 21:36 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-05  0:52 fs/attr.c:notify_change locking warning Dave Jones
2013-10-05  0:52 ` Dave Jones
2013-10-05  3:19 ` Dave Chinner
2013-10-05  3:19   ` Dave Chinner
2013-10-15 20:19   ` Christoph Hellwig
2013-10-15 20:19     ` Christoph Hellwig
2013-10-15 21:36     ` Dave Chinner [this message]
2013-10-15 21:36       ` Dave Chinner
2013-10-16  7:05       ` Christoph Hellwig
2013-10-16  7:05         ` Christoph Hellwig
2013-10-16 10:26         ` Dave Chinner
2013-10-16 10:26           ` Dave Chinner
2013-10-16 18:12           ` Christoph Hellwig
2013-10-16 18:12             ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131015213618.GU4446@dastard \
    --to=david@fromorbit.com \
    --cc=davej@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.