[PATCH] mm: add missing mutex lock arround notify

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] mm: add missing mutex lock arround notify_change
@ 2011-12-16 11:25 Djalal Harouni
  2011-12-16 20:55 ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Djalal Harouni @ 2011-12-16 11:25 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Minchan Kim, KAMEZAWA Hiroyuki,
	Wu Fengguang
  Cc: linux-mm, linux-kernel


Calls to notify_change() must hold i_mutex.

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
---
 mm/filemap.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index c106d3b..0670ec1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1994,10 +1994,16 @@ EXPORT_SYMBOL(should_remove_suid);
 
 static int __remove_suid(struct dentry *dentry, int kill)
 {
+	int ret;
 	struct iattr newattrs;
 
 	newattrs.ia_valid = ATTR_FORCE | kill;
-	return notify_change(dentry, &newattrs);
+
+	mutex_lock(&dentry->d_inode->i_mutex);
+	ret = notify_change(dentry, &newattrs);
+	mutex_unlock(&dentry->d_inode->i_mutex);
+
+	return ret;
 }
 
 int file_remove_suid(struct file *file)
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: add missing mutex lock arround notify_change
  2011-12-16 11:25 [PATCH] mm: add missing mutex lock arround notify_change Djalal Harouni
@ 2011-12-16 20:55 ` Andrew Morton
  2011-12-16 21:54   ` Djalal Harouni
  2011-12-17 21:41   ` Al Viro
  0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2011-12-16 20:55 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Hugh Dickins, Minchan Kim, KAMEZAWA Hiroyuki, Wu Fengguang,
	linux-mm, linux-kernel, Al Viro, J. Bruce Fields, Neil Brown,
	Mikulas Patocka

On Fri, 16 Dec 2011 12:25:34 +0100
Djalal Harouni <tixxdz@opendz.org> wrote:

> 
> Calls to notify_change() must hold i_mutex.
> 

It seems that this is true.  It's tersely documented in
Documentation/filesystems/porting.  It should be mentioned in the
non-existent notify_change() kerneldoc.

<does a quick audit>

fs/hpfs/namei.c and fs/nfsd/vfs.c:nfsd_setattr() aren't obviosuly
holding that lock when calling notify_change().  Everything else under
fs/ looks OK.

I'll add the below to my tree and shall slip it into linux-next, but not
mainline:

--- a/fs/attr.c~a
+++ a/fs/attr.c
@@ -171,6 +171,8 @@ int notify_change(struct dentry * dentry
 	struct timespec now;
 	unsigned int ia_valid = attr->ia_valid;
 
+	WARN_ON_ONCE(!mutex_is_locked(&inode->i_mutex));
+
 	if (ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID | ATTR_TIMES_SET)) {
 		if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
 			return -EPERM;
@@ -245,5 +247,4 @@ int notify_change(struct dentry * dentry
 
 	return error;
 }
-
 EXPORT_SYMBOL(notify_change);


> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1994,10 +1994,16 @@ EXPORT_SYMBOL(should_remove_suid);
>  
>  static int __remove_suid(struct dentry *dentry, int kill)
>  {
> +	int ret;
>  	struct iattr newattrs;
>  
>  	newattrs.ia_valid = ATTR_FORCE | kill;
> -	return notify_change(dentry, &newattrs);
> +
> +	mutex_lock(&dentry->d_inode->i_mutex);
> +	ret = notify_change(dentry, &newattrs);
> +	mutex_unlock(&dentry->d_inode->i_mutex);
> +
> +	return ret;
>  }
>  
>  int file_remove_suid(struct file *file)

Looks good to me.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: add missing mutex lock arround notify_change
  2011-12-16 20:55 ` Andrew Morton
@ 2011-12-16 21:54   ` Djalal Harouni
  2011-12-17 21:41   ` Al Viro
  1 sibling, 0 replies; 12+ messages in thread
From: Djalal Harouni @ 2011-12-16 21:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Minchan Kim, KAMEZAWA Hiroyuki, Wu Fengguang,
	linux-mm, linux-kernel, Al Viro, J. Bruce Fields, Neil Brown,
	Mikulas Patocka

On Fri, Dec 16, 2011 at 12:55:56PM -0800, Andrew Morton wrote:
> On Fri, 16 Dec 2011 12:25:34 +0100
> Djalal Harouni <tixxdz@opendz.org> wrote:
> 
> > 
> > Calls to notify_change() must hold i_mutex.
> > 
>
> ...
> 
> <does a quick audit>
> 
> fs/hpfs/namei.c and fs/nfsd/vfs.c:nfsd_setattr() aren't obviosuly
> holding that lock when calling notify_change().  Everything else under
> fs/ looks OK.

fs/nfsd/vfs.c:nfsd_setattr() is calling fh_lock() which calls
mutex_lock_nested() with the appropriate i_mutex of the dentry object.
There are some extra functions before the lock which are related to nfsd.

fs/hpfs/namei.c:hpfs_unlink() is using hpfs_lock() to lock the whole
filesystem.

So they are OK.

-- 
tixxdz
http://opendz.org

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: add missing mutex lock arround notify_change
  2011-12-16 20:55 ` Andrew Morton
  2011-12-16 21:54   ` Djalal Harouni
@ 2011-12-17 21:41   ` Al Viro
  2011-12-17 22:10     ` Al Viro
  2011-12-19  1:43     ` Dave Chinner
  1 sibling, 2 replies; 12+ messages in thread
From: Al Viro @ 2011-12-17 21:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Djalal Harouni, Hugh Dickins, Minchan Kim, KAMEZAWA Hiroyuki,
	Wu Fengguang, linux-mm, linux-kernel, J. Bruce Fields, Neil Brown,
	Mikulas Patocka, Christoph Hellwig

On Fri, Dec 16, 2011 at 12:55:56PM -0800, Andrew Morton wrote:

> >  static int __remove_suid(struct dentry *dentry, int kill)
> >  {
> > +	int ret;
> >  	struct iattr newattrs;
> >  
> >  	newattrs.ia_valid = ATTR_FORCE | kill;
> > -	return notify_change(dentry, &newattrs);
> > +
> > +	mutex_lock(&dentry->d_inode->i_mutex);
> > +	ret = notify_change(dentry, &newattrs);
> > +	mutex_unlock(&dentry->d_inode->i_mutex);
> > +
> > +	return ret;
> >  }

Consider this:
generic_file_aio_write():
        mutex_lock(&inode->i_mutex);
...
        ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);

and from there we have
        err = file_remove_suid(file);
which calls __remove_suid()

Deadlock.  OK, let's look at the callers:

__remove_suid() <- file_remove_suid()

file_remove_suid() <-
	xip_file_write()			! we grab i_mutex there
	__generic_file_aio_write() <-
		generic_file_aio_write()	! we grab i_mutex there
		pohmelfs_write()		! we grab i_mutex there
		blkdev_aio_write()
	generic_file_splice_write()		! we grab i_mutex there
	xfs_file_aio_write_checks()
	ntfs_file_aio_write_nolock() <-
		ntfs_file_aio_write()		! we grab i_mutex there
	fuse_file_aio_write()			! we grab i_mutex there
	btrfs_file_aio_write()			! we grab i_mutex there
	ext4_ioctl(), EXT4_IOC_MOVE_EXT case

We have a shitload of deadlocks on very common paths with that patch.  What
of the paths that do lead to file_remove_suid() without i_mutex?
*	xfs_file_aio_write_checks(): we drop i_mutex (via xfs_rw_iunlock())
just before calling file_remove_suid().  Racy, the fix is obvious - move
file_remove_suid() call before unlocking.
*	ext4_ioctl(): doesn't bother with i_mutex at all, very likely to be
racy.  BTW, that file_remove_suid() belongs *before* mnt_drop_write(), for
obvious reasons.
*	blkdev_aio_write(): file_remove_suid() will be called, but it won't
reach __remove_suid() - should_remove_suid() returns 0 unless we are dealing
with regular file.  And for blkdev_aio_write() that file will be a block
device.

IOW, this patch is bogus and would have deadlocked the box as soon as one
would try to do write(2) on suid file.  Testing Is A Good Thing(tm).

xfs and ext4_ioctl() need to be fixed; XFS fix follows, ext4 I'd rather left
to ext4 folks - I don't know how wide an area needs i_mutex there

xfs: call file_remove_suid() before dropping i_mutex

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 753ed9b..33705b1 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -750,17 +750,16 @@ restart:
 		*new_sizep = new_size;
 	}
 
-	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
-	if (error)
-		return error;
-
 	/*
 	 * If we're writing the file then make sure to clear the setuid and
 	 * setgid bits if the process is not being run by root.  This keeps
 	 * people from modifying setuid and setgid binaries.
 	 */
-	return file_remove_suid(file);
+	if (!error)
+		error = file_remove_suid(file);
 
+	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
+	return error;
 }
 
 /*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: add missing mutex lock arround notify_change
  2011-12-17 21:41   ` Al Viro
@ 2011-12-17 22:10     ` Al Viro
  2011-12-20 22:09       ` Ted Ts'o
  2011-12-19  1:43     ` Dave Chinner
  1 sibling, 1 reply; 12+ messages in thread
From: Al Viro @ 2011-12-17 22:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Djalal Harouni, Hugh Dickins, Minchan Kim, KAMEZAWA Hiroyuki,
	Wu Fengguang, linux-mm, linux-kernel, J. Bruce Fields, Neil Brown,
	Mikulas Patocka, Christoph Hellwig, Theodore Ts'o, linux-ext4

On Sat, Dec 17, 2011 at 09:41:37PM +0000, Al Viro wrote:

> xfs and ext4_ioctl() need to be fixed; XFS fix follows, ext4 I'd rather left
> to ext4 folks - I don't know how wide an area needs i_mutex there

Oh, for fsck sake...  People, this is *obviously* broken - if nothing else,
removing suid after modifying the file contents is too late.  Moreover,
this mext_inode_double_lock() thing is asking for trouble; it's deadlock-free
only because nothing else takes i_mutex on more than one non-directory inode
and does that as the innermost lock.  Start calling it for directories
(or have somebody cut'n'paste it and use it for directories) and you've got
a nice, shiny deadlock...  BTW, is ordering really needed in
double_down_write_data_sem()?  IOW, can we get contention between several
callers of that thing?

>From my reading of that code, all call chains leading to this sucker
are guaranteed to already hold i_mutex on both inodes.  If that is true,
we don't need any ordering in double_down_write_data_sem() at all...

AFAICS, the minimal fix is to move file_remove_suid() call into
ext4_move_extents(), just after we have acquired i_mutex in there.
Moreover, I think it should be done to *both* files, since both have
contents modified.  And I see no point in making that conditional...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: add missing mutex lock arround notify_change
  2011-12-17 21:41   ` Al Viro
  2011-12-17 22:10     ` Al Viro
@ 2011-12-19  1:43     ` Dave Chinner
  2011-12-19  2:03       ` Al Viro
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2011-12-19  1:43 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Djalal Harouni, Hugh Dickins, Minchan Kim,
	KAMEZAWA Hiroyuki, Wu Fengguang, linux-mm, linux-kernel,
	J. Bruce Fields, Neil Brown, Mikulas Patocka, Christoph Hellwig

On Sat, Dec 17, 2011 at 09:41:37PM +0000, Al Viro wrote:
> On Fri, Dec 16, 2011 at 12:55:56PM -0800, Andrew Morton wrote:
> 
....
> 
> We have a shitload of deadlocks on very common paths with that patch.  What
> of the paths that do lead to file_remove_suid() without i_mutex?
> *	xfs_file_aio_write_checks(): we drop i_mutex (via xfs_rw_iunlock())
> just before calling file_remove_suid().  Racy, the fix is obvious - move
> file_remove_suid() call before unlocking.

Not exactly. xfs_rw_iunlock() is not doing what you think it's doing
there.....

> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 753ed9b..33705b1 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -750,17 +750,16 @@ restart:
>  		*new_sizep = new_size;
>  	}
>  
> -	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
> -	if (error)
> -		return error;
> -
>  	/*
>  	 * If we're writing the file then make sure to clear the setuid and
>  	 * setgid bits if the process is not being run by root.  This keeps
>  	 * people from modifying setuid and setgid binaries.
>  	 */
> -	return file_remove_suid(file);
> +	if (!error)
> +		error = file_remove_suid(file);
>  
> +	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
                               ^^^^^
> +	return error;

Wrong lock.  That's dropping the internal XFS inode metadata lock,
but the VFS i_mutex is associated with the internal XFS inode IO
lock, which is accessed via XFS_IOLOCK_*. Only if we take the iolock
via XFS_IOLOCK_EXCL do we actually take the i_mutex.

Now it gets complex. For buffered IO, we are guaranteed to already
be holding the i_mutex because we do:

        *iolock = XFS_IOLOCK_EXCL;
        xfs_rw_ilock(ip, *iolock);

        ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);

So that is safe and non-racy right now.

For direct IO, however, we don't always take the IOLOCK exclusively.
Indeed, we try really, really hard not to do this so we can do
concurrent reads and writes to the inode, and that results
in a bunch of lock juggling when we actually need the IOLOCK
exclusive (like in xfs_file_aio_write_checks()). It sounds like we
need to know if we are going to have to remove the SUID bit ahead of
time so that we can  take the correct lock up front. I haven't
looked at what is needed to do that yet.

As it is, Christoph has a patch set out that I've already reviewed
for 3.3 that significantly changes the logic and flow of the locking
through this path, so we probably should fix this in that series as
for most applications it is already OK and non-racy.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: add missing mutex lock arround notify_change
  2011-12-19  1:43     ` Dave Chinner
@ 2011-12-19  2:03       ` Al Viro
  2011-12-19  2:06         ` Al Viro
  2011-12-19  4:22         ` Dave Chinner
  0 siblings, 2 replies; 12+ messages in thread
From: Al Viro @ 2011-12-19  2:03 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andrew Morton, Djalal Harouni, Hugh Dickins, Minchan Kim,
	KAMEZAWA Hiroyuki, Wu Fengguang, linux-mm, linux-kernel,
	J. Bruce Fields, Neil Brown, Mikulas Patocka, Christoph Hellwig

On Mon, Dec 19, 2011 at 12:43:43PM +1100, Dave Chinner wrote:
> > We have a shitload of deadlocks on very common paths with that patch.  What
> > of the paths that do lead to file_remove_suid() without i_mutex?
> > *	xfs_file_aio_write_checks(): we drop i_mutex (via xfs_rw_iunlock())
> > just before calling file_remove_suid().  Racy, the fix is obvious - move
> > file_remove_suid() call before unlocking.
> 
> Not exactly. xfs_rw_iunlock() is not doing what you think it's doing
> there.....

Huh?  It is called as 

> > -	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);

and thus in
static inline void
xfs_rw_iunlock(
        struct xfs_inode        *ip,
        int                     type)
{
        xfs_iunlock(ip, type);
        if (type & XFS_IOLOCK_EXCL)
                mutex_unlock(&VFS_I(ip)->i_mutex);
}
we are guaranteed to hit i_mutex.  

> Wrong lock.  That's dropping the internal XFS inode metadata lock,
> but the VFS i_mutex is associated with the internal XFS inode IO
> lock, which is accessed via XFS_IOLOCK_*. Only if we take the iolock
> via XFS_IOLOCK_EXCL do we actually take the i_mutex.

> Now it gets complex. For buffered IO, we are guaranteed to already
> be holding the i_mutex because we do:
> 
>         *iolock = XFS_IOLOCK_EXCL;
>         xfs_rw_ilock(ip, *iolock);
> 
>         ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
> 
> So that is safe and non-racy right now.

No, it is not - we *drop* it before calling file_remove_suid().  Explicitly.
Again, look at that xfs_rw_iunlock() call there - it does drop i_mutex
(which is to say, you'd better have taken it prior to that, or you have
far worse problems).

> For direct IO, however, we don't always take the IOLOCK exclusively.
> Indeed, we try really, really hard not to do this so we can do
> concurrent reads and writes to the inode, and that results
> in a bunch of lock juggling when we actually need the IOLOCK
> exclusive (like in xfs_file_aio_write_checks()). It sounds like we
> need to know if we are going to have to remove the SUID bit ahead of
> time so that we can  take the correct lock up front. I haven't
> looked at what is needed to do that yet.

OK, I'm definitely missing something.  The very first thing
xfs_file_aio_write_checks() does is
        xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
which really makes me wonder how the hell does that manage to avoid an
instant deadlock in case of call via xfs_file_buffered_aio_write()
where we have:
        struct address_space    *mapping = file->f_mapping;
        struct inode            *inode = mapping->host;
        struct xfs_inode        *ip = XFS_I(inode);
        *iolock = XFS_IOLOCK_EXCL;
        xfs_rw_ilock(ip, *iolock);
        ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
which leads to
        struct inode            *inode = file->f_mapping->host;
        struct xfs_inode        *ip = XFS_I(inode);
(IOW, inode and ip are the same as in the caller) followed by
        xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
and with both xfs_rw_ilock() calls turning into
	mutex_lock(&VFS_I(ip)->i_mutex);
        xfs_ilock(ip, XFS_ILOCK_EXCL);
we ought to deadlock on that i_mutex.  What am I missing and how do we manage
to survive that?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: add missing mutex lock arround notify_change
  2011-12-19  2:03       ` Al Viro
@ 2011-12-19  2:06         ` Al Viro
  2011-12-19  5:07           ` Dave Chinner
  2011-12-19  4:22         ` Dave Chinner
  1 sibling, 1 reply; 12+ messages in thread
From: Al Viro @ 2011-12-19  2:06 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andrew Morton, Djalal Harouni, Hugh Dickins, Minchan Kim,
	KAMEZAWA Hiroyuki, Wu Fengguang, linux-mm, linux-kernel,
	J. Bruce Fields, Neil Brown, Mikulas Patocka, Christoph Hellwig

On Mon, Dec 19, 2011 at 02:03:40AM +0000, Al Viro wrote:

> OK, I'm definitely missing something.  The very first thing
> xfs_file_aio_write_checks() does is
>         xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
> which really makes me wonder how the hell does that manage to avoid an
> instant deadlock in case of call via xfs_file_buffered_aio_write()
> where we have:
>         struct address_space    *mapping = file->f_mapping;
>         struct inode            *inode = mapping->host;
>         struct xfs_inode        *ip = XFS_I(inode);
>         *iolock = XFS_IOLOCK_EXCL;
>         xfs_rw_ilock(ip, *iolock);
>         ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
> which leads to
>         struct inode            *inode = file->f_mapping->host;
>         struct xfs_inode        *ip = XFS_I(inode);
> (IOW, inode and ip are the same as in the caller) followed by
>         xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
> and with both xfs_rw_ilock() calls turning into
> 	mutex_lock(&VFS_I(ip)->i_mutex);
>         xfs_ilock(ip, XFS_ILOCK_EXCL);
> we ought to deadlock on that i_mutex.  What am I missing and how do we manage
> to survive that?

Arrrgh...  OK, I see...  What I missed is that XFS_IOLOCK_EXCL is not
XFS_ILOCK_EXCL.  Nice naming, that...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: add missing mutex lock arround notify_change
  2011-12-19  2:03       ` Al Viro
  2011-12-19  2:06         ` Al Viro
@ 2011-12-19  4:22         ` Dave Chinner
  1 sibling, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2011-12-19  4:22 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Djalal Harouni, Hugh Dickins, Minchan Kim,
	KAMEZAWA Hiroyuki, Wu Fengguang, linux-mm, linux-kernel,
	J. Bruce Fields, Neil Brown, Mikulas Patocka, Christoph Hellwig

On Mon, Dec 19, 2011 at 02:03:40AM +0000, Al Viro wrote:
> On Mon, Dec 19, 2011 at 12:43:43PM +1100, Dave Chinner wrote:
> > > We have a shitload of deadlocks on very common paths with that patch.  What
> > > of the paths that do lead to file_remove_suid() without i_mutex?
> > > *	xfs_file_aio_write_checks(): we drop i_mutex (via xfs_rw_iunlock())
> > > just before calling file_remove_suid().  Racy, the fix is obvious - move
> > > file_remove_suid() call before unlocking.
> > 
> > Not exactly. xfs_rw_iunlock() is not doing what you think it's doing
> > there.....
> 
> Huh?  It is called as 
> 
> > > -	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
> 
> and thus in
> static inline void
> xfs_rw_iunlock(
>         struct xfs_inode        *ip,
>         int                     type)
> {
>         xfs_iunlock(ip, type);
>         if (type & XFS_IOLOCK_EXCL)
>                 mutex_unlock(&VFS_I(ip)->i_mutex);
> }
> we are guaranteed to hit i_mutex.  

XFS_ILOCK_EXCL != XFS_IOLOCK_EXCL.

There are 3 locks here, not 2 (outermost first):

	VFS_I(ip)->i_mutex (mutex)
	ip->i_iolock (rwsem)
	ip->i_ilock (rwsem)

xfs_ilock() maintains iolock vs ilock ordering for the entire of
XFS, while xfs_rw_ilock() maintains i_mutex vs iolock vs ilock
ordering for the IO path. it is all lockdep annotated, so if we make
a mistake, we know about it pretty quickly....

> > Wrong lock.  That's dropping the internal XFS inode metadata lock,
> > but the VFS i_mutex is associated with the internal XFS inode IO
> > lock, which is accessed via XFS_IOLOCK_*. Only if we take the iolock
> > via XFS_IOLOCK_EXCL do we actually take the i_mutex.
> 
> > Now it gets complex. For buffered IO, we are guaranteed to already
> > be holding the i_mutex because we do:
> > 
> >         *iolock = XFS_IOLOCK_EXCL;
> >         xfs_rw_ilock(ip, *iolock);
> > 
> >         ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
> > 
> > So that is safe and non-racy right now.
> 
> No, it is not - we *drop* it before calling file_remove_suid().  Explicitly.

We drop the innermost ip->i_ilock before calling file_remove_suid().
We are still holding ip->i_iolock in EXCL mode and the
VFS_I(ip)->i_mutex as well at this point.

> Again, look at that xfs_rw_iunlock() call there - it does drop i_mutex
> (which is to say, you'd better have taken it prior to that, or you have
> far worse problems).

> 
> > For direct IO, however, we don't always take the IOLOCK exclusively.
> > Indeed, we try really, really hard not to do this so we can do
> > concurrent reads and writes to the inode, and that results
> > in a bunch of lock juggling when we actually need the IOLOCK
> > exclusive (like in xfs_file_aio_write_checks()). It sounds like we
> > need to know if we are going to have to remove the SUID bit ahead of
> > time so that we can  take the correct lock up front. I haven't
> > looked at what is needed to do that yet.
> 
> OK, I'm definitely missing something.  The very first thing
> xfs_file_aio_write_checks() does is
>         xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
> which really makes me wonder how the hell does that manage to avoid an
> instant deadlock in case of call via xfs_file_buffered_aio_write()
> where we have:
>         struct address_space    *mapping = file->f_mapping;
>         struct inode            *inode = mapping->host;
>         struct xfs_inode        *ip = XFS_I(inode);
>         *iolock = XFS_IOLOCK_EXCL;
>         xfs_rw_ilock(ip, *iolock);
>         ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
> which leads to
>         struct inode            *inode = file->f_mapping->host;
>         struct xfs_inode        *ip = XFS_I(inode);
> (IOW, inode and ip are the same as in the caller) followed by
>         xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
> and with both xfs_rw_ilock() calls turning into
> 	mutex_lock(&VFS_I(ip)->i_mutex);
>         xfs_ilock(ip, XFS_ILOCK_EXCL);

> we ought to deadlock on that i_mutex.  What am I missing and how do we manage
> to survive that?

Translation of the buffered write IO path locking without wrappers:

	mutex_lock(&VFS_I(ip)->i_mutex);
	down_write_nested(&ip->i_iolock);
	xfs_file_aio_write_checks() {
		down_write_nested(&ip->i_ilock);
		.....
		up_write(&ip->i_ilock);
		file_remove_suid();
	}
	generic_file_buffered_write()
	up_write(&ip->i_iolock);
	mutex_unlock(&ip->i_iolock);

Locking is correctly ordered, and file_remove_suid() is called with
the i_mutex held.

The direct IO write fast path does this:

	down_read_nested(&ip->i_iolock);
	xfs_file_aio_write_checks() {
		down_write_nested(&ip->i_ilock);
		.....
		up_write(&ip->i_ilock);
		file_remove_suid();
	}
	generic_file_direct_write()
	up_read(&ip->i_iolock);

So isn't holding the i_mutex when file_remove_suid() is called. We
do conditionally call xfs_file_aio_write_checks() with the correct
locks held if we've had to flush the page cache, we are doing
unaligned IO or extending the file, in which case the locks taken
and the ordering is the same as for the buffered write IO path. We
can do exactly the same relock dance if we know we have to remove
the SUID bit before we actually do remove it...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: add missing mutex lock arround notify_change
  2011-12-19  2:06         ` Al Viro
@ 2011-12-19  5:07           ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2011-12-19  5:07 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Djalal Harouni, Hugh Dickins, Minchan Kim,
	KAMEZAWA Hiroyuki, Wu Fengguang, linux-mm, linux-kernel,
	J. Bruce Fields, Neil Brown, Mikulas Patocka, Christoph Hellwig

On Mon, Dec 19, 2011 at 02:06:37AM +0000, Al Viro wrote:
> On Mon, Dec 19, 2011 at 02:03:40AM +0000, Al Viro wrote:
> 
> > OK, I'm definitely missing something.  The very first thing
> > xfs_file_aio_write_checks() does is
> >         xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
> > which really makes me wonder how the hell does that manage to avoid an
> > instant deadlock in case of call via xfs_file_buffered_aio_write()
> > where we have:
> >         struct address_space    *mapping = file->f_mapping;
> >         struct inode            *inode = mapping->host;
> >         struct xfs_inode        *ip = XFS_I(inode);
> >         *iolock = XFS_IOLOCK_EXCL;
> >         xfs_rw_ilock(ip, *iolock);
> >         ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
> > which leads to
> >         struct inode            *inode = file->f_mapping->host;
> >         struct xfs_inode        *ip = XFS_I(inode);
> > (IOW, inode and ip are the same as in the caller) followed by
> >         xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
> > and with both xfs_rw_ilock() calls turning into
> > 	mutex_lock(&VFS_I(ip)->i_mutex);
> >         xfs_ilock(ip, XFS_ILOCK_EXCL);
> > we ought to deadlock on that i_mutex.  What am I missing and how do we manage
> > to survive that?
> 
> Arrrgh...  OK, I see...  What I missed is that XFS_IOLOCK_EXCL is not
> XFS_ILOCK_EXCL.  Nice naming, that...

Been that way for 15 years. :/

However, the naming makes sense to me - the IO lock is for
serialising IO operations on the inode, while the I lock is for
serialising metadata operations on the inode. I guess I'm used to
it, though, so I'll conceed that it might look strange/confusing to
someone who only occassionally looks at the internal XFS locking
code....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: add missing mutex lock arround notify_change
  2011-12-17 22:10     ` Al Viro
@ 2011-12-20 22:09       ` Ted Ts'o
  2011-12-20 22:45         ` Ted Ts'o
  0 siblings, 1 reply; 12+ messages in thread
From: Ted Ts'o @ 2011-12-20 22:09 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Djalal Harouni, Hugh Dickins, Minchan Kim,
	KAMEZAWA Hiroyuki, Wu Fengguang, linux-mm, linux-kernel,
	J. Bruce Fields, Neil Brown, Mikulas Patocka, Christoph Hellwig,
	linux-ext4

On Sat, Dec 17, 2011 at 10:10:28PM +0000, Al Viro wrote:
> 
> Oh, for fsck sake...  People, this is *obviously* broken - if nothing else,
> removing suid after modifying the file contents is too late.  Moreover,
> this mext_inode_double_lock() thing is asking for trouble; it's deadlock-free
> only because nothing else takes i_mutex on more than one non-directory inode
> and does that as the innermost lock.

Well, we need to define *some* kind of lock ordering for i_mutex
belonging to regular files within a single file system, and ordering
them by inode number seemed to make the most amount of sense.  If it
turns out some other routine needs to do i_mutex locking of regular
files with some other lock ordering, we're certainly open to using
something else.

> BTW, is ordering really needed in
> double_down_write_data_sem()?  IOW, can we get contention between several
> callers of that thing?
>
> From my reading of that code, all call chains leading to this sucker
> are guaranteed to already hold i_mutex on both inodes.  If that is true,
> we don't need any ordering in double_down_write_data_sem() at all...

Yes, you're right, the ordering in double_down_write_data_sem() can go
away; it's harmless, and doesn't cost much, but it's strictly speaking
not necessary.

> AFAICS, the minimal fix is to move file_remove_suid() call into
> ext4_move_extents(), just after we have acquired i_mutex in there.
> Moreover, I think it should be done to *both* files, since both have
> contents modified.  And I see no point in making that conditional...

Actually, we're not modifying the contents of the file that is being
defragged, only the donor file, so there shouldn't be any need to nuke
the suid flag for the target file, just the donor.  But yes, we should
move the call into ext4_move_extents(), and since the donor file
should never have the suid flag on it anyway (unless someone is trying
to play tricks on us), the conditional shouldn't be necessary.  

				- Ted

P.S.  Maybe it would be a good idea to add a mention of the fact that
file_remove_suid() needs i_mutex, either in mm/filemap.c as a comment,
or in Documentation/filesystems/vfs.txt, or both?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: add missing mutex lock arround notify_change
  2011-12-20 22:09       ` Ted Ts'o
@ 2011-12-20 22:45         ` Ted Ts'o
  0 siblings, 0 replies; 12+ messages in thread
From: Ted Ts'o @ 2011-12-20 22:45 UTC (permalink / raw)
  To: Al Viro, Andrew Morton, Djalal Harouni, Hugh Dickins, Minchan Kim,
	KAMEZAWA Hiroyuki, Wu Fengguang, linux-mm, linux-kernel,
	J. Bruce Fields, Neil Brown, Mikulas Patocka, Christoph Hellwig,
	linux-ext4

I just took a closer look, and we don't need to take immediate action;
there is no security issue here were someone could modify a writable
suid file as I had originally feared.  It's not as obvious as it could
be because of how the code is broken up, but in mext_check_arguments()
in fs/ext4/move_extent.c, we return with an error if the donor file
has the SUID or SGID bit set, so we'll never actually end up calling
file_remove_suid().  So in fact the right patch is just to remove the
call to file_remove_suid() altogether.

						- Ted

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-12-20 22:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-16 11:25 [PATCH] mm: add missing mutex lock arround notify_change Djalal Harouni
2011-12-16 20:55 ` Andrew Morton
2011-12-16 21:54   ` Djalal Harouni
2011-12-17 21:41   ` Al Viro
2011-12-17 22:10     ` Al Viro
2011-12-20 22:09       ` Ted Ts'o
2011-12-20 22:45         ` Ted Ts'o
2011-12-19  1:43     ` Dave Chinner
2011-12-19  2:03       ` Al Viro
2011-12-19  2:06         ` Al Viro
2011-12-19  5:07           ` Dave Chinner
2011-12-19  4:22         ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).