* [PATCH 0/2] splice: i_mutex vs splice write deadlock @ 2011-07-18 4:04 Dave Chinner 2011-07-18 4:04 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Dave Chinner @ 2011-07-18 4:04 UTC (permalink / raw) To: linux-fsdevel; +Cc: linux-kernel, xfs generic_file_splice_write() takes the inode->i_mutex after the filesystem has taken whatever locks it needs to ensure sanity. however, this typically violates the locking order of filesystems with their own locks in that the order is usually i_mutex -> filesystem lock. XFS is such a case, and generic_file_splice_write() is generating lockdep warnings because of lock inversions between the inode->i_mutex and the XFS_I(inode)->i_iolock. There is also a reported case of fio causing a deadlock when it mixes IO types (e.g. splice vs direct IO). This patch set introduces generic_file_splice_write_unlocked() and factors the code such that __generic_file_splice_write() will only lock the i_mutex if called from the locked variant. The second patch modifies XFS to use the new function. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] vfs: split generic splice code from i_mutex locking 2011-07-18 4:04 [PATCH 0/2] splice: i_mutex vs splice write deadlock Dave Chinner @ 2011-07-18 4:04 ` Dave Chinner 2011-07-18 4:04 ` [PATCH 2/2] xfs: fix splice/direct-IO deadlock Dave Chinner 2011-07-19 3:10 ` [PATCH 0/2] splice: i_mutex vs splice write deadlock Christoph Hellwig 2 siblings, 0 replies; 9+ messages in thread From: Dave Chinner @ 2011-07-18 4:04 UTC (permalink / raw) To: linux-fsdevel; +Cc: linux-kernel, xfs From: Dave Chinner <dchinner@redhat.com> XFS holds locks that should be nested inside the inode->i_mutex when generic_file_splice_write is called. This function takes the i_mutex, and so we get a lock inversion that triggers lockdep warnings and has been found to cause real deadlocks. XFS does not need the splice code to take the i_mutex to do the page cache manipulation, so add a new function generic_file_splice_write_unlocked() that avoids the locking of the i_mutex for XFS to call. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/splice.c | 39 +++++++++++++++++++++++++++++++++++---- include/linux/fs.h | 2 ++ 2 files changed, 37 insertions(+), 4 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index aa866d3..c15137d 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -980,8 +980,9 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out, * */ ssize_t -generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, - loff_t *ppos, size_t len, unsigned int flags) +__generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags, + int need_imutex) { struct address_space *mapping = out->f_mapping; struct inode *inode = mapping->host; @@ -1001,13 +1002,15 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, if (ret <= 0) break; - mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); + if (need_imutex) + mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); ret = file_remove_suid(out); if (!ret) { file_update_time(out); ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file); } - mutex_unlock(&inode->i_mutex); + if (need_imutex) + mutex_unlock(&inode->i_mutex); } while (ret > 0); splice_from_pipe_end(pipe, &sd); @@ -1033,8 +1036,36 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, return ret; } +/** + * generic_file_splice_write - splice data from a pipe to a file + * @pipe: pipe info + * @out: file to write to + * @ppos: position in @out + * @len: number of bytes to splice + * @flags: splice modifier flags + * + * Description: + * Will either move or copy pages (determined by @flags options) from + * the given pipe inode to the given file. + * + */ +ssize_t +generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags) +{ + return __generic_file_splice_write(pipe, out, ppos, len, flags, 1); +} EXPORT_SYMBOL(generic_file_splice_write); +ssize_t +generic_file_splice_write_unlocked(struct pipe_inode_info *pipe, + struct file *out, loff_t *ppos, + size_t len, unsigned int flags) +{ + return __generic_file_splice_write(pipe, out, ppos, len, flags, 0); +} +EXPORT_SYMBOL(generic_file_splice_write_unlocked); + static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct splice_desc *sd) { diff --git a/include/linux/fs.h b/include/linux/fs.h index 54c49e5..3a8b984 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2368,6 +2368,8 @@ extern ssize_t default_file_splice_read(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int); extern ssize_t generic_file_splice_write(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int); +extern ssize_t generic_file_splice_write_unlocked(struct pipe_inode_info *, + struct file *, loff_t *, size_t, unsigned int); extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, struct file *out, loff_t *, size_t len, unsigned int flags); extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out, -- 1.7.5.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] xfs: fix splice/direct-IO deadlock 2011-07-18 4:04 [PATCH 0/2] splice: i_mutex vs splice write deadlock Dave Chinner 2011-07-18 4:04 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner @ 2011-07-18 4:04 ` Dave Chinner 2011-07-19 3:10 ` [PATCH 0/2] splice: i_mutex vs splice write deadlock Christoph Hellwig 2 siblings, 0 replies; 9+ messages in thread From: Dave Chinner @ 2011-07-18 4:04 UTC (permalink / raw) To: linux-fsdevel; +Cc: linux-kernel, xfs From: Dave Chinner <dchinner@redhat.com> lockdep reports splice vs direct-io write lock inversions due to generic_file_splice_write() taking the inode->i_mutex inside XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can deadlock. Use generic_file_splice_write_unlocked() because the i_mutex does not need to be held over the operations that are done in the XFS splice write path. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/linux-2.6/xfs_file.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c index f51a384..1e641e6 100644 --- a/fs/xfs/linux-2.6/xfs_file.c +++ b/fs/xfs/linux-2.6/xfs_file.c @@ -463,7 +463,8 @@ xfs_file_splice_write( trace_xfs_file_splice_write(ip, count, *ppos, ioflags); - ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags); + ret = generic_file_splice_write_unlocked(pipe, outfilp, ppos, + count, flags); xfs_aio_write_isize_update(inode, ppos, ret); xfs_aio_write_newsize_update(ip); -- 1.7.5.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] splice: i_mutex vs splice write deadlock 2011-07-18 4:04 [PATCH 0/2] splice: i_mutex vs splice write deadlock Dave Chinner 2011-07-18 4:04 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner 2011-07-18 4:04 ` [PATCH 2/2] xfs: fix splice/direct-IO deadlock Dave Chinner @ 2011-07-19 3:10 ` Christoph Hellwig 2011-07-19 4:07 ` Dave Chinner 2 siblings, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2011-07-19 3:10 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, xfs I don't really like this very much. Not taking the i_mutex at all makes the splice_write method in XFS use different locking than everyone else, and different from the normal XFS write path. For example ocfs2 which has the same locking issues just has an own implementation of the splice_write method, which isn't too nice but at least marginally better. I think the right fix for both xfs and ocfs2 would be to have a generic_file_splice_write variant that takes an "actor" function pointer, which defaults to a smaller wrapper around file_remove_suid, file_update_time and splice_from_pipe_feed, and then XFS and ocfs2 can provide their own actors that add the additional locking. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] splice: i_mutex vs splice write deadlock 2011-07-19 3:10 ` [PATCH 0/2] splice: i_mutex vs splice write deadlock Christoph Hellwig @ 2011-07-19 4:07 ` Dave Chinner 0 siblings, 0 replies; 9+ messages in thread From: Dave Chinner @ 2011-07-19 4:07 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-fsdevel, linux-kernel, xfs On Mon, Jul 18, 2011 at 11:10:03PM -0400, Christoph Hellwig wrote: > I don't really like this very much. Not taking the i_mutex at all > makes the splice_write method in XFS use different locking than > everyone else, and different from the normal XFS write path. > > For example ocfs2 which has the same locking issues just has an > own implementation of the splice_write method, which isn't > too nice but at least marginally better. I think the right > fix for both xfs and ocfs2 would be to have a generic_file_splice_write > variant that takes an "actor" function pointer, which defaults to > a smaller wrapper around file_remove_suid, file_update_time and > splice_from_pipe_feed, and then XFS and ocfs2 can provide their > own actors that add the additional locking. Yeah I thought about doing that, but wanted to try a simpler version first. I'll code up the actor variant. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 0/2] splice: i_mutex vs splice write deadlock V2 @ 2011-08-08 6:45 Dave Chinner 2011-08-08 6:45 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner 0 siblings, 1 reply; 9+ messages in thread From: Dave Chinner @ 2011-08-08 6:45 UTC (permalink / raw) To: xfs; +Cc: linux-fsdevel generic_file_splice_write() takes the inode->i_mutex after the filesystem has taken whatever locks it needs to ensure sanity. however, this typically violates the locking order of filesystems with their own locks in that the order is usually i_mutex -> filesystem lock. XFS is such a case, and generic_file_splice_write() is generating lockdep warnings because of lock inversions between the inode->i_mutex and the XFS_I(inode)->i_iolock. There is also a reported case of fio causing a deadlock when it mixes IO types (e.g. splice vs direct IO). Version 2: - add a new function to take an actor to do the work of splicing the data to the file. Convert generic_file_splice_write to use this, and make XFS call it with a different actor fuction that avoids the i_mutex. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] vfs: split generic splice code from i_mutex locking 2011-08-08 6:45 [PATCH 0/2] splice: i_mutex vs splice write deadlock V2 Dave Chinner @ 2011-08-08 6:45 ` Dave Chinner 2011-08-09 11:36 ` Jan Kara 2011-08-10 10:12 ` Christoph Hellwig 0 siblings, 2 replies; 9+ messages in thread From: Dave Chinner @ 2011-08-08 6:45 UTC (permalink / raw) To: xfs; +Cc: linux-fsdevel From: Dave Chinner <dchinner@redhat.com> XFS holds locks that should be nested inside the inode->i_mutex when generic_file_splice_write is called. This function takes the i_mutex, and so we get a lock inversion that triggers lockdep warnings and has been found to cause real deadlocks. XFS does not need the splice code to take the i_mutex to do the page cache manipulation, so modify the generic splice code to use an actor function for the code that currently requires the i_mutex to be taken. Convert generic_file_splice_write() to use this new interface supplying a generic actor function that performs the same actions as the existing code. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/splice.c | 58 ++++++++++++++++++++++++++++++++++++++---------- include/linux/splice.h | 5 ++++ 2 files changed, 51 insertions(+), 12 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index fa2defa..a02dddc 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -967,24 +967,24 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out, } /** - * generic_file_splice_write - splice data from a pipe to a file + * splice_write_to_file - splice data from a pipe to a file * @pipe: pipe info * @out: file to write to * @ppos: position in @out * @len: number of bytes to splice * @flags: splice modifier flags + * @actor: worker that does the splicing from the pipe to the file * * Description: * Will either move or copy pages (determined by @flags options) from * the given pipe inode to the given file. * */ -ssize_t -generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, - loff_t *ppos, size_t len, unsigned int flags) +ssize_t splice_write_to_file(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags, + splice_write_actor actor) { struct address_space *mapping = out->f_mapping; - struct inode *inode = mapping->host; struct splice_desc sd = { .total_len = len, .flags = flags, @@ -1001,13 +1001,8 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, if (ret <= 0) break; - mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); - ret = file_remove_suid(out); - if (!ret) { - file_update_time(out); - ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file); - } - mutex_unlock(&inode->i_mutex); + ret = actor(pipe, &sd); + } while (ret > 0); splice_from_pipe_end(pipe, &sd); @@ -1032,7 +1027,46 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, return ret; } +EXPORT_SYMBOL(splice_write_to_file); +static ssize_t generic_file_splice_write_actor(struct pipe_inode_info *pipe, + struct splice_desc *sd) +{ + struct file *out = sd->u.file; + struct inode *inode = out->f_mapping->host; + ssize_t ret; + + mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); + ret = file_remove_suid(out); + if (!ret) { + file_update_time(out); + ret = splice_from_pipe_feed(pipe, sd, pipe_to_file); + } + mutex_unlock(&inode->i_mutex); + + return ret; +} + +/** + * generic_file_splice_write - splice data from a pipe to a file + * @pipe: pipe info + * @out: file to write to + * @ppos: position in @out + * @len: number of bytes to splice + * @flags: splice modifier flags + * + * Description: + * Will either move or copy pages (determined by @flags options) from + * the given pipe inode to the given file. + * + */ +ssize_t +generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags) +{ + return splice_write_to_file(pipe, out, ppos, len, flags, + generic_file_splice_write_actor); +} EXPORT_SYMBOL(generic_file_splice_write); static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf, diff --git a/include/linux/splice.h b/include/linux/splice.h index 26e5b61..bd14281 100644 --- a/include/linux/splice.h +++ b/include/linux/splice.h @@ -61,6 +61,8 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *, struct splice_desc *); typedef int (splice_direct_actor)(struct pipe_inode_info *, struct splice_desc *); +typedef ssize_t (splice_write_actor)(struct pipe_inode_info *, + struct splice_desc *); extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int, @@ -82,6 +84,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *, extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *, splice_direct_actor *); +extern ssize_t splice_write_to_file(struct pipe_inode_info *, struct file *, + loff_t *, size_t, unsigned int, + splice_write_actor *); /* * for dynamic pipe sizing */ -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] vfs: split generic splice code from i_mutex locking 2011-08-08 6:45 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner @ 2011-08-09 11:36 ` Jan Kara 2011-08-10 10:12 ` Christoph Hellwig 1 sibling, 0 replies; 9+ messages in thread From: Jan Kara @ 2011-08-09 11:36 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs, linux-fsdevel On Mon 08-08-11 16:45:26, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > XFS holds locks that should be nested inside the inode->i_mutex when > generic_file_splice_write is called. This function takes the > i_mutex, and so we get a lock inversion that triggers lockdep > warnings and has been found to cause real deadlocks. > > XFS does not need the splice code to take the i_mutex to do the page > cache manipulation, so modify the generic splice code to use an > actor function for the code that currently requires the i_mutex to > be taken. Convert generic_file_splice_write() to use this new > interface supplying a generic actor function that performs the same > actions as the existing code. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> The patch looks good to me. Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/splice.c | 58 ++++++++++++++++++++++++++++++++++++++---------- > include/linux/splice.h | 5 ++++ > 2 files changed, 51 insertions(+), 12 deletions(-) > > diff --git a/fs/splice.c b/fs/splice.c > index fa2defa..a02dddc 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -967,24 +967,24 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out, > } > > /** > - * generic_file_splice_write - splice data from a pipe to a file > + * splice_write_to_file - splice data from a pipe to a file > * @pipe: pipe info > * @out: file to write to > * @ppos: position in @out > * @len: number of bytes to splice > * @flags: splice modifier flags > + * @actor: worker that does the splicing from the pipe to the file > * > * Description: > * Will either move or copy pages (determined by @flags options) from > * the given pipe inode to the given file. > * > */ > -ssize_t > -generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > - loff_t *ppos, size_t len, unsigned int flags) > +ssize_t splice_write_to_file(struct pipe_inode_info *pipe, struct file *out, > + loff_t *ppos, size_t len, unsigned int flags, > + splice_write_actor actor) > { > struct address_space *mapping = out->f_mapping; > - struct inode *inode = mapping->host; > struct splice_desc sd = { > .total_len = len, > .flags = flags, > @@ -1001,13 +1001,8 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > if (ret <= 0) > break; > > - mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); > - ret = file_remove_suid(out); > - if (!ret) { > - file_update_time(out); > - ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file); > - } > - mutex_unlock(&inode->i_mutex); > + ret = actor(pipe, &sd); > + > } while (ret > 0); > splice_from_pipe_end(pipe, &sd); > > @@ -1032,7 +1027,46 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > > return ret; > } > +EXPORT_SYMBOL(splice_write_to_file); > > +static ssize_t generic_file_splice_write_actor(struct pipe_inode_info *pipe, > + struct splice_desc *sd) > +{ > + struct file *out = sd->u.file; > + struct inode *inode = out->f_mapping->host; > + ssize_t ret; > + > + mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); > + ret = file_remove_suid(out); > + if (!ret) { > + file_update_time(out); > + ret = splice_from_pipe_feed(pipe, sd, pipe_to_file); > + } > + mutex_unlock(&inode->i_mutex); > + > + return ret; > +} > + > +/** > + * generic_file_splice_write - splice data from a pipe to a file > + * @pipe: pipe info > + * @out: file to write to > + * @ppos: position in @out > + * @len: number of bytes to splice > + * @flags: splice modifier flags > + * > + * Description: > + * Will either move or copy pages (determined by @flags options) from > + * the given pipe inode to the given file. > + * > + */ > +ssize_t > +generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > + loff_t *ppos, size_t len, unsigned int flags) > +{ > + return splice_write_to_file(pipe, out, ppos, len, flags, > + generic_file_splice_write_actor); > +} > EXPORT_SYMBOL(generic_file_splice_write); > > static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > diff --git a/include/linux/splice.h b/include/linux/splice.h > index 26e5b61..bd14281 100644 > --- a/include/linux/splice.h > +++ b/include/linux/splice.h > @@ -61,6 +61,8 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *, > struct splice_desc *); > typedef int (splice_direct_actor)(struct pipe_inode_info *, > struct splice_desc *); > +typedef ssize_t (splice_write_actor)(struct pipe_inode_info *, > + struct splice_desc *); > > extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *, > loff_t *, size_t, unsigned int, > @@ -82,6 +84,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *, > extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *, > splice_direct_actor *); > > +extern ssize_t splice_write_to_file(struct pipe_inode_info *, struct file *, > + loff_t *, size_t, unsigned int, > + splice_write_actor *); > /* > * for dynamic pipe sizing > */ > -- > 1.7.5.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] vfs: split generic splice code from i_mutex locking 2011-08-08 6:45 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner 2011-08-09 11:36 ` Jan Kara @ 2011-08-10 10:12 ` Christoph Hellwig 1 sibling, 0 replies; 9+ messages in thread From: Christoph Hellwig @ 2011-08-10 10:12 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs, linux-fsdevel Looks good, Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 0/2] splice: fix direct IO/splice deadlock @ 2012-11-28 2:12 Dave Chinner 2012-11-28 2:12 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner 0 siblings, 1 reply; 9+ messages in thread From: Dave Chinner @ 2012-11-28 2:12 UTC (permalink / raw) To: linux-fsdevel; +Cc: xfs Hi Folks, These two patches have been sitting in my tree for some time. I think I've even posted them before. Basically, XFS can deadlock when you use splice and direct IO on the same file concurrently because the splice write inverts the locking order of the i_mutex and the xfs inode i_iolock. The first patch moves the guts of the i_mutex protected region of the splice write to an actor function, and the second uses this structure to enable XFS to provide an actor that uses the correct locking order and hence avoid the deadlock. Comments? Cheers, Dave. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] vfs: split generic splice code from i_mutex locking 2012-11-28 2:12 [PATCH 0/2] splice: fix direct IO/splice deadlock Dave Chinner @ 2012-11-28 2:12 ` Dave Chinner 0 siblings, 0 replies; 9+ messages in thread From: Dave Chinner @ 2012-11-28 2:12 UTC (permalink / raw) To: linux-fsdevel; +Cc: xfs From: Dave Chinner <dchinner@redhat.com> XFS holds locks that should be nested inside the inode->i_mutex when generic_file_splice_write is called. This function takes the i_mutex, and so we get a lock inversion that triggers lockdep warnings and has been found to cause real deadlocks. XFS does not need the splice code to take the i_mutex to do the page cache manipulation, so modify the generic splice code to use an actor function for the code that currently requires the i_mutex to be taken. Convert generic_file_splice_write() to use this new interface supplying a generic actor function that performs the same actions as the existing code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> --- fs/splice.c | 64 ++++++++++++++++++++++++++++++++++++------------ include/linux/splice.h | 5 ++++ 2 files changed, 53 insertions(+), 16 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index 13e5b47..66d4f24 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -970,24 +970,24 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out, } /** - * generic_file_splice_write - splice data from a pipe to a file + * splice_write_to_file - splice data from a pipe to a file * @pipe: pipe info * @out: file to write to * @ppos: position in @out * @len: number of bytes to splice * @flags: splice modifier flags + * @actor: worker that does the splicing from the pipe to the file * * Description: * Will either move or copy pages (determined by @flags options) from * the given pipe inode to the given file. * */ -ssize_t -generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, - loff_t *ppos, size_t len, unsigned int flags) +ssize_t splice_write_to_file(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags, + splice_write_actor actor) { struct address_space *mapping = out->f_mapping; - struct inode *inode = mapping->host; struct splice_desc sd = { .total_len = len, .flags = flags, @@ -996,7 +996,7 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, }; ssize_t ret; - sb_start_write(inode->i_sb); + sb_start_write(mapping->host->i_sb); pipe_lock(pipe); @@ -1006,15 +1006,7 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, if (ret <= 0) break; - mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); - ret = file_remove_suid(out); - if (!ret) { - ret = file_update_time(out); - if (!ret) - ret = splice_from_pipe_feed(pipe, &sd, - pipe_to_file); - } - mutex_unlock(&inode->i_mutex); + ret = actor(pipe, &sd); } while (ret > 0); splice_from_pipe_end(pipe, &sd); @@ -1036,11 +1028,51 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, *ppos += ret; balance_dirty_pages_ratelimited_nr(mapping, nr_pages); } - sb_end_write(inode->i_sb); + sb_end_write(mapping->host->i_sb); return ret; } +EXPORT_SYMBOL(splice_write_to_file); +static ssize_t generic_file_splice_write_actor(struct pipe_inode_info *pipe, + struct splice_desc *sd) +{ + struct file *out = sd->u.file; + struct inode *inode = out->f_mapping->host; + ssize_t ret; + + mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); + ret = file_remove_suid(out); + if (!ret) { + ret = file_update_time(out); + if (!ret) + ret = splice_from_pipe_feed(pipe, sd, pipe_to_file); + } + mutex_unlock(&inode->i_mutex); + + return ret; +} + +/** + * generic_file_splice_write - splice data from a pipe to a file + * @pipe: pipe info + * @out: file to write to + * @ppos: position in @out + * @len: number of bytes to splice + * @flags: splice modifier flags + * + * Description: + * Will either move or copy pages (determined by @flags options) from + * the given pipe inode to the given file. + * + */ +ssize_t +generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags) +{ + return splice_write_to_file(pipe, out, ppos, len, flags, + generic_file_splice_write_actor); +} EXPORT_SYMBOL(generic_file_splice_write); static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf, diff --git a/include/linux/splice.h b/include/linux/splice.h index 09a545a..44ce6f6b 100644 --- a/include/linux/splice.h +++ b/include/linux/splice.h @@ -62,6 +62,8 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *, struct splice_desc *); typedef int (splice_direct_actor)(struct pipe_inode_info *, struct splice_desc *); +typedef ssize_t (splice_write_actor)(struct pipe_inode_info *, + struct splice_desc *); extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int, @@ -83,6 +85,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *, extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *, splice_direct_actor *); +extern ssize_t splice_write_to_file(struct pipe_inode_info *, struct file *, + loff_t *, size_t, unsigned int, + splice_write_actor *); /* * for dynamic pipe sizing */ -- 1.7.10 ^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2012-11-28 2:13 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-07-18 4:04 [PATCH 0/2] splice: i_mutex vs splice write deadlock Dave Chinner 2011-07-18 4:04 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner 2011-07-18 4:04 ` [PATCH 2/2] xfs: fix splice/direct-IO deadlock Dave Chinner 2011-07-19 3:10 ` [PATCH 0/2] splice: i_mutex vs splice write deadlock Christoph Hellwig 2011-07-19 4:07 ` Dave Chinner -- strict thread matches above, loose matches on Subject: below -- 2011-08-08 6:45 [PATCH 0/2] splice: i_mutex vs splice write deadlock V2 Dave Chinner 2011-08-08 6:45 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner 2011-08-09 11:36 ` Jan Kara 2011-08-10 10:12 ` Christoph Hellwig 2012-11-28 2:12 [PATCH 0/2] splice: fix direct IO/splice deadlock Dave Chinner 2012-11-28 2:12 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).