linux-ext4.vger.kernel.org archive mirror
* [patch|rfc] ext4: fix race between unwritten extent conversion and truncate
@ 2012-01-25 20:40 Jeff Moyer
  2012-01-26 17:02 ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff Moyer @ 2012-01-25 20:40 UTC
  To: linux-ext4

Hi,

The following comment in ext4_end_io_dio caught my attention:

	/* XXX: probably should move into the real I/O completion handler */
        inode_dio_done(inode);

The truncate code takes i_mutex, then calls inode_dio_wait.  Because the
ext4 code path above will end up dropping the mutex before it is
reacquired by the worker thread that does the extent conversion, it
seems to me that the truncate can happen out of order.  Does it matter?
I really don't know, but I'm hoping someone here might.  ;-)  Anyway,
here's a patch I cooked up to address the issue.  I'm not sure what the
result of a race would even be, so I haven't really been able to test
that it works as intended.
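
To make the ordering concrete, here is a minimal single-threaded userspace sketch of the window described above. It is not kernel code: every `model_*` name is hypothetical, `dio_count` only stands in for the counter that inode_dio_wait waits on, and the "worker" is just a function called after the completion handler. It assumes exactly one direct IO in flight.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-inode state involved in the race. */
struct model_state {
	int dio_count;	/* stands in for the counter inode_dio_wait() waits on */
	bool converted;	/* set once the worker has converted the extent */
};

/*
 * Models the I/O completion handler. It only queues the conversion work;
 * with done_in_handler set it also drops the count, as the current code does.
 */
static void model_io_complete(struct model_state *s, bool done_in_handler)
{
	if (done_in_handler)
		s->dio_count--;
}

/*
 * Models the worker that converts the unwritten extent. With the proposed
 * change the count is dropped here, after the conversion.
 */
static void model_worker(struct model_state *s, bool done_in_handler)
{
	s->converted = true;
	if (!done_in_handler)
		s->dio_count--;
}

/* Models truncate: it may proceed once no direct IO is accounted for. */
static bool model_truncate_may_proceed(const struct model_state *s)
{
	return s->dio_count == 0;
}

/*
 * Returns true if truncate could slip in between I/O completion and
 * extent conversion for the given placement of the decrement.
 */
static bool model_race_window(bool done_in_handler)
{
	struct model_state s = { .dio_count = 1, .converted = false };
	bool window;

	model_io_complete(&s, done_in_handler);
	window = model_truncate_may_proceed(&s) && !s.converted;
	model_worker(&s, done_in_handler);

	/* Either way, the IO is fully finished once the worker has run. */
	assert(s.dio_count == 0 && s.converted);
	return window;
}
```

Calling `model_race_window(true)` models the current placement of inode_dio_done and reports an open window; `model_race_window(false)` models dropping the count only after the conversion and reports none.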

So, comments?

Cheers,
Jeff

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 513004f..fc2a373 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2286,7 +2286,7 @@ extern void ext4_exit_pageio(void);
 extern void ext4_ioend_wait(struct inode *);
 extern void ext4_free_io_end(ext4_io_end_t *io);
 extern ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags);
-extern int ext4_end_io_nolock(ext4_io_end_t *io);
+extern int ext4_end_io_nolock(ext4_io_end_t *io, bool direct);
 extern void ext4_io_submit(struct ext4_io_submit *io);
 extern int ext4_bio_write_page(struct ext4_io_submit *io,
 			       struct page *page,
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 00a2cb7..f9aec9a 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -104,7 +104,7 @@ int ext4_flush_completed_IO(struct inode *inode)
 		 * queue work.
 		 */
 		spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
-		ret = ext4_end_io_nolock(io);
+		ret = ext4_end_io_nolock(io, false);
 		if (ret < 0)
 			ret2 = ret;
 		spin_lock_irqsave(&ei->i_completed_io_lock, flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index feaa82f..4e76c30 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2795,9 +2795,6 @@ out:
 
 	/* queue the work to convert unwritten extents to written */
 	queue_work(wq, &io_end->work);
-
-	/* XXX: probably should move into the real I/O completion handler */
-	inode_dio_done(inode);
 }
 
 static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 4758518..47c4a03 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -87,7 +87,7 @@ void ext4_free_io_end(ext4_io_end_t *io)
  * Called with inode->i_mutex; we depend on this when we manipulate
  * io->flag, since we could otherwise race with ext4_flush_completed_IO()
  */
-int ext4_end_io_nolock(ext4_io_end_t *io)
+int ext4_end_io_nolock(ext4_io_end_t *io, bool direct)
 {
 	struct inode *inode = io->inode;
 	loff_t offset = io->offset;
@@ -110,6 +110,8 @@ int ext4_end_io_nolock(ext4_io_end_t *io)
 	if (io->iocb)
 		aio_complete(io->iocb, io->result, 0);
 
+	if (direct)
+		inode_dio_done(inode);
 	/* Wake up anyone waiting on unwritten extent conversion */
 	if (atomic_dec_and_test(&EXT4_I(inode)->i_aiodio_unwritten))
 		wake_up_all(ext4_ioend_wq(io->inode));
@@ -152,7 +154,7 @@ static void ext4_end_io_work(struct work_struct *work)
 	}
 	list_del_init(&io->list);
 	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
-	(void) ext4_end_io_nolock(io);
+	(void) ext4_end_io_nolock(io, true);
 	mutex_unlock(&inode->i_mutex);
 free:
 	ext4_free_io_end(io);


* Re: [patch|rfc] ext4: fix race between unwritten extent conversion and truncate
  2012-01-25 20:40 [patch|rfc] ext4: fix race between unwritten extent conversion and truncate Jeff Moyer
@ 2012-01-26 17:02 ` Jan Kara
  2012-01-26 18:04   ` Jeff Moyer
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2012-01-26 17:02 UTC
  To: Jeff Moyer; +Cc: linux-ext4

  Hi,

On Wed 25-01-12 15:40:56, Jeff Moyer wrote:
> The following comment in ext4_end_io_dio caught my attention:
> 
> 	/* XXX: probably should move into the real I/O completion handler */
>         inode_dio_done(inode);
> 
> The truncate code takes i_mutex, then calls inode_dio_wait.  Because the
> ext4 code path above will end up dropping the mutex before it is
> reacquired by the worker thread that does the extent conversion, it
> seems to me that the truncate can happen out of order.  Does it matter?
> I really don't know, but I'm hoping someone here might.  ;-)  Anyway,
> here's a patch I cooked up to address the issue.  I'm not sure what the
> result of a race would even be, so I haven't really been able to test
> that it works as intended.
> 
> So, comments?
  Yeah, the race looks real. We probably won't crash, but we will certainly
curse in the logs, which isn't nice ;). Thanks for spotting this.

> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 513004f..fc2a373 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2286,7 +2286,7 @@ extern void ext4_exit_pageio(void);
>  extern void ext4_ioend_wait(struct inode *);
>  extern void ext4_free_io_end(ext4_io_end_t *io);
>  extern ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags);
> -extern int ext4_end_io_nolock(ext4_io_end_t *io);
> +extern int ext4_end_io_nolock(ext4_io_end_t *io, bool direct);
>  extern void ext4_io_submit(struct ext4_io_submit *io);
>  extern int ext4_bio_write_page(struct ext4_io_submit *io,
>  			       struct page *page,
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index 00a2cb7..f9aec9a 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -104,7 +104,7 @@ int ext4_flush_completed_IO(struct inode *inode)
>  		 * queue work.
>  		 */
>  		spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> -		ret = ext4_end_io_nolock(io);
> +		ret = ext4_end_io_nolock(io, false);
  This is wrong. i_completed_io_list contains work items for both direct
and buffered IO. The only difference is that ext4_flush_completed_IO()
processes the list synchronously, while ext4_end_io_work() processes it in
the background. So you have to store in ext4_io_end_t whether the IO was
direct, and then use that in ext4_end_io_nolock().
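
A minimal userspace sketch of that suggestion, under loud assumptions: the `model_*` types and functions are hypothetical stand-ins rather than the real ext4 structures, and `MODEL_IO_END_DIRECT` is an illustrative flag bit. The point it shows is that the completion routine decides from the item itself, so it does not matter which list walker calls it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bit; not taken from the kernel headers. */
#define MODEL_IO_END_DIRECT 0x0008

/* Hypothetical stand-in for the inode: counts direct IOs in flight. */
struct model_inode {
	int dio_count;
};

/* Hypothetical stand-in for ext4_io_end_t: remembers how it was issued. */
struct model_io_end {
	struct model_inode *inode;
	unsigned int flag;
};

/* Models submission: only direct IO is flagged and accounted. */
static void model_submit(struct model_io_end *io, struct model_inode *inode,
			 bool direct)
{
	io->inode = inode;
	io->flag = 0;
	if (direct) {
		io->flag |= MODEL_IO_END_DIRECT;
		inode->dio_count++;
	}
}

/*
 * Models the shared completion routine. The caller passes no "direct"
 * argument; the decision comes from the flag stored in the item.
 */
static void model_end_io(struct model_io_end *io)
{
	/* The extent conversion would happen here. */
	if (io->flag & MODEL_IO_END_DIRECT) {
		assert(io->inode->dio_count > 0);
		io->inode->dio_count--;
	}
}

/*
 * Models either list walker (synchronous flush or background worker):
 * both simply hand every item to the same completion routine.
 */
static void model_process_list(struct model_io_end *ios, size_t n)
{
	for (size_t i = 0; i < n; i++)
		model_end_io(&ios[i]);
}
```

With a mixed list of direct and buffered items, the counter returns to zero after one pass regardless of which path walked the list, which is what a caller-supplied true/false argument cannot guarantee.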

>  		if (ret < 0)
>  			ret2 = ret;
>  		spin_lock_irqsave(&ei->i_completed_io_lock, flags);
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index feaa82f..4e76c30 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2795,9 +2795,6 @@ out:
>  
>  	/* queue the work to convert unwritten extents to written */
>  	queue_work(wq, &io_end->work);
> -
> -	/* XXX: probably should move into the real I/O completion handler */
> -	inode_dio_done(inode);
>  }
>  
>  static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index 4758518..47c4a03 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -87,7 +87,7 @@ void ext4_free_io_end(ext4_io_end_t *io)
>   * Called with inode->i_mutex; we depend on this when we manipulate
>   * io->flag, since we could otherwise race with ext4_flush_completed_IO()
>   */
> -int ext4_end_io_nolock(ext4_io_end_t *io)
> +int ext4_end_io_nolock(ext4_io_end_t *io, bool direct)
>  {
>  	struct inode *inode = io->inode;
>  	loff_t offset = io->offset;
> @@ -110,6 +110,8 @@ int ext4_end_io_nolock(ext4_io_end_t *io)
>  	if (io->iocb)
>  		aio_complete(io->iocb, io->result, 0);
>  
> +	if (direct)
> +		inode_dio_done(inode);
>  	/* Wake up anyone waiting on unwritten extent conversion */
>  	if (atomic_dec_and_test(&EXT4_I(inode)->i_aiodio_unwritten))
>  		wake_up_all(ext4_ioend_wq(io->inode));
> @@ -152,7 +154,7 @@ static void ext4_end_io_work(struct work_struct *work)
>  	}
>  	list_del_init(&io->list);
>  	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> -	(void) ext4_end_io_nolock(io);
> +	(void) ext4_end_io_nolock(io, true);
>  	mutex_unlock(&inode->i_mutex);
>  free:
>  	ext4_free_io_end(io);

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [patch|rfc] ext4: fix race between unwritten extent conversion and truncate
  2012-01-26 17:02 ` Jan Kara
@ 2012-01-26 18:04   ` Jeff Moyer
  2012-01-26 20:10     ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff Moyer @ 2012-01-26 18:04 UTC
  To: Jan Kara; +Cc: linux-ext4

Jan Kara <jack@suse.cz> writes:

>> --- a/fs/ext4/fsync.c
>> +++ b/fs/ext4/fsync.c
>> @@ -104,7 +104,7 @@ int ext4_flush_completed_IO(struct inode *inode)
>>  		 * queue work.
>>  		 */
>>  		spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
>> -		ret = ext4_end_io_nolock(io);
>> +		ret = ext4_end_io_nolock(io, false);
>   This is wrong. i_completed_io_list contains work items for both direct
> and buffered IO. The only difference is that ext4_flush_completed_IO()
> processes the list synchronously, while ext4_end_io_work() processes it in
> the background. So you have to store in ext4_io_end_t whether the IO was
> direct, and then use that in ext4_end_io_nolock().

OK, thanks for the review.  Is the following what you had in mind?  If
so, I'll repost with a proper header.

Cheers,
Jeff

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 513004f..2d55d7c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -184,6 +184,7 @@ struct mpage_da_data {
 #define	EXT4_IO_END_UNWRITTEN	0x0001
 #define EXT4_IO_END_ERROR	0x0002
 #define EXT4_IO_END_QUEUED	0x0004
+#define EXT4_IO_END_DIRECT	0x0008
 
 struct ext4_io_page {
 	struct page	*p_page;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index feaa82f..f6dc02b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2795,9 +2795,6 @@ out:
 
 	/* queue the work to convert unwritten extents to written */
 	queue_work(wq, &io_end->work);
-
-	/* XXX: probably should move into the real I/O completion handler */
-	inode_dio_done(inode);
 }
 
 static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
@@ -2921,9 +2918,12 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 		iocb->private = NULL;
 		EXT4_I(inode)->cur_aio_dio = NULL;
 		if (!is_sync_kiocb(iocb)) {
-			iocb->private = ext4_init_io_end(inode, GFP_NOFS);
-			if (!iocb->private)
+			ext4_io_end_t *io_end =
+				ext4_init_io_end(inode, GFP_NOFS);
+			if (!io_end)
 				return -ENOMEM;
+			io_end->flag |= EXT4_IO_END_DIRECT;
+			iocb->private = io_end;
 			/*
 			 * we save the io structure for current async
 			 * direct IO, so that later ext4_map_blocks()
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 4758518..9e1b8eb 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -110,6 +110,8 @@ int ext4_end_io_nolock(ext4_io_end_t *io)
 	if (io->iocb)
 		aio_complete(io->iocb, io->result, 0);
 
+	if (io->flag & EXT4_IO_END_DIRECT)
+		inode_dio_done(inode);
 	/* Wake up anyone waiting on unwritten extent conversion */
 	if (atomic_dec_and_test(&EXT4_I(inode)->i_aiodio_unwritten))
 		wake_up_all(ext4_ioend_wq(io->inode));


* Re: [patch|rfc] ext4: fix race between unwritten extent conversion and truncate
  2012-01-26 18:04   ` Jeff Moyer
@ 2012-01-26 20:10     ` Jan Kara
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Kara @ 2012-01-26 20:10 UTC
  To: Jeff Moyer; +Cc: Jan Kara, linux-ext4

On Thu 26-01-12 13:04:22, Jeff Moyer wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >> --- a/fs/ext4/fsync.c
> >> +++ b/fs/ext4/fsync.c
> >> @@ -104,7 +104,7 @@ int ext4_flush_completed_IO(struct inode *inode)
> >>  		 * queue work.
> >>  		 */
> >>  		spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> >> -		ret = ext4_end_io_nolock(io);
> >> +		ret = ext4_end_io_nolock(io, false);
> >   This is wrong. i_completed_io_list contains work items for both direct
> > and buffered IO. The only difference is that ext4_flush_completed_IO()
> > processes the list synchronously, while ext4_end_io_work() processes it in
> > the background. So you have to store in ext4_io_end_t whether the IO was
> > direct, and then use that in ext4_end_io_nolock().
> 
> OK, thanks for the review.  Is the following what you had in mind?  If
> so, I'll repost with a proper header.
  Exactly. Thanks!

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

