public inbox for linux-xfs@vger.kernel.org
* [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
@ 2008-12-04  6:33 Lachlan McIlroy
  2008-12-08 22:51 ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Lachlan McIlroy @ 2008-12-04  6:33 UTC (permalink / raw)
  To: xfs-oss

ping.

(forwarding message since my mailer eats the patch when replying).

-------- Original Message --------
Subject: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI
Date: Mon, 22 Sep 2008 17:06:24 +1000
From: Lachlan McIlroy <lachlan@sgi.com>
Reply-To: lachlan@sgi.com
To: xfs-dev <xfs-dev@sgi.com>, xfs-oss <xfs@oss.sgi.com>

The iolock is dropped and re-acquired around the call to XFS_SEND_NAMESP().
While the iolock is released the file can become cached.  We then
'goto retry' and, if we are doing direct I/O, mapping->nrpages may now be
non-zero but need_i_mutex will be zero and we will hit the WARN_ON().

Since we have dropped the I/O lock, the file size may also have changed,
so what we need to do here is 'goto start', as we do for the XFS_SEND_DATA()
DMAPI event.

We also need to update the file size before releasing the iolock, so that
needs to be done before the XFS_SEND_NAMESP event.  If we drop the iolock
before setting the file size we could race with a truncate.

--- a/fs/xfs/linux-2.6/xfs_lrw.c	2008-09-22 15:47:38.000000000 +1000
+++ b/fs/xfs/linux-2.6/xfs_lrw.c	2008-09-22 15:50:56.000000000 +1000
@@ -707,7 +707,6 @@ start:
		}
	}

-retry:
	/* We can write back this queue in page reclaim */
	current->backing_dev_info = mapping->backing_dev_info;

@@ -763,6 +762,17 @@ retry:
	if (ret == -EIOCBQUEUED && !(ioflags & IO_ISAIO))
		ret = wait_on_sync_kiocb(iocb);

+	isize = i_size_read(inode);
+	if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
+		*offset = isize;
+
+	if (*offset > xip->i_size) {
+		xfs_ilock(xip, XFS_ILOCK_EXCL);
+		if (*offset > xip->i_size)
+			xip->i_size = *offset;
+		xfs_iunlock(xip, XFS_ILOCK_EXCL);
+	}
+
	if (ret == -ENOSPC &&
	    DM_EVENT_ENABLED(xip, DM_EVENT_NOSPACE) && !(ioflags & IO_INVIS)) {
		xfs_iunlock(xip, iolock);
@@ -776,20 +786,7 @@ retry:
		xfs_ilock(xip, iolock);
		if (error)
			goto out_unlock_internal;
-		pos = xip->i_size;
-		ret = 0;
-		goto retry;
-	}
-
-	isize = i_size_read(inode);
-	if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
-		*offset = isize;
-
-	if (*offset > xip->i_size) {
-		xfs_ilock(xip, XFS_ILOCK_EXCL);
-		if (*offset > xip->i_size)
-			xip->i_size = *offset;
-		xfs_iunlock(xip, XFS_ILOCK_EXCL);
+		goto start;
	}

	error = -ret;
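
As a rough illustration of why the jump target matters, here is a small
userspace sketch.  It is not the XFS code: toy_file, toy_callout() and
toy_direct_write() are hypothetical names, and the "lock" is only a flag.
It assumes the callout runs with the lock dropped and that a buffered
writer caches pages in the meantime.  Because the code jumps back to
'start', the check that picks the locking mode is evaluated again and
sees the new state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode state protected by the iolock. */
struct toy_file {
	bool locked;      /* toy "iolock": true while held */
	int nrpages;      /* cached pages; non-zero means exclusive mode is needed */
	int space_left;   /* the write hits "no space" while this is 0 */
};

struct toy_result {
	int passes;       /* how many times 'start' was executed */
	bool exclusive;   /* locking mode decided on the final pass */
};

/* Stand-in for the callout that runs with the lock dropped. */
static void toy_callout(struct toy_file *f)
{
	assert(!f->locked);
	f->space_left = 1;   /* the event handler freed some space */
	f->nrpages = 4;      /* meanwhile a buffered writer cached pages */
}

struct toy_result toy_direct_write(struct toy_file *f)
{
	struct toy_result r = { 0, false };

	f->locked = true;
start:
	r.passes++;
	/* Preconditions are re-evaluated on every pass through start. */
	r.exclusive = (f->nrpages != 0);

	if (f->space_left == 0) {
		f->locked = false;   /* drop the lock around the callout */
		toy_callout(f);
		f->locked = true;
		goto start;          /* a label below the checks would miss nrpages */
	}

	f->space_left--;
	f->locked = false;
	return r;
}
```

Calling toy_direct_write() on a file with no cached pages and no space
left makes two passes, and the second pass chooses the exclusive mode.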


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
  2008-12-04  6:33 [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI] Lachlan McIlroy
@ 2008-12-08 22:51 ` Christoph Hellwig
  2008-12-09  5:10   ` Lachlan McIlroy
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2008-12-08 22:51 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-oss

On Thu, Dec 04, 2008 at 05:33:21PM +1100, Lachlan McIlroy wrote:
> --- a/fs/xfs/linux-2.6/xfs_lrw.c	2008-09-22 15:47:38.000000000 +1000
> +++ b/fs/xfs/linux-2.6/xfs_lrw.c	2008-09-22 15:50:56.000000000 +1000
> @@ -707,7 +707,6 @@ start:
> 		}
> 	}
> 
> -retry:
> 	/* We can write back this queue in page reclaim */
> 	current->backing_dev_info = mapping->backing_dev_info;
> 
> @@ -763,6 +762,17 @@ retry:
> 	if (ret == -EIOCBQUEUED && !(ioflags & IO_ISAIO))
> 		ret = wait_on_sync_kiocb(iocb);
> 
> +	isize = i_size_read(inode);
> +	if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
> +		*offset = isize;
> +
> +	if (*offset > xip->i_size) {
> +		xfs_ilock(xip, XFS_ILOCK_EXCL);
> +		if (*offset > xip->i_size)
> +			xip->i_size = *offset;
> +		xfs_iunlock(xip, XFS_ILOCK_EXCL);
> +	}
> +
> 	if (ret == -ENOSPC &&
> 	    DM_EVENT_ENABLED(xip, DM_EVENT_NOSPACE) && !(ioflags & IO_INVIS)) {
> 		xfs_iunlock(xip, iolock);

Moving these updates to before the dmapi nospace callout probably doesn't
make any changes to the non-dmapi codepath, so good from that
perspective.  And as you say above it makes sense to have this update
before the dmapi callout.
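
The size update in the hunk above follows a check, lock, re-check
pattern.  A minimal userspace sketch of that pattern, assuming C11
atomics and a toy spinlock in place of the XFS ilock (toy_inode,
toy_ilock() and toy_extend_size() are made-up names, not kernel APIs):

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_inode {
	atomic_flag ilock;          /* toy stand-in for the ilock */
	_Atomic long long i_size;   /* in-core file size */
};

void toy_inode_init(struct toy_inode *ip, long long size)
{
	atomic_flag_clear(&ip->ilock);
	atomic_init(&ip->i_size, size);
}

static void toy_ilock(struct toy_inode *ip)
{
	while (atomic_flag_test_and_set_explicit(&ip->ilock, memory_order_acquire))
		;   /* spin until the flag is released */
}

static void toy_iunlock(struct toy_inode *ip)
{
	atomic_flag_clear_explicit(&ip->ilock, memory_order_release);
}

/* Grow i_size to offset if offset is past it.  Returns 1 if it grew. */
int toy_extend_size(struct toy_inode *ip, long long offset)
{
	int extended = 0;

	assert(offset >= 0);
	if (offset > atomic_load(&ip->i_size)) {         /* cheap unlocked check */
		toy_ilock(ip);
		if (offset > atomic_load(&ip->i_size)) { /* re-check under the lock */
			atomic_store(&ip->i_size, offset);
			extended = 1;
		}
		toy_iunlock(ip);
	}
	return extended;
}
```

The first check keeps the common case (no extension) lock-free; the
second one covers another writer having extended the size between the
first check and taking the lock.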

> @@ -776,20 +786,7 @@ retry:
> 		xfs_ilock(xip, iolock);
> 		if (error)
> 			goto out_unlock_internal;
> -		pos = xip->i_size;
> -		ret = 0;
> -		goto retry;
> -	}
> -
> -	isize = i_size_read(inode);
> -	if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
> -		*offset = isize;
> -
> -	if (*offset > xip->i_size) {
> -		xfs_ilock(xip, XFS_ILOCK_EXCL);
> -		if (*offset > xip->i_size)
> -			xip->i_size = *offset;
> -		xfs_iunlock(xip, XFS_ILOCK_EXCL);
> +		goto start;

Again, all this won't affect non-dmapi operations, so OK with my mainline
hat on.  Now if we check what start does over the old retry label:

 - calls generic_write_checks.  This could and should redo checks based
   on the new inode size, ok.
 - dmapi write event - shouldn't happen because eventsent is non-zero,
   ok.
 - O_DIRECT alignment validation.  Superfluous, but harmless, ok.
 - check for exclusive lock.  This is what you said you wanted, and
   indeed due to the lock dropping we need it.  But why don't
   you duplicate this check in the dmapi case below so that we
   only have to go to start once instead of possibly twice?
 - i_new_size update - needed due to the possible i_size changes, ok.
 - ichgtime - if time passed since the last time we might want to
   re-update it, ok.
 - zero_eof, ok.
 - setuid clearing, superfluous, but harmless.

So the patch looks good to me, but as mentioned above it might be possible
to optimize it a little more.
  


* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
  2008-12-08 22:51 ` Christoph Hellwig
@ 2008-12-09  5:10   ` Lachlan McIlroy
  2008-12-09  9:22     ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Lachlan McIlroy @ 2008-12-09  5:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs-oss

Christoph Hellwig wrote:
> On Thu, Dec 04, 2008 at 05:33:21PM +1100, Lachlan McIlroy wrote:
>> --- a/fs/xfs/linux-2.6/xfs_lrw.c	2008-09-22 15:47:38.000000000 +1000
>> +++ b/fs/xfs/linux-2.6/xfs_lrw.c	2008-09-22 15:50:56.000000000 +1000
>> @@ -707,7 +707,6 @@ start:
>> 		}
>> 	}
>>
>> -retry:
>> 	/* We can write back this queue in page reclaim */
>> 	current->backing_dev_info = mapping->backing_dev_info;
>>
>> @@ -763,6 +762,17 @@ retry:
>> 	if (ret == -EIOCBQUEUED && !(ioflags & IO_ISAIO))
>> 		ret = wait_on_sync_kiocb(iocb);
>>
>> +	isize = i_size_read(inode);
>> +	if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
>> +		*offset = isize;
>> +
>> +	if (*offset > xip->i_size) {
>> +		xfs_ilock(xip, XFS_ILOCK_EXCL);
>> +		if (*offset > xip->i_size)
>> +			xip->i_size = *offset;
>> +		xfs_iunlock(xip, XFS_ILOCK_EXCL);
>> +	}
>> +
>> 	if (ret == -ENOSPC &&
>> 	    DM_EVENT_ENABLED(xip, DM_EVENT_NOSPACE) && !(ioflags & IO_INVIS)) {
>> 		xfs_iunlock(xip, iolock);
> 
> Moving these updates to before the dmapi nospace callout probably doesn't
> make any changes to the non-dmapi codepath, so good from that
> perspective.  And as you say above it makes sense to have this update
> before the dmapi callout.
> 
>> @@ -776,20 +786,7 @@ retry:
>> 		xfs_ilock(xip, iolock);
>> 		if (error)
>> 			goto out_unlock_internal;
>> -		pos = xip->i_size;
>> -		ret = 0;
>> -		goto retry;
>> -	}
>> -
>> -	isize = i_size_read(inode);
>> -	if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
>> -		*offset = isize;
>> -
>> -	if (*offset > xip->i_size) {
>> -		xfs_ilock(xip, XFS_ILOCK_EXCL);
>> -		if (*offset > xip->i_size)
>> -			xip->i_size = *offset;
>> -		xfs_iunlock(xip, XFS_ILOCK_EXCL);
>> +		goto start;
> 
> Again, all this won't affect non-dmapi operations, so OK with my mainline
> hat on.  Now if we check what start does over the old retry label:
> 
>  - calls generic_write_checks.  This could and should redo checks based
>    on the new inode size, ok.
>  - dmapi write event - shouldn't happen because eventsent is non-zero,
>    ok.
>  - O_DIRECT alignment validation.  Superfluous, but harmless, ok.
>  - check for exclusive lock.  This is what you said you wanted, and
>    indeed due to the lock dropping we need it.  But why don't
>    you duplicate this check in the dmapi case below so that we
>    only have to go to start once instead of possibly twice?
Thanks for looking at this Christoph.

I'm not sure what you mean by duplicating the checks.  I assume you
mean this check:

		if (!need_i_mutex && (mapping->nrpages || pos > xip->i_size)) {
			xfs_iunlock(xip, XFS_ILOCK_EXCL|iolock);
			iolock = XFS_IOLOCK_EXCL;
			need_i_mutex = 1;
			mutex_lock(&inode->i_mutex);
			xfs_ilock(xip, XFS_ILOCK_EXCL|iolock);
			goto start;
		}

This check is there because the default case for direct I/O is to
acquire the iolock in shared mode.  If we have work to do which
requires the iolock to be held exclusive, then we drop the lock and get
it again.  Since we dropped the lock, we restart.

In the dmapi post-write event it doesn't matter if we have the
iolock shared or exclusive - it will be dropped regardless so I
don't see how checking the state of the iolock will allow us to
avoid a restart.
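
To make the shared-to-exclusive step concrete, here is a simplified
userspace model of that check.  It is only a sketch: toy_inode, toy_lock()
and toy_dio_write() are hypothetical, the lock is a plain enum rather than
a real rwsem, and the i_mutex handling from the real check is left out.

```c
#include <assert.h>

enum toy_mode { TOY_UNLOCKED, TOY_SHARED, TOY_EXCL };

struct toy_inode {
	enum toy_mode iolock;   /* toy stand-in for the iolock */
	int nrpages;            /* cached pages */
	long long i_size;       /* current file size */
};

static void toy_lock(struct toy_inode *ip, enum toy_mode mode)
{
	assert(ip->iolock == TOY_UNLOCKED);
	ip->iolock = mode;
}

static void toy_unlock(struct toy_inode *ip)
{
	assert(ip->iolock != TOY_UNLOCKED);
	ip->iolock = TOY_UNLOCKED;
}

/*
 * Returns the mode the write ended up running under; *passes counts the
 * trips through start.
 */
enum toy_mode toy_dio_write(struct toy_inode *ip, long long pos, int *passes)
{
	enum toy_mode mode = TOY_SHARED;   /* default for direct I/O */

	*passes = 0;
	toy_lock(ip, mode);
start:
	(*passes)++;
	if (mode == TOY_SHARED && (ip->nrpages != 0 || pos > ip->i_size)) {
		/* Exclusive access is needed: drop, re-take, and restart. */
		toy_unlock(ip);
		mode = TOY_EXCL;
		toy_lock(ip, mode);
		goto start;
	}

	/* ... the write itself would happen here ... */
	toy_unlock(ip);
	return mode;
}
```

A write inside the file with nothing cached stays shared and makes one
pass; a write past EOF, or to a file with cached pages, restarts once in
exclusive mode.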

>  - i_new_size update - needed due to the possible i_size changes, ok.
>  - ichgtime - if time passed since the last time we might want to
>    re-update it, ok.
>  - zero_eof, ok.
>  - setuid clearing, superfluous, but harmless.
> 
> So the patch looks good to me, but as mentioned above it might be possible
> to optimize it a little more.
>   
> 


* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
  2008-12-09  5:10   ` Lachlan McIlroy
@ 2008-12-09  9:22     ` Christoph Hellwig
  2008-12-22  8:53       ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2008-12-09  9:22 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: Christoph Hellwig, xfs-oss

On Tue, Dec 09, 2008 at 04:10:21PM +1100, Lachlan McIlroy wrote:
> Thanks for looking at this Christoph.
> 
> I'm not sure what you mean by duplicating the checks.  I assume you
> mean this check:
> 
> 		if (!need_i_mutex && (mapping->nrpages || pos > xip->i_size)) {
> 			xfs_iunlock(xip, XFS_ILOCK_EXCL|iolock);
> 			iolock = XFS_IOLOCK_EXCL;
> 			need_i_mutex = 1;
> 			mutex_lock(&inode->i_mutex);
> 			xfs_ilock(xip, XFS_ILOCK_EXCL|iolock);
> 			goto start;
> 		}

Yes.

> This check is there because the default case for direct I/O is to
> acquire the iolock in shared mode.  If we have work to do which
> requires the iolock to be held exclusive, then we drop the lock and get
> it again.  Since we dropped the lock, we restart.
> 
> In the dmapi post-write event it doesn't matter if we have the
> iolock shared or exclusive - it will be dropped regardless so I
> don't see how checking the state of the iolock will allow us to
> avoid a restart.

All very true, but it doesn't matter :)  When you do the goto start
after the dmapi post event you will run through the above check anyway,
and take the lock exclusive even if you don't need it.  By doing
the check right after the dmapi event you only run through the sequence
of checks leading to the above one guaranteed once, instead of
potentially twice (in addition to the initial once or twice before the
dmapi event).  Alternatively you could also have a flag that says don't
bother with taking the lock exclusive that gets set after the dmapi
nospace event code.
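
One possible reading of the first suggestion, as a toy userspace model
rather than a patch: after the callout, repeat the exclusive-lock check
before jumping back, so the pass through start does not have to restart
again.  Everything here (toy_inode, toy_callout(), the recheck_early
switch) is hypothetical and only counts passes.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_mode { TOY_UNLOCKED, TOY_SHARED, TOY_EXCL };

struct toy_inode {
	enum toy_mode iolock;   /* toy stand-in for the iolock */
	int nrpages;            /* cached pages */
	int space_left;         /* the write hits "no space" while this is 0 */
};

static void toy_lock(struct toy_inode *ip, enum toy_mode mode)
{
	assert(ip->iolock == TOY_UNLOCKED);
	ip->iolock = mode;
}

static void toy_unlock(struct toy_inode *ip)
{
	assert(ip->iolock != TOY_UNLOCKED);
	ip->iolock = TOY_UNLOCKED;
}

/* Stand-in for the nospace callout; pages get cached while it runs. */
static void toy_callout(struct toy_inode *ip)
{
	assert(ip->iolock == TOY_UNLOCKED);
	ip->space_left = 1;
	ip->nrpages = 1;
}

/* Returns the number of passes through start. */
int toy_write(struct toy_inode *ip, bool recheck_early)
{
	enum toy_mode mode = TOY_SHARED;
	int passes = 0;

	toy_lock(ip, mode);
start:
	passes++;
	if (mode == TOY_SHARED && ip->nrpages != 0) {
		toy_unlock(ip);
		mode = TOY_EXCL;
		toy_lock(ip, mode);
		goto start;
	}

	if (ip->space_left == 0) {
		toy_unlock(ip);
		toy_callout(ip);
		toy_lock(ip, mode);
		/* Duplicated check: upgrade now instead of restarting twice. */
		if (recheck_early && mode == TOY_SHARED && ip->nrpages != 0) {
			toy_unlock(ip);
			mode = TOY_EXCL;
			toy_lock(ip, mode);
		}
		goto start;
	}

	ip->space_left--;
	toy_unlock(ip);
	return passes;
}
```

In this model the plain variant makes three passes (initial, restart
after the callout, restart after the upgrade) and the variant with the
duplicated check makes two.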


* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
  2008-12-09  9:22     ` Christoph Hellwig
@ 2008-12-22  8:53       ` Christoph Hellwig
  2008-12-23  0:40         ` Lachlan McIlroy
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2008-12-22  8:53 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: Christoph Hellwig, xfs-oss

Do you need more input on this one?


* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
  2008-12-22  8:53       ` Christoph Hellwig
@ 2008-12-23  0:40         ` Lachlan McIlroy
  2008-12-23  8:40           ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Lachlan McIlroy @ 2008-12-23  0:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs-oss

Christoph Hellwig wrote:
> Do you need more input on this one?

Actually I just might.  Based on your last response I wasn't sure if
you wanted me to make further changes.  Then I got side-tracked wondering
why we even have the 'goto retry' in the dmapi post event - why retry the
write if we get ENOSPC when we don't if dmapi is not enabled?  Could the
write get stuck in an infinite loop?


* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
  2008-12-23  0:40         ` Lachlan McIlroy
@ 2008-12-23  8:40           ` Christoph Hellwig
  2008-12-24  1:10             ` Lachlan McIlroy
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2008-12-23  8:40 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: Christoph Hellwig, xfs-oss

On Tue, Dec 23, 2008 at 11:40:24AM +1100, Lachlan McIlroy wrote:
> Christoph Hellwig wrote:
> > Do you need more input on this one?
> 
> Actually I just might.  Based on your last response I wasn't sure if
> you wanted me to make further changes.

Well, my response was that I think we could do it more efficiently, but the
patch still looks correct to me.

> Then I got side-tracked wondering
> why we even have the 'goto retry' in the dmapi post event - why retry the
> write if we get ENOSPC when we don't if dmapi is not enabled?  Could the
> write get stuck in an infinite loop?

We only retry on ENOSPC if the dmapi nospace event is enabled, or am I
missing something?


* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
  2008-12-23  8:40           ` Christoph Hellwig
@ 2008-12-24  1:10             ` Lachlan McIlroy
  2008-12-24  2:10               ` Niv Sardi
  0 siblings, 1 reply; 10+ messages in thread
From: Lachlan McIlroy @ 2008-12-24  1:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs-oss

Christoph Hellwig wrote:
> On Tue, Dec 23, 2008 at 11:40:24AM +1100, Lachlan McIlroy wrote:
>> Christoph Hellwig wrote:
>>> Do you need more input on this one?
>> Actually I just might.  Based on your last response I wasn't sure if
>> you wanted me to make further changes.
> 
> Well, my response was that I think we could do it more efficiently, but the
> patch still looks correct to me.
Okay great.  I'll check it in and we can improve it later when I understand
what you meant!

> 
>> Then I got side-tracked wondering
>> why we even have the 'goto retry' in the dmapi post event - why retry the
>> write if we get ENOSPC when we don't if dmapi is not enabled?  Could the
>> write get stuck in an infinite loop?
> 
> We only retry on ENOSPC if the dmapi nospace event is enabled, or am I
> missing something?
I don't think you're missing anything here.  I don't understand how the
DMAPI stuff works but I imagined the event was there to indicate that the
write failed but what I don't understand is why that justifies a retry.
Is there something about DMAPI that needs the write to succeed?


* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
  2008-12-24  1:10             ` Lachlan McIlroy
@ 2008-12-24  2:10               ` Niv Sardi
  2008-12-24  2:23                 ` Lachlan McIlroy
  0 siblings, 1 reply; 10+ messages in thread
From: Niv Sardi @ 2008-12-24  2:10 UTC (permalink / raw)
  To: lachlan; +Cc: Christoph Hellwig, xfs-oss

Lachlan McIlroy <lachlan@sgi.com> writes:

> Christoph Hellwig wrote:
>
>> On Tue, Dec 23, 2008 at 11:40:24AM +1100, Lachlan McIlroy wrote:
>>> Christoph Hellwig wrote:
>>>> Do you need more input on this one?
>>> Actually I just might.  Based on your last response I wasn't sure if
>>> you wanted me to make further changes.
>> 
>> Well, my response was that I think we could do it more efficiently, but the
>> patch still looks correct to me.
> Okay great.  I'll check it in and we can improve it later when I understand
> what you meant!
>
>> 
>>> Then I got side-tracked wondering
>>> why we even have the 'goto retry' in the dmapi post event - why retry the
>>> write if we get ENOSPC when we don't if dmapi is not enabled?  Could the
>>> write get stuck in an infinite loop?
>> 
>> We only retry on ENOSPC if the dmapi nospace event is enabled, or am I
>> missing something?
> I don't think you're missing anything here.  I don't understand how the
> DMAPI stuff works but I imagined the event was there to indicate that the
> write failed but what I don't understand is why that justifies a retry.
> Is there something about DMAPI that needs the write to succeed?

yes

http://www.opengroup.org/onlinepubs/9657099/chap3.htm#tagcjh_04_02_04


-- 
Niv Sardi


* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
  2008-12-24  2:10               ` Niv Sardi
@ 2008-12-24  2:23                 ` Lachlan McIlroy
  0 siblings, 0 replies; 10+ messages in thread
From: Lachlan McIlroy @ 2008-12-24  2:23 UTC (permalink / raw)
  To: Niv Sardi; +Cc: Christoph Hellwig, xfs-oss

Niv Sardi wrote:
> Lachlan McIlroy <lachlan@sgi.com> writes:
> 
>> Christoph Hellwig wrote:
>>
>>> On Tue, Dec 23, 2008 at 11:40:24AM +1100, Lachlan McIlroy wrote:
>>>> Christoph Hellwig wrote:
>>>>> Do you need more input on this one?
>>>> Actually I just might.  Based on your last response I wasn't sure if
>>>> you wanted me to make further changes.
>>> Well, my response was that I think we could do it more efficiently, but the
>>> patch still looks correct to me.
>> Okay great.  I'll check it in and we can improve it later when I understand
>> what you meant!
>>
>>>> Then I got side-tracked wondering
>>>> why we even have the 'goto retry' in the dmapi post event - why retry the
>>>> write if we get ENOSPC when we don't if dmapi is not enabled?  Could the
>>>> write get stuck in an infinite loop?
>>> We only retry on ENOSPC if the dmapi nospace event is enabled, or am I
>>> missing something?
>> I don't think you're missing anything here.  I don't understand how the
>> DMAPI stuff works but I imagined the event was there to indicate that the
>> write failed but what I don't understand is why that justifies a retry.
>> Is there something about DMAPI that needs the write to succeed?
> 
> yes
> 
> http://www.opengroup.org/onlinepubs/9657099/chap3.htm#tagcjh_04_02_04
Oh no, not the DMAPI spec - I don't want to go there!  Sounds like the
XFS_SEND_NAMESP() event can choose to fail and that will prevent an
infinite loop.  Thanks Niv.
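
A small sketch of that termination argument, assuming the event handler
either frees space (returns 0) or gives up (returns an errno).  The names
toy_disk, toy_nospace_event() and toy_write() are invented for the
example and do not correspond to DMAPI or XFS interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_disk {
	int free_blocks;   /* blocks a write can use right now */
	int reclaimable;   /* blocks the event handler could still free */
};

/* One write attempt: consumes a block or fails with -ENOSPC. */
static int toy_try_write(struct toy_disk *d)
{
	if (d->free_blocks == 0)
		return -ENOSPC;
	d->free_blocks--;
	return 0;
}

/* Stand-in for the nospace event: 0 if space was freed, else an errno. */
static int toy_nospace_event(struct toy_disk *d)
{
	if (d->reclaimable == 0)
		return ENOSPC;   /* the handler chooses to fail */
	d->reclaimable--;
	d->free_blocks++;
	return 0;
}

/* Retry on ENOSPC only for as long as the event handler succeeds. */
int toy_write(struct toy_disk *d, int *attempts)
{
	assert(d != NULL && attempts != NULL);
	*attempts = 0;
	for (;;) {
		int ret = toy_try_write(d);

		(*attempts)++;
		if (ret != -ENOSPC)
			return ret;

		int error = toy_nospace_event(d);
		if (error)
			return -error;   /* no retry once the handler fails */
	}
}
```

With reclaimable space the write succeeds on the second attempt; with
none, the first failed handler call ends the loop and the caller sees
-ENOSPC.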
