* [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
From: Lachlan McIlroy @ 2008-12-04 6:33 UTC
To: xfs-oss
ping.
(forwarding message since my mailer eats the patch when replying).
-------- Original Message --------
Subject: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI
Date: Mon, 22 Sep 2008 17:06:24 +1000
From: Lachlan McIlroy <lachlan@sgi.com>
Reply-To: lachlan@sgi.com
To: xfs-dev <xfs-dev@sgi.com>, xfs-oss <xfs@oss.sgi.com>
The iolock is dropped and re-acquired around the call to XFS_SEND_NAMESP().
While the iolock is released, the file can become cached. We then
'goto retry' and, if we are doing direct I/O, mapping->nrpages may now be
non-zero while need_i_mutex is still zero, so we hit the WARN_ON().
Since we have dropped the iolock, the file size may also have changed,
so what we need to do here is 'goto start', as we do for the XFS_SEND_DATA()
DMAPI event.
We also need to update the file size before releasing the iolock, so that
update must happen before the XFS_SEND_NAMESP() event. If we drop the iolock
before setting the file size, we could race with a truncate.
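To make the failure mode concrete, here is a minimal userspace sketch of why resuming at retry: is unsafe. It is a model, not XFS code: struct model_inode, start_needs_exclusive(), warn_on_would_fire() and resume_after_event() are hypothetical names, and the exclusive-lock condition is assumed from the start: check quoted later in this thread. The point is that the lock decision is made at start:; jumping to retry: reuses the decision made before the lock was dropped, whereas goto start recomputes it from the current state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the inode state inspected by xfs_write(). */
struct model_inode {
	unsigned long nrpages;	/* pages currently in the page cache */
	long long size;		/* current file size */
};

/*
 * Decision taken at the start: label (assumed from the check quoted later
 * in the thread): a direct write needs the exclusive iolock and i_mutex
 * if the file has cached pages or the write extends past EOF.
 */
static bool start_needs_exclusive(const struct model_inode *ip, long long pos)
{
	return ip->nrpages != 0 || pos > ip->size;
}

/* Models the WARN_ON(): cached pages seen without the exclusive lock. */
static bool warn_on_would_fire(const struct model_inode *ip, bool need_i_mutex)
{
	return ip->nrpages != 0 && !need_i_mutex;
}

/*
 * Simulate the nospace path: the iolock is dropped around the DMAPI event,
 * another task caches pages, then we resume either at retry: (reusing the
 * old decision) or at start: (recomputing it).
 * Returns true if the WARN_ON() would fire.
 */
static bool resume_after_event(struct model_inode *ip, long long pos,
			       bool resume_at_start)
{
	bool need_i_mutex = start_needs_exclusive(ip, pos);

	/* iolock released here: a buffered reader populates the cache. */
	ip->nrpages += 4;

	if (resume_at_start)
		need_i_mutex = start_needs_exclusive(ip, pos);

	return warn_on_would_fire(ip, need_i_mutex);
}
```

Resuming at retry: trips the warning in this model, while restarting at start: does not, because only the latter looks at the page cache again after the lock was re-taken.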
--- a/fs/xfs/linux-2.6/xfs_lrw.c 2008-09-22 15:47:38.000000000 +1000
+++ b/fs/xfs/linux-2.6/xfs_lrw.c 2008-09-22 15:50:56.000000000 +1000
@@ -707,7 +707,6 @@ start:
}
}
-retry:
/* We can write back this queue in page reclaim */
current->backing_dev_info = mapping->backing_dev_info;
@@ -763,6 +762,17 @@ retry:
if (ret == -EIOCBQUEUED && !(ioflags & IO_ISAIO))
ret = wait_on_sync_kiocb(iocb);
+ isize = i_size_read(inode);
+ if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
+ *offset = isize;
+
+ if (*offset > xip->i_size) {
+ xfs_ilock(xip, XFS_ILOCK_EXCL);
+ if (*offset > xip->i_size)
+ xip->i_size = *offset;
+ xfs_iunlock(xip, XFS_ILOCK_EXCL);
+ }
+
if (ret == -ENOSPC &&
DM_EVENT_ENABLED(xip, DM_EVENT_NOSPACE) && !(ioflags & IO_INVIS)) {
xfs_iunlock(xip, iolock);
@@ -776,20 +786,7 @@ retry:
xfs_ilock(xip, iolock);
if (error)
goto out_unlock_internal;
- pos = xip->i_size;
- ret = 0;
- goto retry;
- }
-
- isize = i_size_read(inode);
- if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
- *offset = isize;
-
- if (*offset > xip->i_size) {
- xfs_ilock(xip, XFS_ILOCK_EXCL);
- if (*offset > xip->i_size)
- xip->i_size = *offset;
- xfs_iunlock(xip, XFS_ILOCK_EXCL);
+ goto start;
}
error = -ret;
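The size update that the patch moves ahead of the DMAPI callout uses a check, lock, re-check pattern: a cheap unlocked comparison filters out the common case, and the comparison is repeated under the lock because another task may have grown the size in between. A minimal userspace sketch, assuming a pthread mutex in place of XFS_ILOCK_EXCL and a C11 atomic for the unlocked peek (the kernel code relies on its own rules instead); struct model_inode and model_extend_size() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the XFS inode: a lock plus the in-core size. */
struct model_inode {
	pthread_mutex_t ilock;		/* plays the role of XFS_ILOCK_EXCL */
	_Atomic long long size;		/* plays the role of xip->i_size */
};

/*
 * Grow the size to new_end if the write went past it; never shrink.
 * The unlocked test is only a fast path: the decisive test is the
 * second one, made while holding the lock.
 */
static void model_extend_size(struct model_inode *ip, long long new_end)
{
	if (new_end > atomic_load_explicit(&ip->size, memory_order_relaxed)) {
		pthread_mutex_lock(&ip->ilock);
		if (new_end > atomic_load_explicit(&ip->size, memory_order_relaxed))
			atomic_store_explicit(&ip->size, new_end,
					      memory_order_relaxed);
		pthread_mutex_unlock(&ip->ilock);
	}
}
```

Because the decisive comparison happens under the lock, two writers finishing at different offsets cannot move the size backwards, whichever of them takes the lock first.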
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
From: Christoph Hellwig @ 2008-12-08 22:51 UTC
To: Lachlan McIlroy; +Cc: xfs-oss
On Thu, Dec 04, 2008 at 05:33:21PM +1100, Lachlan McIlroy wrote:
> --- a/fs/xfs/linux-2.6/xfs_lrw.c 2008-09-22 15:47:38.000000000 +1000
> +++ b/fs/xfs/linux-2.6/xfs_lrw.c 2008-09-22 15:50:56.000000000 +1000
> @@ -707,7 +707,6 @@ start:
> }
> }
>
> -retry:
> /* We can write back this queue in page reclaim */
> current->backing_dev_info = mapping->backing_dev_info;
>
> @@ -763,6 +762,17 @@ retry:
> if (ret == -EIOCBQUEUED && !(ioflags & IO_ISAIO))
> ret = wait_on_sync_kiocb(iocb);
>
> + isize = i_size_read(inode);
> + if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
> + *offset = isize;
> +
> + if (*offset > xip->i_size) {
> + xfs_ilock(xip, XFS_ILOCK_EXCL);
> + if (*offset > xip->i_size)
> + xip->i_size = *offset;
> + xfs_iunlock(xip, XFS_ILOCK_EXCL);
> + }
> +
> if (ret == -ENOSPC &&
> DM_EVENT_ENABLED(xip, DM_EVENT_NOSPACE) && !(ioflags & IO_INVIS)) {
> xfs_iunlock(xip, iolock);

Moving these updates before the dmapi nospace callout provably doesn't
change the non-dmapi codepath, so it's good from that perspective. And as
you say above, it makes sense to do this update before the dmapi callout.
> @@ -776,20 +786,7 @@ retry:
> xfs_ilock(xip, iolock);
> if (error)
> goto out_unlock_internal;
> - pos = xip->i_size;
> - ret = 0;
> - goto retry;
> - }
> -
> - isize = i_size_read(inode);
> - if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
> - *offset = isize;
> -
> - if (*offset > xip->i_size) {
> - xfs_ilock(xip, XFS_ILOCK_EXCL);
> - if (*offset > xip->i_size)
> - xip->i_size = *offset;
> - xfs_iunlock(xip, XFS_ILOCK_EXCL);
> + goto start;

Again, none of this affects non-dmapi operations, so it's OK with my
mainline hat on. Now, if we check what start does compared with the old
retry label:

- calls generic_write_checks. This could and should redo the checks based
  on the new inode size, ok.
- dmapi write event - shouldn't happen because eventsent is non-zero,
  ok.
- O_DIRECT alignment validation. Superfluous, but harmless, ok.
- check for exclusive lock. This is what you said you wanted, and
  indeed due to the lock dropping we need it. But why don't
  you duplicate this check in the dmapi case below so that we
  only have to go to start once instead of possibly twice?
- i_new_size update - needed due to the possible i_size changes, ok
- ichgtime - if time has passed since the last time we might want to
  re-update it, ok
- zero_eof, ok
- setuid clearing, superfluous, but harmless.

So the patch looks good to me, but as mentioned above it might be possible
to optimize it a little more.
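Christoph's first point, that generic_write_checks redoes its checks against the new inode size, matters most for appending writes: the write position is re-derived from the current size rather than from a value computed before the lock was dropped. The sketch below is a hypothetical, simplified model of only that size-dependent part plus a size-limit clamp; model_write_checks() and its parameters are invented for illustration, and the real function performs further checks that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the size-dependent part of the write
 * checks. Returns the (possibly trimmed) byte count, or -1 if the write
 * starts at or past max_bytes.
 */
static long long model_write_checks(bool append, long long cur_size,
				    long long max_bytes, long long *pos,
				    long long count)
{
	if (append)
		*pos = cur_size;		/* O_APPEND: write at current EOF */
	if (*pos >= max_bytes)
		return -1;			/* nothing can be written */
	if (count > max_bytes - *pos)
		count = max_bytes - *pos;	/* trim to the size limit */
	return count;
}
```

Running the checks again after a restart means an appending writer picks up a size that grew while the lock was released, instead of writing at a stale offset.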
* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
From: Lachlan McIlroy @ 2008-12-09 5:10 UTC
To: Christoph Hellwig; +Cc: xfs-oss
Christoph Hellwig wrote:
> On Thu, Dec 04, 2008 at 05:33:21PM +1100, Lachlan McIlroy wrote:
>> --- a/fs/xfs/linux-2.6/xfs_lrw.c 2008-09-22 15:47:38.000000000 +1000
>> +++ b/fs/xfs/linux-2.6/xfs_lrw.c 2008-09-22 15:50:56.000000000 +1000
>> @@ -707,7 +707,6 @@ start:
>> }
>> }
>>
>> -retry:
>> /* We can write back this queue in page reclaim */
>> current->backing_dev_info = mapping->backing_dev_info;
>>
>> @@ -763,6 +762,17 @@ retry:
>> if (ret == -EIOCBQUEUED && !(ioflags & IO_ISAIO))
>> ret = wait_on_sync_kiocb(iocb);
>>
>> + isize = i_size_read(inode);
>> + if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
>> + *offset = isize;
>> +
>> + if (*offset > xip->i_size) {
>> + xfs_ilock(xip, XFS_ILOCK_EXCL);
>> + if (*offset > xip->i_size)
>> + xip->i_size = *offset;
>> + xfs_iunlock(xip, XFS_ILOCK_EXCL);
>> + }
>> +
>> if (ret == -ENOSPC &&
>> DM_EVENT_ENABLED(xip, DM_EVENT_NOSPACE) && !(ioflags & IO_INVIS)) {
>> xfs_iunlock(xip, iolock);
>
> Moving these updates before the dmapi nospace callout provably doesn't
> change the non-dmapi codepath, so it's good from that perspective. And as
> you say above, it makes sense to do this update before the dmapi callout.
>
>> @@ -776,20 +786,7 @@ retry:
>> xfs_ilock(xip, iolock);
>> if (error)
>> goto out_unlock_internal;
>> - pos = xip->i_size;
>> - ret = 0;
>> - goto retry;
>> - }
>> -
>> - isize = i_size_read(inode);
>> - if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
>> - *offset = isize;
>> -
>> - if (*offset > xip->i_size) {
>> - xfs_ilock(xip, XFS_ILOCK_EXCL);
>> - if (*offset > xip->i_size)
>> - xip->i_size = *offset;
>> - xfs_iunlock(xip, XFS_ILOCK_EXCL);
>> + goto start;
>
> Again, none of this affects non-dmapi operations, so it's OK with my
> mainline hat on. Now, if we check what start does compared with the old
> retry label:
>
> - calls generic_write_checks. This could and should redo the checks based
>   on the new inode size, ok.
> - dmapi write event - shouldn't happen because eventsent is non-zero,
>   ok.
> - O_DIRECT alignment validation. Superfluous, but harmless, ok.
> - check for exclusive lock. This is what you said you wanted, and
>   indeed due to the lock dropping we need it. But why don't
>   you duplicate this check in the dmapi case below so that we
>   only have to go to start once instead of possibly twice?

Thanks for looking at this, Christoph.

I'm not sure what you mean by duplicating the checks. I assume you
mean this check:

	if (!need_i_mutex && (mapping->nrpages || pos > xip->i_size)) {
		xfs_iunlock(xip, XFS_ILOCK_EXCL|iolock);
		iolock = XFS_IOLOCK_EXCL;
		need_i_mutex = 1;
		mutex_lock(&inode->i_mutex);
		xfs_ilock(xip, XFS_ILOCK_EXCL|iolock);
		goto start;
	}

This check is there because the default case for direct I/O is to
acquire the iolock in shared mode. If we have work to do that requires
the iolock to be held exclusively, we drop the lock and take it again.
Since we dropped the lock, we then restart.

In the dmapi post-write event it doesn't matter whether we hold the
iolock shared or exclusive - it will be dropped regardless - so I
don't see how checking the state of the iolock will allow us to
avoid a restart.
> - i_new_size update - needed due to the possible i_size changes, ok
> - ichgtime - if time has passed since the last time we might want to
>   re-update it, ok
> - zero_eof, ok
> - setuid clearing, superfluous, but harmless.
>
> So the patch looks good to me, but as mentioned above it might be possible
> to optimize it a little more.
* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
From: Christoph Hellwig @ 2008-12-09 9:22 UTC
To: Lachlan McIlroy; +Cc: Christoph Hellwig, xfs-oss
On Tue, Dec 09, 2008 at 04:10:21PM +1100, Lachlan McIlroy wrote:
> Thanks for looking at this, Christoph.
>
> I'm not sure what you mean by duplicating the checks. I assume you
> mean this check:
>
> 	if (!need_i_mutex && (mapping->nrpages || pos > xip->i_size)) {
> 		xfs_iunlock(xip, XFS_ILOCK_EXCL|iolock);
> 		iolock = XFS_IOLOCK_EXCL;
> 		need_i_mutex = 1;
> 		mutex_lock(&inode->i_mutex);
> 		xfs_ilock(xip, XFS_ILOCK_EXCL|iolock);
> 		goto start;
> 	}

Yes.

> This check is there because the default case for direct I/O is to
> acquire the iolock in shared mode. If we have work to do that requires
> the iolock to be held exclusively, we drop the lock and take it again.
> Since we dropped the lock, we then restart.
>
> In the dmapi post-write event it doesn't matter whether we hold the
> iolock shared or exclusive - it will be dropped regardless - so I
> don't see how checking the state of the iolock will allow us to
> avoid a restart.

All very true, but it doesn't matter :) When you do the goto start after
the dmapi post event, you will run through the above check anyway and take
the lock exclusively even if you don't need it. By doing the check right
after the dmapi event, you run through the sequence of checks leading up to
it only once instead of potentially twice (in addition to the initial one
or two passes before the dmapi event).

Alternatively, you could have a flag, set after the dmapi nospace event
code, that says not to bother taking the lock exclusively.
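Christoph's suggestion can be illustrated by counting passes through the start: checks. In this hypothetical model (struct model_state, needs_upgrade(), run_start() and the two after_nospace_*() functions are invented for illustration, and locking is reduced to a flag), jumping straight back to start: after the nospace event can run the checks twice, once to discover that the exclusive lock is needed and again after the upgrade, whereas doing the upgrade check immediately after the event lets start: run once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state seen after the DMAPI nospace event. */
struct model_state {
	bool need_i_mutex;	/* exclusive iolock (and i_mutex) held */
	unsigned long nrpages;	/* cached pages */
	long long size;		/* file size */
	long long pos;		/* write position */
	int start_passes;	/* how many times the start: checks ran */
};

static bool needs_upgrade(const struct model_state *s)
{
	return !s->need_i_mutex && (s->nrpages != 0 || s->pos > s->size);
}

/* Run the start: sequence until no lock upgrade is needed. */
static void run_start(struct model_state *s)
{
	for (;;) {
		s->start_passes++;
		/* generic_write_checks(), alignment checks, etc. go here */
		if (needs_upgrade(s)) {
			s->need_i_mutex = true;	/* drop shared, take exclusive */
			continue;		/* goto start */
		}
		return;
	}
}

/* As in the patch: after the nospace event, jump straight back to start. */
static void after_nospace_patch(struct model_state *s)
{
	run_start(s);
}

/*
 * Suggested variant: when re-taking the iolock after the event, take it
 * exclusively straight away if the upgrade check says it is needed.
 */
static void after_nospace_suggested(struct model_state *s)
{
	if (needs_upgrade(s))
		s->need_i_mutex = true;
	run_start(s);
}
```

Both variants end up holding the lock exclusively; the difference is only how many times the start: checks are repeated.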
* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
From: Christoph Hellwig @ 2008-12-22 8:53 UTC
To: Lachlan McIlroy; +Cc: Christoph Hellwig, xfs-oss
Do you need more input on this one?
* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
From: Lachlan McIlroy @ 2008-12-23 0:40 UTC
To: Christoph Hellwig; +Cc: xfs-oss
Christoph Hellwig wrote:
> Do you need more input on this one?

Actually, I just might. Based on your last response I wasn't sure whether
you wanted me to make further changes. Then I got side-tracked wondering
why we even have the 'goto retry' in the dmapi post event - why retry the
write on ENOSPC when we don't retry if dmapi is not enabled? Could the
write get stuck in an infinite loop?
* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
From: Christoph Hellwig @ 2008-12-23 8:40 UTC
To: Lachlan McIlroy; +Cc: Christoph Hellwig, xfs-oss
On Tue, Dec 23, 2008 at 11:40:24AM +1100, Lachlan McIlroy wrote:
> Christoph Hellwig wrote:
> > Do you need more input on this one?
>
> Actually, I just might. Based on your last response I wasn't sure whether
> you wanted me to make further changes.

Well, my response was that I think we could make it more efficient, but
the patch still looks correct to me.

> Then I got side-tracked wondering
> why we even have the 'goto retry' in the dmapi post event - why retry the
> write on ENOSPC when we don't retry if dmapi is not enabled? Could the
> write get stuck in an infinite loop?

We only retry on ENOSPC if the dmapi nospace event is enabled, or am I
missing something?
* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
From: Lachlan McIlroy @ 2008-12-24 1:10 UTC
To: Christoph Hellwig; +Cc: xfs-oss
Christoph Hellwig wrote:
> On Tue, Dec 23, 2008 at 11:40:24AM +1100, Lachlan McIlroy wrote:
>> Christoph Hellwig wrote:
>>> Do you need more input on this one?
>> Actually, I just might. Based on your last response I wasn't sure whether
>> you wanted me to make further changes.
>
> Well, my response was that I think we could make it more efficient, but
> the patch still looks correct to me.

Okay, great. I'll check it in and we can improve it later once I understand
what you meant!

>
>> Then I got side-tracked wondering
>> why we even have the 'goto retry' in the dmapi post event - why retry the
>> write on ENOSPC when we don't retry if dmapi is not enabled? Could the
>> write get stuck in an infinite loop?
>
> We only retry on ENOSPC if the dmapi nospace event is enabled, or am I
> missing something?

I don't think you're missing anything here. I don't understand how the
DMAPI stuff works, but I imagined the event was there to indicate that the
write failed; what I don't understand is why that justifies a retry.
Is there something about DMAPI that needs the write to succeed?
* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
From: Niv Sardi @ 2008-12-24 2:10 UTC
To: lachlan; +Cc: Christoph Hellwig, xfs-oss
Lachlan McIlroy <lachlan@sgi.com> writes:
> Christoph Hellwig wrote:
>
>> On Tue, Dec 23, 2008 at 11:40:24AM +1100, Lachlan McIlroy wrote:
>>> Christoph Hellwig wrote:
>>>> Do you need more input on this one?
>>> Actually, I just might. Based on your last response I wasn't sure whether
>>> you wanted me to make further changes.
>>
>> Well, my response was that I think we could make it more efficient, but
>> the patch still looks correct to me.
> Okay, great. I'll check it in and we can improve it later once I understand
> what you meant!
>
>>
>>> Then I got side-tracked wondering
>>> why we even have the 'goto retry' in the dmapi post event - why retry the
>>> write on ENOSPC when we don't retry if dmapi is not enabled? Could the
>>> write get stuck in an infinite loop?
>>
>> We only retry on ENOSPC if the dmapi nospace event is enabled, or am I
>> missing something?
> I don't think you're missing anything here. I don't understand how the
> DMAPI stuff works, but I imagined the event was there to indicate that the
> write failed; what I don't understand is why that justifies a retry.
> Is there something about DMAPI that needs the write to succeed?

yes

http://www.opengroup.org/onlinepubs/9657099/chap3.htm#tagcjh_04_02_04

--
Niv Sardi
* Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
From: Lachlan McIlroy @ 2008-12-24 2:23 UTC
To: Niv Sardi; +Cc: Christoph Hellwig, xfs-oss
Niv Sardi wrote:
> Lachlan McIlroy <lachlan@sgi.com> writes:
>
>> Christoph Hellwig wrote:
>>
>>> On Tue, Dec 23, 2008 at 11:40:24AM +1100, Lachlan McIlroy wrote:
>>>> Christoph Hellwig wrote:
>>>>> Do you need more input on this one?
>>>> Actually, I just might. Based on your last response I wasn't sure whether
>>>> you wanted me to make further changes.
>>> Well, my response was that I think we could make it more efficient, but
>>> the patch still looks correct to me.
>> Okay, great. I'll check it in and we can improve it later once I understand
>> what you meant!
>>
>>>
>>>> Then I got side-tracked wondering
>>>> why we even have the 'goto retry' in the dmapi post event - why retry the
>>>> write on ENOSPC when we don't retry if dmapi is not enabled? Could the
>>>> write get stuck in an infinite loop?
>>> We only retry on ENOSPC if the dmapi nospace event is enabled, or am I
>>> missing something?
>> I don't think you're missing anything here. I don't understand how the
>> DMAPI stuff works, but I imagined the event was there to indicate that the
>> write failed; what I don't understand is why that justifies a retry.
>> Is there something about DMAPI that needs the write to succeed?
>
> yes
>
> http://www.opengroup.org/onlinepubs/9657099/chap3.htm#tagcjh_04_02_04

Oh no, not the DMAPI spec - I don't want to go there!

It sounds like the XFS_SEND_NAMESP() event can choose to fail, which will
prevent an infinite loop. Thanks, Niv.
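The termination argument at the end of the thread can be modelled as a retry loop whose only way out on ENOSPC is the event's reply. This is a hypothetical sketch: struct model_fs, model_try_write(), model_nospace_event() and model_write() are invented names, and the "application" is a toy that releases a fixed reserve (in XFS the event is sent with XFS_SEND_NAMESP(), as described earlier in the thread). Each ENOSPC sends one event; a zero reply means space may have been freed and the write restarts from the top, while an error reply ends the loop.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical filesystem model with a space-managing application. */
struct model_fs {
	int free_blocks;	/* blocks available to writers */
	int hsm_reserve;	/* blocks the application can release */
	int events_sent;	/* nospace events delivered so far */
};

/* Returns blocks written, or -ENOSPC if there is not enough room. */
static int model_try_write(struct model_fs *fs, int blocks)
{
	if (fs->free_blocks < blocks)
		return -ENOSPC;
	fs->free_blocks -= blocks;
	return blocks;
}

/* Toy event handler: 0 = space released, retry; ENOSPC = give up. */
static int model_nospace_event(struct model_fs *fs)
{
	fs->events_sent++;
	if (fs->hsm_reserve == 0)
		return ENOSPC;		/* the handler chooses to fail */
	fs->free_blocks += fs->hsm_reserve;
	fs->hsm_reserve = 0;
	return 0;
}

static int model_write(struct model_fs *fs, int blocks)
{
	for (;;) {
		int ret = model_try_write(fs, blocks);
		if (ret != -ENOSPC)
			return ret;
		int error = model_nospace_event(fs);
		if (error)
			return -error;	/* a failed event ends the loop */
		/* space may have been freed: restart from the top */
	}
}
```

If a handler kept replying with success without ever freeing space, this loop would never terminate; in the model, the bound comes entirely from the handler's behaviour.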