From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from relay.sgi.com (relay2.corp.sgi.com [137.38.102.29]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id mB959uif027427 for ; Mon, 8 Dec 2008 23:09:56 -0600 Message-ID: <493DFDBD.7060909@sgi.com> Date: Tue, 09 Dec 2008 16:10:21 +1100 From: Lachlan McIlroy MIME-Version: 1.0 Subject: Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI] References: <493779B1.3010703@sgi.com> <20081208225125.GA15647@infradead.org> In-Reply-To: <20081208225125.GA15647@infradead.org> Reply-To: lachlan@sgi.com List-Id: XFS Filesystem from SGI List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: xfs-bounces@oss.sgi.com Errors-To: xfs-bounces@oss.sgi.com To: Christoph Hellwig Cc: xfs-oss Christoph Hellwig wrote: > On Thu, Dec 04, 2008 at 05:33:21PM +1100, Lachlan McIlroy wrote: >> --- a/fs/xfs/linux-2.6/xfs_lrw.c 2008-09-22 15:47:38.000000000 +1000 >> +++ b/fs/xfs/linux-2.6/xfs_lrw.c 2008-09-22 15:50:56.000000000 +1000 >> @@ -707,7 +707,6 @@ start: >> } >> } >> >> -retry: >> /* We can write back this queue in page reclaim */ >> current->backing_dev_info = mapping->backing_dev_info; >> >> @@ -763,6 +762,17 @@ retry: >> if (ret == -EIOCBQUEUED && !(ioflags & IO_ISAIO)) >> ret = wait_on_sync_kiocb(iocb); >> >> + isize = i_size_read(inode); >> + if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize)) >> + *offset = isize; >> + >> + if (*offset > xip->i_size) { >> + xfs_ilock(xip, XFS_ILOCK_EXCL); >> + if (*offset > xip->i_size) >> + xip->i_size = *offset; >> + xfs_iunlock(xip, XFS_ILOCK_EXCL); >> + } >> + >> if (ret == -ENOSPC && >> DM_EVENT_ENABLED(xip, DM_EVENT_NOSPACE) && !(ioflags & IO_INVIS)) { >> xfs_iunlock(xip, iolock); > > Moving these updates to before the dmapi nospace callout provale doesn't > make any changes to the non-dmapi codepath, so good from that > perspective. And as you say above it makes sense to have this update > before the dmapi callout. > >> @@ -776,20 +786,7 @@ retry: >> xfs_ilock(xip, iolock); >> if (error) >> goto out_unlock_internal; >> - pos = xip->i_size; >> - ret = 0; >> - goto retry; >> - } >> - >> - isize = i_size_read(inode); >> - if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize)) >> - *offset = isize; >> - >> - if (*offset > xip->i_size) { >> - xfs_ilock(xip, XFS_ILOCK_EXCL); >> - if (*offset > xip->i_size) >> - xip->i_size = *offset; >> - xfs_iunlock(xip, XFS_ILOCK_EXCL); >> + goto start; > > Again all this won't affect non-dmapi operations, so OK with my mainline > hat on. Now if we check what start does over the old retry labels: > > - calls generic_write_checks. This could and should redo checks based > on the new inode size, ok. > - dmapi write even - shouldn't happen because eventsent is non-zero, > ok. > - O_DIRECT alignment validation. Superflous, but harmless, ok. > - check for exclusive lock. This is what you said you wanted, and > indded due to the lock dropping we need it. But why don't > you duplicate this check in the dmapi case below so that we > only have to go to start once instead of possibly twice? Thanks for looking at this Christoph. I'm not sure what you mean by duplicating the checks. I assume you mean this check: if (!need_i_mutex && (mapping->nrpages || pos > xip->i_size)) { xfs_iunlock(xip, XFS_ILOCK_EXCL|iolock); iolock = XFS_IOLOCK_EXCL; need_i_mutex = 1; mutex_lock(&inode->i_mutex); xfs_ilock(xip, XFS_ILOCK_EXCL|iolock); goto start; } This check is there because the default case for direct I/O is to acquire the iolock in shared mode. If we have work to do which requires the iolock to be held exclusive then drop the lock and get it again. Since we dropped the lock then restart. In the dmapi post-write event it doesn't matter if we have the iolock shared or exclusive - it will be dropped regardless so I don't see how checking the state of the iolock will allow us to avoid a restart. > - i_new_size update - needed due to the possible i_size changes, ok > - ichgtime - if time passed since the last time we might want to > re-updated it, ok > - zero_eof, ok > - setuid clearing, superflous, but harmless. > > So the patch looks good to me, but as mention above it might be possible > to optimize it a littler more. > > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs