From: Dave Chinner <david@fromorbit.com>
To: Lachlan McIlroy <lachlan@sgi.com>
Cc: xfs@oss.sgi.com, xfs-dev <xfs-dev@sgi.com>
Subject: Re: [PATCH] Move vn_iowait() earlier in the reclaim path
Date: Wed, 6 Aug 2008 15:20:53 +1000
Message-ID: <20080806052053.GU6119@disturbed>
In-Reply-To: <48990C4E.9070102@sgi.com>
On Wed, Aug 06, 2008 at 12:28:30PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Tue, Aug 05, 2008 at 05:52:34PM +1000, Lachlan McIlroy wrote:
>>> Dave Chinner wrote:
>>>> On Tue, Aug 05, 2008 at 04:43:29PM +1000, Lachlan McIlroy wrote:
>>>>> Currently by the time we get to vn_iowait() in xfs_reclaim() we have already
>>>>> gone through xfs_inactive()/xfs_free() and recycled the inode. Any I/O
>>>> xfs_free()? What's that?
>>> Sorry that should have been xfs_ifree() (we set the inode's mode to
>>> zero in there).
>>>
>>>>> completions still running (file size updates and unwritten extent conversions)
>>>>> may be working on an inode that is no longer valid.
>>>> The linux inode does not get freed until after ->clear_inode
>>>> completes, hence it is perfectly valid to reference it anywhere
>>>> in the ->clear_inode path.
>>> The problem I see is an assert in xfs_setfilesize() fail:
>>>
>>> ASSERT((ip->i_d.di_mode & S_IFMT) == S_IFREG);
>>>
>>> The mode of the XFS inode is zero at this time.
>>
>> Ok, so the question has to be why is there I/O still in progress
>> after the truncate is supposed to have already occurred and the
>> vn_iowait() in xfs_itruncate_start() been executed.
>>
>> Something doesn't add up here - you can't be doing I/O on a file
>> with no extents or delalloc blocks, hence that means we should be
>> passing through the truncate path in xfs_inactive() before we
>> call xfs_ifree() and therefore doing the vn_iowait()..
>>
>> Hmmmm - the vn_iowait() is conditional based on:
>>
>> 	/* wait for the completion of any pending DIOs */
>> 	if (new_size < ip->i_size)
>> 		vn_iowait(ip);
>>
>> We are truncating to zero (new_size == 0), so the only case where
>> this would not wait is if ip->i_size == 0. Still - I can't see
>> how we'd be doing I/O on an inode with a zero i_size. I suspect
>> ensuring we call vn_iowait() if newsize == 0 as well would fix
>> the problem. If not, there's something much more subtle going
>> on here that we should understand....
>
> If we make the vn_iowait() unconditional we might re-introduce the
> NFS exclusivity bug that killed performance. That was through
> xfs_release()->xfs_free_eofblocks()->xfs_itruncate_start().
It won't reintroduce that problem because ->clear_inode()
is not called on every NFS write operation.
> So if we leave the above code as is then we need another
> vn_iowait() in xfs_inactive() to catch any remaining workqueue
> items that we didn't wait for in xfs_itruncate_start().
How do we have any new *data* I/O at all in progress at this point?
That does not explain why we need an additional vn_iowait() call.
All I see from this is a truncate race that has something to do with
the vn_iowait() call being conditional.

That is, if we truncate to zero, then the current code in
xfs_itruncate_start() should wait unconditionally for *all* I/O to
complete because, by definition, all that I/O is beyond the new EOF
and we have to wait for it to complete before truncating the file.

Seeing as we are in ->clear_inode(), no new data I/O can start
while we are deep in this code, hence we should not be seeing
I/O completions after the truncate starts and vn_iowait() has
completed.
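To make the gap in the quoted condition concrete, here is a minimal
userspace sketch, not kernel code. Both helper names are hypothetical;
the first mirrors only the "new_size < ip->i_size" test quoted above,
and the second models the variant that also waits on any truncate to
zero. It assumes nothing about the rest of xfs_itruncate_start().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only. Mirrors the "new_size < ip->i_size" test
 * quoted from xfs_itruncate_start(). truncate_waits() is a
 * hypothetical name, not a kernel function.
 */
static inline bool truncate_waits(int64_t new_size, int64_t i_size)
{
	return new_size < i_size;
}

/*
 * Hypothetical variant under discussion: also wait whenever the
 * truncate is to zero, regardless of the current i_size.
 */
static inline bool truncate_waits_or_zero(int64_t new_size, int64_t i_size)
{
	return new_size == 0 || new_size < i_size;
}
```

With new_size == 0 and i_size == 0 the first helper reports no wait
while the second reports a wait, which is the only case where the two
differ for a truncate to zero.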
Hence we need to know, firstly, if the truncate code has been
called; secondly, what the values of i_size and i_new_size were when
the truncate was started; and, finally, whether ip->i_iocount was
non-zero when the truncate was run. That is, we need to gather
enough data to determine whether we should have waited in the
truncate but didn't.
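As a rough illustration of the data being asked for, the following
userspace sketch records those values and classifies the result. The
struct and function names are hypothetical and the field types are
simplified; only i_size, i_new_size and i_iocount come from the
discussion above. It is not a proposed patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot taken when the truncate starts. */
struct trunc_snapshot {
	bool	truncate_called;	/* did we enter the truncate path? */
	int64_t	new_size;		/* size the truncate was asked for */
	int64_t	i_size;			/* ip->i_size at that point */
	int64_t	i_new_size;		/* ip->i_new_size, recorded for reference */
	int	i_iocount;		/* ip->i_iocount at that point */
};

/*
 * Returns true if I/O was outstanding when the truncate ran but the
 * "new_size < i_size" test would have skipped vn_iowait().
 */
static inline bool missed_wait(const struct trunc_snapshot *s)
{
	if (!s->truncate_called)
		return false;	/* truncate never ran: a different bug */
	if (s->new_size < s->i_size)
		return false;	/* vn_iowait() would have been called */
	return s->i_iocount != 0;
}
```

A true result points at the conditional wait; a false result with the
assert still firing points at something else entirely.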
If either the vn_iowait() in the truncate path is not sufficient, or
the truncate code is not being called, there is *some other bug*
that we don't yet understand. Adding an unconditional vn_iowait()
appears to me to be fixing a symptom, not the underlying cause of
the problem....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com