public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Lachlan McIlroy <lachlan@sgi.com>
Cc: xfs@oss.sgi.com, xfs-dev <xfs-dev@sgi.com>
Subject: Re: [PATCH] Move vn_iowait() earlier in the reclaim path
Date: Wed, 6 Aug 2008 15:20:53 +1000
Message-ID: <20080806052053.GU6119@disturbed>
In-Reply-To: <48990C4E.9070102@sgi.com>

On Wed, Aug 06, 2008 at 12:28:30PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Tue, Aug 05, 2008 at 05:52:34PM +1000, Lachlan McIlroy wrote:
>>> Dave Chinner wrote:
>>>> On Tue, Aug 05, 2008 at 04:43:29PM +1000, Lachlan McIlroy wrote:
>>>>> Currently by the time we get to vn_iowait() in xfs_reclaim() we have already
>>>>> gone through xfs_inactive()/xfs_free() and recycled the inode.  Any I/O
>>>> xfs_free()? What's that?
>>> Sorry that should have been xfs_ifree() (we set the inode's mode to
>>> zero in there).
>>>
>>>>> completions still running (file size updates and unwritten extent conversions)
>>>>> may be working on an inode that is no longer valid.
>>>> The linux inode does not get freed until after ->clear_inode
>>>> completes, hence it is perfectly valid to reference it anywhere
>>>> in the ->clear_inode path.
>>> The problem I see is an assert in xfs_setfilesize() fail:
>>>
>>> 	ASSERT((ip->i_d.di_mode & S_IFMT) == S_IFREG);
>>>
>>> The mode of the XFS inode is zero at this time.
>>
>> Ok, so the question has to be why is there I/O still in progress
>> after the truncate is supposed to have already occurred and the
>> vn_iowait() in xfs_itruncate_start() been executed.
>>
>> Something doesn't add up here - you can't be doing I/O on a file
>> with no extents or delalloc blocks, hence that means we should be
>> passing through the truncate path in xfs_inactive() before we
>> call xfs_ifree() and therefore doing the vn_iowait()..
>>
>> Hmmmm - the vn_iowait() is conditional based on:
>>
>>         /* wait for the completion of any pending DIOs */
>>         if (new_size < ip->i_size)
>>                 vn_iowait(ip);
>>
>> We are truncating to zero (new_size == 0), so the only case where
>> this would not wait is if ip->i_size == 0. Still - I can't see
>> how we'd be doing I/O on an inode with a zero i_size. I suspect
>> ensuring we call vn_iowait() if newsize == 0 as well would fix
>> the problem. If not, there's something much more subtle going
>> on here that we should understand....
>
> If we make the vn_iowait() unconditional we might re-introduce the
> NFS exclusivity bug that killed performance.  That was through
> xfs_release()->xfs_free_eofblocks()->xfs_itruncate_start().

It won't reintroduce that problem because ->clear_inode()
is not called on every NFS write operation.

> So if we leave the above code as is then we need another
> vn_iowait() in xfs_inactive() to catch any remaining workqueue
> items that we didn't wait for in xfs_itruncate_start().

How do we have any new *data* I/O at all in progress at this point?
That does not explain why we need an additional vn_iowait() call.
All I see from this is a truncate race that has something to do with
the vn_iowait() call being conditional.
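A minimal userspace sketch of the conditional in question, assuming sizes can be modelled as plain int64_t values. Only the expression `new_size < ip->i_size` comes from the quoted xfs_itruncate_start() code; the function names are hypothetical. It shows that truncating to zero an inode whose i_size is already zero skips the wait, while the variant suggested earlier in the thread (also wait when new_size == 0) always drains.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Current behaviour (from the quoted code): only wait for pending
 * I/O when the truncate actually shrinks the file.
 */
bool truncate_waits_current(int64_t new_size, int64_t i_size)
{
	return new_size < i_size;
}

/*
 * Hypothetical variant: additionally wait whenever truncating to
 * zero, so an inode with i_size == 0 still drains in-flight I/O.
 */
bool truncate_waits_proposed(int64_t new_size, int64_t i_size)
{
	return new_size == 0 || new_size < i_size;
}
```

The two variants differ only in the (new_size == 0, i_size == 0) case, which is exactly the window in which a completion could still touch the inode after the truncate.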

That is, if we truncate to zero, then the current code in
xfs_itruncate_start() should wait unconditionally for *all* I/O to
complete because, by definition, all that I/O is beyond the new EOF
and we have to wait for it to complete before truncating the file.
Seeing as we are in ->clear_inode(), no new data I/O can start
while we are deep in this code, hence we should not be seeing
I/O completions after the truncate starts and vn_iowait() has
completed.
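The argument rests on a count-and-wait pattern: each I/O bumps a per-inode counter (ip->i_iocount), each completion drops it, and the wait only returns once the counter reaches zero. Below is a userspace pthreads model of that pattern, not the XFS implementation; struct model_inode and the model_* helpers are hypothetical. Once the wait has returned and no new I/O can be started, no completion remains to run.

```c
#include <pthread.h>

#define MODEL_MAX_IO 16

/*
 * Hypothetical stand-in for an inode's I/O accounting. iocount plays
 * the role of ip->i_iocount; io_done plays the role of the wait queue
 * that vn_iowait() sleeps on.
 */
struct model_inode {
	pthread_mutex_t lock;
	pthread_cond_t  io_done;
	int             iocount;
};

/* Submission side: account for one more in-flight I/O. */
static void model_io_start(struct model_inode *ip)
{
	pthread_mutex_lock(&ip->lock);
	ip->iocount++;
	pthread_mutex_unlock(&ip->lock);
}

/* Completion side: drop the count and wake waiters when it hits zero. */
static void model_io_done(struct model_inode *ip)
{
	pthread_mutex_lock(&ip->lock);
	if (--ip->iocount == 0)
		pthread_cond_broadcast(&ip->io_done);
	pthread_mutex_unlock(&ip->lock);
}

/* vn_iowait() analogue: sleep until every started I/O has completed. */
static void model_iowait(struct model_inode *ip)
{
	pthread_mutex_lock(&ip->lock);
	while (ip->iocount > 0)
		pthread_cond_wait(&ip->io_done, &ip->lock);
	pthread_mutex_unlock(&ip->lock);
}

/* Each thread simulates one asynchronous I/O completion. */
static void *completion_thread(void *arg)
{
	model_io_done(arg);
	return NULL;
}

/*
 * Start up to MODEL_MAX_IO I/Os, let them complete asynchronously,
 * wait, and return the in-flight count observed after the wait.
 */
int run_model(int nr_io)
{
	struct model_inode ip = { .iocount = 0 };
	pthread_t tids[MODEL_MAX_IO];
	int started = 0, remaining;

	pthread_mutex_init(&ip.lock, NULL);
	pthread_cond_init(&ip.io_done, NULL);

	for (int i = 0; i < nr_io && i < MODEL_MAX_IO; i++) {
		model_io_start(&ip);
		if (pthread_create(&tids[started], NULL, completion_thread, &ip) != 0) {
			model_io_done(&ip);	/* no thread: complete inline */
			continue;
		}
		started++;
	}

	model_iowait(&ip);

	pthread_mutex_lock(&ip.lock);
	remaining = ip.iocount;
	pthread_mutex_unlock(&ip.lock);

	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	pthread_cond_destroy(&ip.io_done);
	pthread_mutex_destroy(&ip.lock);
	return remaining;
}
```

run_model() always returns 0: the guarantee only breaks if the wait is skipped (as the conditional allows) or if new I/O can still be issued after it.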

Hence we need to know, firstly, whether the truncate code has been
called; secondly, what the values of i_size and i_new_size were when
the truncate was started and, finally, whether ip->i_iocount was
non-zero when the truncate was run.  That is, we need to gather
enough data to determine whether we should have waited in the
truncate but didn't.
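One way to turn those three questions into a verdict is sketched below, assuming a hypothetical snapshot recorded at the start of the truncate (for example by temporary debug code). struct truncate_snapshot, enum truncate_verdict and classify_truncate() are illustrative names, not existing XFS code. A TRUNC_MISSED_WAIT result would confirm the conditional wait as the culprit; seeing the assert fire after TRUNC_WAITED or TRUNC_NOT_CALLED would point to some other bug.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot taken when the truncate starts. */
struct truncate_snapshot {
	bool    truncate_called;	/* did xfs_inactive() reach the truncate? */
	int64_t new_size;		/* size being truncated to */
	int64_t i_size;			/* ip->i_size at truncate start */
	int64_t i_new_size;		/* ip->i_new_size; logged, not classified */
	int     i_iocount;		/* ip->i_iocount at truncate start */
};

enum truncate_verdict {
	TRUNC_NOT_CALLED,	/* truncate path never ran */
	TRUNC_WAITED,		/* wait condition held, so vn_iowait() ran */
	TRUNC_NO_IO_PENDING,	/* wait skipped, but nothing was in flight */
	TRUNC_MISSED_WAIT,	/* wait skipped with I/O still in flight */
};

/* Decide whether we should have waited in the truncate but didn't. */
enum truncate_verdict classify_truncate(const struct truncate_snapshot *s)
{
	if (!s->truncate_called)
		return TRUNC_NOT_CALLED;
	if (s->new_size < s->i_size)	/* same test as xfs_itruncate_start() */
		return TRUNC_WAITED;
	if (s->i_iocount == 0)
		return TRUNC_NO_IO_PENDING;
	return TRUNC_MISSED_WAIT;
}
```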

If either the vn_iowait() in the truncate path is not sufficient, or
the truncate code is not being called, there is *some other bug*
that we don't yet understand.  Adding an unconditional vn_iowait()
appears to me to be fixing a symptom, not the underlying cause of
the problem....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 11+ messages
2008-08-05  6:43 [PATCH] Move vn_iowait() earlier in the reclaim path Lachlan McIlroy
2008-08-05  7:37 ` Dave Chinner
2008-08-05  7:44   ` Dave Chinner
2008-08-05  7:52   ` Lachlan McIlroy
2008-08-05  8:42     ` Dave Chinner
2008-08-06  2:28       ` Lachlan McIlroy
2008-08-06  5:20         ` Dave Chinner [this message]
2008-08-06  6:10           ` Lachlan McIlroy
2008-08-06  9:38             ` Dave Chinner
2008-08-07  8:43               ` Lachlan McIlroy
2008-08-08  8:32                 ` Lachlan McIlroy
