* question on xfs_vm_writepage in combination with fsync
@ 2011-06-20 20:56 Kevan Rehm
From: Kevan Rehm @ 2011-06-20 20:56 UTC (permalink / raw)
To: xfs
Greetings,
I've run into a case where the fsync() system call seems to have
returned before all file data was actually on disk. (A SLES11SP1 system
crash occurred shortly after an fsync which had returned zero. After
restarting the machine, the last I/O before the fsync is not in the
file.) In attempting to find the problem, I've come across code I don't
understand, and am hoping someone can enlighten me as to how things are
supposed to work.
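For reference, the user-space pattern at issue looks roughly like the sketch below. It is a minimal illustration, not the application that was running at the time of the crash; the helper name write_and_sync is hypothetical. The point is that a zero return from fsync() is what the caller relies on as the durability guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to path, then fsync().
 * Returns 0 only if every step succeeded, -1 otherwise.
 */
static int write_and_sync(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may be partial or interrupted, so loop until done. */
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        off += (size_t)n;
    }

    /* A zero return here is what the caller treats as "data is on disk". */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would treat a return of 0 as meaning the data survives a subsequent crash, which is the expectation that was violated in the case described here.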
Routine xfs_vm_writepage has various situations under which it will
decide it can't currently initiate writeback on a page, and in that case
calls redirty_page_for_writepage, unlocks the page, and returns zero.
That seems to me to be incompatible with fsync(), so I'm obviously
missing some key piece of logic.
The calling sequence of routines involved in fsync is:
do_fsync->vfs_fsync->vfs_fsync_range->
filemap_write_and_wait_range->
__filemap_fdatawrite_range->
do_writepages->generic_writepages->
write_cache_pages
Routine write_cache_pages walks the radix tree and calls
clear_page_dirty_for_io and then __writepage on each dirty page to
initiate writeback. __writepage calls xfs_vm_writepage. That routine
is occasionally unable to immediately start writeback of the page, and
so it calls redirty_page_for_writepage without setting the writeback flag.
When write_cache_pages resumes after the __writepage call, it continues
walking the radix tree starting additional writebacks on dirty pages,
but nothing I can see will ever come back and try again to start a
writeback on the page that xfs_vm_writepage couldn't write back.
Eventually control bubbles back up to filemap_write_and_wait_range()
where wait_on_page_writeback_range is called, but that routine only
waits for writebacks to complete; it doesn't do anything about dirty
pages. So it appears to me that the dirty page will be left dirty
indefinitely even though the wbc contained WB_SYNC_ALL.
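The concern can be modelled in a few lines of user-space C. This is a toy sketch, not kernel code: every name prefixed with toy_ is hypothetical, the page flags are plain booleans, and the can_write field simply stands in for whatever condition makes the writepage routine redirty a page. It only illustrates the sequence described above, under the assumption that nothing retries a redirtied page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page cache page; NOT the kernel's struct page. */
struct toy_page {
    bool dirty;      /* models the dirty flag */
    bool writeback;  /* models the writeback flag */
    bool can_write;  /* false: the writepage model redirties instead */
};

/* Models a ->writepage that either starts writeback or redirties. */
static void toy_writepage(struct toy_page *p)
{
    if (p->can_write)
        p->writeback = true;  /* I/O submitted */
    else
        p->dirty = true;      /* redirtied; writeback flag never set */
}

/* Models one pass over the dirty pages, with no retry of skipped pages. */
static void toy_write_cache_pages(struct toy_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].dirty)
            continue;
        pages[i].dirty = false;  /* models clear_page_dirty_for_io */
        toy_writepage(&pages[i]);
    }
}

/* Models the wait phase: it completes writeback, and ignores dirty pages. */
static void toy_wait_on_writeback(struct toy_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pages[i].writeback = false;
}

/* One write-and-wait cycle; returns how many pages are still dirty. */
static size_t toy_fsync(struct toy_page *pages, size_t n)
{
    assert(pages != NULL);

    size_t still_dirty = 0;
    toy_write_cache_pages(pages, n);
    toy_wait_on_writeback(pages, n);
    for (size_t i = 0; i < n; i++)
        if (pages[i].dirty)
            still_dirty++;
    return still_dirty;
}
```

In this model, any page whose can_write is false is still dirty when toy_fsync returns, which is the outcome being asked about.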
I'd like to believe that I am missing something, and that the code is
correct, but I do have a crash dump where I can see dirty pages in files
that were recently fsync'd. And I can't believe the problem is
something inside XFS, because I see other filesystems also call
redirty_page_for_writepage, so I think the same problem could occur with
them.
Could someone please describe to me how fsync is supposed to work in
combination with xfs_vm_writepage?
Thanks in advance,
Regards, Kevan
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: question on xfs_vm_writepage in combination with fsync
@ 2011-06-20 23:50 ` Dave Chinner
From: Dave Chinner @ 2011-06-20 23:50 UTC (permalink / raw)
To: Kevan Rehm; +Cc: xfs
On Mon, Jun 20, 2011 at 03:56:19PM -0500, Kevan Rehm wrote:
> Greetings,
>
> I've run into a case where the fsync() system call seems to have
> returned before all file data was actually on disk. (A SLES11SP1 system
> crash occurred shortly after an fsync which had returned zero. After
> restarting the machine, the last I/O before the fsync is not in the
> file.) In attempting to find the problem, I've come across code I don't
> understand, and am hoping someone can enlighten me as to how things are
> supposed to work.
>
> Routine xfs_vm_writepage has various situations under which it will
> decide it can't currently initiate writeback on a page, and in that case
> calls redirty_page_for_writepage, unlocks the page, and returns zero.
> That seems to me to be incompatible with fsync(), so I'm obviously
> missing some key piece of logic.
>
> The calling sequence of routines involved in fsync is:
>
> do_fsync->vfs_fsync->vfs_fsync_range->
> filemap_write_and_wait_range->
> __filemap_fdatawrite_range->
> do_writepages->generic_writepages->
> write_cache_pages
>
> Routine write_cache_pages walks the radix tree and calls
> clear_page_dirty_for_io and then __writepage on each dirty page to
> initiate writeback. __writepage calls xfs_vm_writepage. That routine
> is occasionally unable to immediately start writeback of the page, and
> so it calls redirty_page_for_writepage without setting the writeback flag.
Hi Kevan,
The current xfs_vm_writepage mainline code will only enter the
redirty path if:
	- it is called from direct memory reclaim
	- it is called within a transaction context and we need to
	  do an allocation transaction
	- it is WB_SYNC_NONE writeback and we can't get the inode
	  lock without blocking during block mapping (EAGAIN case).
None of these cases are triggered by fsync() driven (WB_SYNC_ALL)
writeback, so AFAICT fsync() based writeback should not be skipping
writeback of dirty pages in the given fsync range. So for a mainline
kernel I don't think there are any problems w.r.t. fsync() and
redirtying pages causing dirty pages to be skipped during writeback.
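The three conditions above can be written down as a small predicate. This is a simplified user-space sketch of the decision as described in this mail, not the actual xfs_vm_writepage source; the struct, its fields, and the toy_ names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the two writeback sync modes. */
enum toy_sync_mode { TOY_WB_SYNC_NONE, TOY_WB_SYNC_ALL };

/* Hypothetical summary of the caller's context; not a kernel structure. */
struct toy_wb_ctx {
    enum toy_sync_mode sync_mode;
    bool in_direct_reclaim;  /* called from direct memory reclaim */
    bool in_transaction;     /* already inside a transaction context */
    bool needs_alloc;        /* an allocation transaction would be needed */
    bool ilock_would_block;  /* inode lock not available without blocking */
};

/* Returns true if the page would take the redirty path in this model. */
static bool toy_should_redirty(const struct toy_wb_ctx *c)
{
    assert(c != NULL);

    if (c->in_direct_reclaim)
        return true;
    if (c->in_transaction && c->needs_alloc)
        return true;
    /* Only non-blocking writeback backs off on lock contention (EAGAIN). */
    if (c->sync_mode == TOY_WB_SYNC_NONE && c->ilock_would_block)
        return true;
    return false;
}
```

In this model an fsync() caller uses TOY_WB_SYNC_ALL and is neither in direct reclaim nor inside a transaction, so the predicate returns false even when the inode lock is contended: the writeback blocks on the lock rather than skipping the page.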
However, the mainline writeback path has had significant change
(especially to WB_SYNC_ALL writeback) since sles11sp1 was
snapshotted (2.6.32, right?). Hence it is possible that one (or
several) of the changes fixed this bug without us even realising it
was a problem.
That said, having dirty pages after an fsync is not necessarily an
fsync bug - something could have dirtied them while the fsync was in
progress. I don't know any details of how this occurred, so I'm
simply speculating that there could be other causes of the dirty
pages you are seeing...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com