linux-fsdevel.vger.kernel.org archive mirror
* Cleanup generic_osync_inode?
@ 2009-08-14 21:48 Jan Kara
  2009-08-15 15:12 ` Christoph Hellwig
From: Jan Kara @ 2009-08-14 21:48 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: hch, viro

  Hi,

  I was looking at generic_osync_inode() and found out it's kind of
inconsistent with what we have in fsync() path and does not work in all
cases e.g. on ext3 / ext4. The problem is that filesystem never actually
gets to know that it should sync all metadata needed to reach the data -
generic_osync_inode() only does sync_mapping_buffers() but e.g. ext3 / ext4
don't track metadata buffers there. Then it does write_inode_now() which
would actually flush the journal, but it does so only in case inode is
I_DIRTY_DATASYNC... So there are cases where we sync the data but leave
metadata uncommitted.
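
A minimal userspace sketch of the gap described above, assuming a journalling filesystem that keeps no metadata buffers on the mapping. Every `toy_` / `TOY_` name is hypothetical and only models the behaviour as described in this message; it is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for inode state bits; not the real kernel values. */
enum {
	TOY_I_DIRTY_SYNC     = 1 << 0,	/* inode dirty, not flagged for datasync */
	TOY_I_DIRTY_DATASYNC = 1 << 1,	/* inode dirty in a way datasync cares about */
};

struct toy_inode {
	unsigned int i_state;	/* TOY_I_DIRTY_* bits */
	bool data_dirty;	/* dirty file data in the page cache */
	bool journal_dirty;	/* metadata sitting in an uncommitted transaction */
	int tracked_buffers;	/* metadata buffers attached to the mapping */
};

/* Models sync_mapping_buffers(): it only sees buffers tracked on the mapping. */
void toy_sync_mapping_buffers(struct toy_inode *inode)
{
	inode->tracked_buffers = 0;
}

/* Models write_inode_now() on a journalling fs: the journal gets committed. */
void toy_write_inode_now(struct toy_inode *inode)
{
	inode->journal_dirty = false;
	inode->i_state = 0;
}

/* Models the osync path as described in the message. */
void toy_osync_inode(struct toy_inode *inode)
{
	assert(inode != NULL);

	inode->data_dirty = false;		/* data writeout + wait */
	toy_sync_mapping_buffers(inode);	/* nothing tracked, so nothing synced */

	/* The journal is only reached when the inode is DATASYNC-dirty. */
	if (inode->i_state & TOY_I_DIRTY_DATASYNC)
		toy_write_inode_now(inode);
}
```

In this model, an inode that starts with `journal_dirty` set and only `TOY_I_DIRTY_SYNC` in `i_state` comes out of `toy_osync_inode()` with clean data but `journal_dirty` still set, which is the case the message complains about.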
  What I'd imagine is that generic_osync_inode() would be just like
fdatasync call, only we'd have to add a possibility to avoid fdatawrite /
fdatawait as some callers submit / wait for data themselves. That would
nicely unify those syncing paths.
  The only small problem is with an interface since ->fsync() callback
takes preferably struct file * and at least struct dentry *, while
generic_osync_inode takes just inode. Most of the callers actually have
a struct file * pointer but sync_page_range[_nolock]() do not, so that
would have to be solved somehow.
  Any opinions / ideas?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Cleanup generic_osync_inode?
  2009-08-14 21:48 Cleanup generic_osync_inode? Jan Kara
@ 2009-08-15 15:12 ` Christoph Hellwig
  2009-08-15 15:21   ` Christoph Hellwig
From: Christoph Hellwig @ 2009-08-15 15:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, hch, viro

On Fri, Aug 14, 2009 at 11:48:31PM +0200, Jan Kara wrote:
>   Hi,
> 
>   I was looking at generic_osync_inode() and found out it's kind of
> inconsistent with what we have in fsync() path and does not work in all
> cases e.g. on ext3 / ext4. The problem is that filesystem never actually
> gets to know that it should sync all metadata needed to reach the data -
> generic_osync_inode() only does sync_mapping_buffers() but e.g. ext3 / ext4
> don't track metadata buffers there. Then it does write_inode_now() which
> would actually flush the journal, but it does so only in case inode is
> I_DIRTY_DATASYNC... So there are cases where we sync the data but leave
> metadata uncommitted.

Yeah, it's not very useful.  That's the reason why XFS doesn't rely on
it but rather does a manual log force for the metadata.

>   What I'd imagine is that generic_osync_inode() would be just like
> fdatasync call, only we'd have to add a possibility to avoid fdatawrite /
> fdatawait as some callers submit / wait for data themselves. That would
> nicely unify those syncing paths.
>   The only small problem is with an interface since ->fsync() callback
> takes preferably struct file * and at least struct dentry *, while
> generic_osync_inode takes just inode. Most of the callers actually have
> a struct file * pointer but sync_page_range[_nolock]() do not, so that
> would have to be solved somehow.

Note that the current way ->fsync works is also rather problematic.
For one thing, all filesystems that touch metadata on data I/O completion
(XFS, btrfs, ext4 and probably more) really want to first write out
the data _and_ wait for it, and only then sync the inode.  Right now
every filesystem has to do that itself, and do it under i_mutex or
drop/reacquire it, which is quite stupid.  The second issue is that
we pass a file which is required for some filesystems (e.g. NFS) but
which may be NULL when coming from NFSD.  I'll need to look into fixing
NFSD to always pass a file here.
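
The ordering point can be shown with a small userspace model. It assumes that data I/O completion dirties some metadata (an on-disk size update is used here purely as an illustrative assumption), and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_state {
	bool io_in_flight;	/* data submitted but not yet completed */
	bool meta_dirty;	/* metadata not yet on stable storage */
};

/* Waiting for data runs the completion handler, which dirties metadata. */
void toy_wait_for_data(struct toy_state *s)
{
	assert(s != NULL);
	if (s->io_in_flight) {
		s->io_in_flight = false;
		s->meta_dirty = true;	/* e.g. size update done at I/O completion */
	}
}

/* Stand-in for syncing the inode / forcing the log. */
void toy_sync_metadata(struct toy_state *s)
{
	assert(s != NULL);
	s->meta_dirty = false;
}

/* Wrong order: completion re-dirties metadata after it was synced. */
bool toy_sync_then_wait(struct toy_state *s)
{
	toy_sync_metadata(s);
	toy_wait_for_data(s);
	return !s->meta_dirty;	/* true means everything is stable */
}

/* Order wanted above: write out data, wait for it, then sync the inode. */
bool toy_wait_then_sync(struct toy_state *s)
{
	toy_wait_for_data(s);
	toy_sync_metadata(s);
	return !s->meta_dirty;
}
```

Starting from a state with I/O in flight, `toy_sync_then_wait()` reports unstable metadata while `toy_wait_then_sync()` reports a stable result, which is why the wait has to come before the inode sync.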

Now back to the original generic_osync_inode / O_SYNC handling problem:
All callers of sync_page_range actually do have a file pointer, and
all of them are called from generic or filesystem specific write code,
so passing the file pointer to it is no problem at all.
sync_page_range_nolock just has a single caller in fat which does not
have a file pointer.  The right thing here is IMHO:

 (1) open-code sync_page_range_nolock in fat and just get rid of it as a
     generic helper
 (2) replace sync_page_range with a generic_write_sync or similar that
     does the range writeouts + a call to ->fsync.  Bonus points for also
     moving the O_SYNC / IS_SYNC checks into that helper.
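
A rough userspace sketch of what the helper in (2) might look like, with the O_SYNC / IS_SYNC check folded in. The structures, the `TOY_O_SYNC` bit, the argument list and all `toy_` names are assumptions made up for illustration; this is not an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_O_SYNC 0x1	/* hypothetical flag bit, not the real O_SYNC value */

struct toy_file;

struct toy_file_ops {
	/* stand-in for ->fsync; datasync != 0 means fdatasync-like semantics */
	int (*fsync)(struct toy_file *file, int datasync);
};

struct toy_file {
	unsigned int f_flags;
	int inode_is_sync;	/* stand-in for IS_SYNC(inode) */
	const struct toy_file_ops *f_op;
	int range_writeouts;	/* counters so callers can observe behaviour */
	int fsync_calls;
};

/* Stand-in for writing out and waiting on the byte range [start, end]. */
int toy_write_and_wait_range(struct toy_file *file, long long start,
			     long long end)
{
	(void)start;
	(void)end;
	file->range_writeouts++;
	return 0;
}

/* One helper: sync check, range writeout, then a call to ->fsync. */
int toy_generic_write_sync(struct toy_file *file, long long pos,
			   long long count)
{
	int err;

	assert(file != NULL && file->f_op != NULL && file->f_op->fsync != NULL);

	/* Nothing to do unless the file or inode asks for synchronous writes. */
	if (!(file->f_flags & TOY_O_SYNC) && !file->inode_is_sync)
		return 0;
	if (count <= 0)
		return 0;

	err = toy_write_and_wait_range(file, pos, pos + count - 1);
	if (err)
		return err;
	return file->f_op->fsync(file, 1);
}

/* Trivial ->fsync implementation that just counts calls. */
int toy_fsync(struct toy_file *file, int datasync)
{
	(void)datasync;
	file->fsync_calls++;
	return 0;
}

const struct toy_file_ops toy_fops = { .fsync = toy_fsync };
```

With the check inside the helper, a write path can call it unconditionally after a successful write; files opened without the sync flag return early without touching ->fsync.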


* Re: Cleanup generic_osync_inode?
  2009-08-15 15:12 ` Christoph Hellwig
@ 2009-08-15 15:21   ` Christoph Hellwig
From: Christoph Hellwig @ 2009-08-15 15:21 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, hch, viro


And actually merge generic_osync_inode into our new helper.  We should
never have had those two different helpers doing pretty much the same.
