All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Emelyanov <xemul@parallels.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: "fuse-devel@lists.sourceforge.net"
	<fuse-devel@lists.sourceforge.net>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	James Bottomley <jbottomley@parallels.com>,
	Kirill Korotaev <dev@parallels.com>
Subject: Re: [PATCH 6/10] fuse: Trust kernel i_size only
Date: Mon, 16 Jul 2012 07:32:44 +0400	[thread overview]
Message-ID: <50038B5C.4060507@parallels.com> (raw)
In-Reply-To: <87sjcv39pn.fsf@tucsk.pomaz.szeredi.hu>

On 07/13/2012 08:30 PM, Miklos Szeredi wrote:
> Pavel Emelyanov <xemul@parallels.com> writes:
> 
>> On 07/04/2012 06:39 PM, Miklos Szeredi wrote:
>>> Pavel Emelyanov <xemul@parallels.com> writes:
>>>
>>>> Make fuse think that when writeback is on the inode's i_size is alway
>>>> up-to-date and not update it with the value received from the
>>>> userspace. This is done because the page cache code may update i_size
>>>> without letting the FS know.
>>>
>>> Similar rule applies to i_mtime.  Except it's even more tricky, since
>>> you have to flush the mtime together with the data (the flush should not
>>> update the modification time).  And we have other operations which also
>>> change the mtime, and those also need to be updated.
>>>
>>> We should probably look at what NFS is doing.
>>
>> Miklos,
>>
>> I've looked at how NFS and FUSE manage the i_mtime.
>>
>> The NFS (if not looking at the fscache) tries to keep the i_mtime at the
>> maximum between the local and the remote values. This looks like correct
>> behavior for FUSE too.
>>
>> But, looking at the FUSE code I see that the existing attr_version checks
>> do implement the same approach even if we turn the writeback cache (this
>> series) ON.
> 
> It's not as simple as that.
> 
> First off, fuse sets S_NOCMTIME which means the kernel won't update
> i_mtime on writes.  Which is fine for write-through, since the time
> update is always the responsibility of the userspace filesystem.

Oops, I've missed that fact :( I wonder why writes did update the mtime
in my tests then...

> If we are doing buffered writes, then the kernel must update i_mtime on
> write(2) and must flush that to the userspace filesystem at some point
> (with a SETTATTR operation).

Agree.

> The attr_version thing is about making sure that we don't update the
> kernel's cached attribute with stale data from the userspace filesystem.
> E.g. if a GETATTR races with a WRITE then by the time the GETATTR
> finishes write may have already updated i_size despite the fact that
> GETATTR is returning i_size before the write.  In that case we don't
> store the i_size we believe is stale but we do use it the stat(2) reply,
> since it came from the userspace filesystem, which is the authoritative
> source of information.
> 
> But that means that if we want stat(2) to work correctly, then we must
> flush the updated i_mtime to userspace before we do the GETATTR.

Why? Isn't it enough just to look at the per-inode flag (you're talking
below) and take either cached i_mtime value or the user-space verion of it?

> So basically what we need is a per-inode flag that says that i_mtime has
> been updated (it is more recent then what userspace has) and we must
> update i_mtime *only* in write and not other operations which still do
> the mtime update in the userspace filesystem.  Any operation that
> modifies i_mtime (and hence invalidate the attributes) must clear the
> flag.  Any other operation which updates or invalidates the attributes
> must first flush the i_mtime to userspace if the flag is set.
> 
> In addition the userspace fileystem need to implement the policy similar
> to NFS, in which it only updates mtime if it is greater than the current
> one.  This means that we must differentiate between an mtime update due
> to a buffered write from an mtime update due to an utime (and friends)
> system call.
> 
> I think that would work, but it's a tricky thing and needs to be thought
> trough.
> 
> Thanks,
> Miklos
> .
> 


  reply	other threads:[~2012-07-16  3:32 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-03 15:53 [PATCH 0/10] fuse: An attempt to implement a write-back cache policy Pavel Emelyanov
2012-07-03 15:53 ` [PATCH 1/10] fuse: Linking file to inode helper Pavel Emelyanov
2012-07-03 15:54 ` [PATCH 2/10] fuse: Getting file for writeback helper Pavel Emelyanov
2012-07-03 15:54 ` [PATCH 3/10] fuse: Prepare to handle short reads Pavel Emelyanov
2012-07-03 15:55 ` [PATCH 4/10] fuse: Prepare to handle multiple pages in writeback Pavel Emelyanov
2012-07-04 13:06   ` Miklos Szeredi
2012-07-04 14:26     ` Pavel Emelyanov
2012-07-03 15:55 ` [PATCH 6/10] fuse: Trust kernel i_size only Pavel Emelyanov
2012-07-04 14:39   ` Miklos Szeredi
2012-07-05 14:10     ` Pavel Emelyanov
2012-07-10  5:53     ` Pavel Emelyanov
2012-07-13 16:30       ` Miklos Szeredi
2012-07-16  3:32         ` Pavel Emelyanov [this message]
2012-07-17 15:17           ` Miklos Szeredi
     [not found]     ` <8762a3pp3m.fsf-d8RdFUjzFsbxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2012-10-01 17:30       ` Maxim V. Patlasov
2012-11-16  9:49         ` Miklos Szeredi
2012-11-16 10:32           ` Maxim V. Patlasov
     [not found] ` <4FF3156E.8030109-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-07-03 15:55   ` [PATCH 5/10] fuse: Connection bit for enabling writeback Pavel Emelyanov
2012-07-03 15:56   ` [PATCH 7/10] fuse: Flush files on wb close Pavel Emelyanov
2012-07-03 15:56   ` [PATCH 8/10] fuse: Implement writepages and write_begin/write_end callbacks Pavel Emelyanov
2012-07-03 15:57   ` [PATCH 9/10] fuse: Turn writeback on Pavel Emelyanov
2012-07-04  3:01   ` [PATCH 0/10] fuse: An attempt to implement a write-back cache policy Nikolaus Rath
     [not found]     ` <87a9zg1b7q.fsf-sKB8Sp2ER+yL2G7IJ6k9tw@public.gmane.org>
2012-07-04  7:11       ` Pavel Emelyanov
2012-07-04 13:22         ` Nikolaus Rath
     [not found]           ` <4FF4438B.8050807-BTH8mxji4b0@public.gmane.org>
2012-07-04 14:33             ` Pavel Emelyanov
     [not found]               ` <4FF45447.5000705-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-07-04 17:08                 ` Nikolaus Rath
2012-07-05  9:01                   ` Pavel Emelyanov
2012-07-05 13:07         ` Nikolaus Rath
2012-07-05 14:08           ` Pavel Emelyanov
2012-07-05 14:29             ` Nikolaus Rath
2012-07-05 14:34               ` Pavel Emelyanov
2012-07-06  2:04                 ` Nikolaus Rath
     [not found]                   ` <8762a1odbf.fsf-sKB8Sp2ER+yL2G7IJ6k9tw@public.gmane.org>
2012-07-06  8:46                     ` Pavel Emelyanov
2012-07-05 19:31   ` Anand Avati
     [not found]     ` <4FF5EB85.1050701-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-07-05 20:07       ` Pavel Emelyanov
2012-07-06 11:52         ` [fuse-devel] " Kirill Korotaev
2012-07-03 15:57 ` [PATCH 10/10] mm: Account for WRITEBACK_TEMP in balance_dirty_pages Pavel Emelyanov
     [not found]   ` <4FF3166B.5090800-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-07-13 16:57     ` Miklos Szeredi
2012-07-16  3:27       ` Pavel Emelyanov
2012-07-17 19:11         ` Miklos Szeredi
2012-07-27  4:01           ` Pavel Emelyanov
     [not found]             ` <5012127C.8070203-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-08-07 17:30               ` Miklos Szeredi
2012-07-05 19:26 ` [fuse-devel] [PATCH 0/10] fuse: An attempt to implement a write-back cache policy Anand Avati

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50038B5C.4060507@parallels.com \
    --to=xemul@parallels.com \
    --cc=dev@parallels.com \
    --cc=fuse-devel@lists.sourceforge.net \
    --cc=jbottomley@parallels.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.