From mboxrd@z Thu Jan 1 00:00:00 1970 From: Pavel Emelyanov Subject: Re: [PATCH 6/10] fuse: Trust kernel i_size only Date: Mon, 16 Jul 2012 07:32:44 +0400 Message-ID: <50038B5C.4060507@parallels.com> References: <4FF3156E.8030109@parallels.com> <4FF3160B.6090501@parallels.com> <8762a3pp3m.fsf@tucsk.pomaz.szeredi.hu> <4FFBC370.7070807@parallels.com> <87sjcv39pn.fsf@tucsk.pomaz.szeredi.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: "fuse-devel@lists.sourceforge.net" , Alexander Viro , linux-fsdevel , James Bottomley , Kirill Korotaev To: Miklos Szeredi Return-path: Received: from mailhub.sw.ru ([195.214.232.25]:12599 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751723Ab2GPDc5 (ORCPT ); Sun, 15 Jul 2012 23:32:57 -0400 In-Reply-To: <87sjcv39pn.fsf@tucsk.pomaz.szeredi.hu> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On 07/13/2012 08:30 PM, Miklos Szeredi wrote: > Pavel Emelyanov writes: > >> On 07/04/2012 06:39 PM, Miklos Szeredi wrote: >>> Pavel Emelyanov writes: >>> >>>> Make fuse think that when writeback is on the inode's i_size is alway >>>> up-to-date and not update it with the value received from the >>>> userspace. This is done because the page cache code may update i_size >>>> without letting the FS know. >>> >>> Similar rule applies to i_mtime. Except it's even more tricky, since >>> you have to flush the mtime together with the data (the flush should not >>> update the modification time). And we have other operations which also >>> change the mtime, and those also need to be updated. >>> >>> We should probably look at what NFS is doing. >> >> Miklos, >> >> I've looked at how NFS and FUSE manage the i_mtime. >> >> The NFS (if not looking at the fscache) tries to keep the i_mtime at the >> maximum between the local and the remote values. This looks like correct >> behavior for FUSE too. >> >> But, looking at the FUSE code I see that the existing attr_version checks >> do implement the same approach even if we turn the writeback cache (this >> series) ON. > > It's not as simple as that. > > First off, fuse sets S_NOCMTIME which means the kernel won't update > i_mtime on writes. Which is fine for write-through, since the time > update is always the responsibility of the userspace filesystem. Oops, I've missed that fact :( I wonder why writes did update the mtime in my tests then... > If we are doing buffered writes, then the kernel must update i_mtime on > write(2) and must flush that to the userspace filesystem at some point > (with a SETTATTR operation). Agree. > The attr_version thing is about making sure that we don't update the > kernel's cached attribute with stale data from the userspace filesystem. > E.g. if a GETATTR races with a WRITE then by the time the GETATTR > finishes write may have already updated i_size despite the fact that > GETATTR is returning i_size before the write. In that case we don't > store the i_size we believe is stale but we do use it the stat(2) reply, > since it came from the userspace filesystem, which is the authoritative > source of information. > > But that means that if we want stat(2) to work correctly, then we must > flush the updated i_mtime to userspace before we do the GETATTR. Why? Isn't it enough just to look at the per-inode flag (you're talking below) and take either cached i_mtime value or the user-space verion of it? > So basically what we need is a per-inode flag that says that i_mtime has > been updated (it is more recent then what userspace has) and we must > update i_mtime *only* in write and not other operations which still do > the mtime update in the userspace filesystem. Any operation that > modifies i_mtime (and hence invalidate the attributes) must clear the > flag. Any other operation which updates or invalidates the attributes > must first flush the i_mtime to userspace if the flag is set. > > In addition the userspace fileystem need to implement the policy similar > to NFS, in which it only updates mtime if it is greater than the current > one. This means that we must differentiate between an mtime update due > to a buffered write from an mtime update due to an utime (and friends) > system call. > > I think that would work, but it's a tricky thing and needs to be thought > trough. > > Thanks, > Miklos > . >