From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Maxim V. Patlasov" Subject: Re: [PATCH 06/14] fuse: Trust kernel i_size only - v2 Date: Mon, 25 Mar 2013 16:29:49 +0400 Message-ID: <5150433D.1090408@parallels.com> References: <20130125181700.10037.29163.stgit@maximpc.sw.ru> <20130125182210.10037.59973.stgit@maximpc.sw.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: , , , , , , , To: Miklos Szeredi Return-path: In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org Hi Miklos, Sorry for long delay, see please inline comments below. 01/29/2013 02:18 PM, Miklos Szeredi =D0=BF=D0=B8=D1=88=D0=B5=D1=82: > On Fri, Jan 25, 2013 at 7:22 PM, Maxim V. Patlasov > wrote: >> Make fuse think that when writeback is on the inode's i_size is alwa= ys >> up-to-date and not update it with the value received from the usersp= ace. >> This is done because the page cache code may update i_size without l= etting >> the FS know. >> >> This assumption implies fixing the previously introduced short-read = helper -- >> when a short read occurs the 'hole' is filled with zeroes. >> >> fuse_file_fallocate() is also fixed because now we should keep i_siz= e up to >> date, so it must be updated if FUSE_FALLOCATE request succeeded. >> >> Changed in v2: >> - improved comment in fuse_short_read() >> - fixed fuse_file_fallocate() for KEEP_SIZE mode >> >> Original patch by: Pavel Emelyanov >> Signed-off-by: Maxim V. Patlasov >> --- >> fs/fuse/dir.c | 9 ++++++--- >> fs/fuse/file.c | 43 +++++++++++++++++++++++++++++++++++++++++-- >> fs/fuse/inode.c | 6 ++++-- >> 3 files changed, 51 insertions(+), 7 deletions(-) >> >> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c >> index ed8f8c5..ff8b603 100644 >> --- a/fs/fuse/dir.c >> +++ b/fs/fuse/dir.c >> @@ -827,7 +827,7 @@ static void fuse_fillattr(struct inode *inode, s= truct fuse_attr *attr, >> stat->mtime.tv_nsec =3D attr->mtimensec; >> stat->ctime.tv_sec =3D attr->ctime; >> stat->ctime.tv_nsec =3D attr->ctimensec; >> - stat->size =3D attr->size; >> + stat->size =3D i_size_read(inode); > The old code is correct and you break it. fuse_fillattr() is called strictly after fuse_change_attributes(). The=20 latter usually sets local i_size with server value: i_size_write(inode,= =20 attr->size). This makes -/+ lines above equivalent. The only exception=20 is stale attributes when current fi->attr_version is greater than=20 attr_version acquired before sending FUSE_GETATTR request. My patch=20 breaks old behaviour in this special case, I'll fix it, thanks for the=20 catch. > We always use the values > returned by GETATTR, instead of the cached ones. The cached ones are > a best guess by the kernel and they may or may not have been correct > at any point in time. The attributes returned by userspace are the > authentic ones. Yes, that's correct when "write cache" is off. > For the "write cache" case what we want, I think, is a mode where the > kernel always trusts the cached attributes. The attribute cache is > initialized from values returned in LOOKUP and the kernel never needs > to call GETATTR since the attributes are always up-to-date. > > Is that correct? No, for "write cache" case the kernel always trusts cached i_size for=20 regular files. For other attributes (and for !S_ISGREG() files) the=20 kernel relies on userspace. > >> stat->blocks =3D attr->blocks; >> >> if (attr->blksize !=3D 0) >> @@ -1541,6 +1541,7 @@ static int fuse_do_setattr(struct dentry *entr= y, struct iattr *attr, >> struct fuse_setattr_in inarg; >> struct fuse_attr_out outarg; >> bool is_truncate =3D false; >> + bool is_wb =3D fc->writeback_cache; >> loff_t oldsize; >> int err; >> >> @@ -1613,7 +1614,8 @@ static int fuse_do_setattr(struct dentry *entr= y, struct iattr *attr, >> fuse_change_attributes_common(inode, &outarg.attr, >> attr_timeout(&outarg)); >> oldsize =3D inode->i_size; >> - i_size_write(inode, outarg.attr.size); >> + if (!is_wb || is_truncate || !S_ISREG(inode->i_mode)) >> + i_size_write(inode, outarg.attr.size); > Okay, I managed to understand what is going on here: if userspace is > behaving badly and is changing the size even if that was not > requested, then we silently reject that. But that's neither clearly > unrestandable (without a comment) nor sensible, I think. > > If the filesystem is behaving badly, just let it. Or is there some > other reason why we'd want this check? The change above has nothing to do with misbehaving userspace. The chec= k=20 literally means: do not trust attr.size from server if "write cache" is= =20 on and the file is regular and there was no explicit user request to=20 change size (i.e. ATTR_SIZE bit was not set in attr->ia_valid). The=20 check is necessary if the user changes some attribute (not related to=20 i_size) and at the time of processing FUSE_SETATTR server doesn't have=20 up-to-date info about the size of file. In "write cache" case this=20 situation is typical for cached writes extending file. > >> if (is_truncate) { >> /* NOTE: this may release/reacquire fc->lock */ >> @@ -1625,7 +1627,8 @@ static int fuse_do_setattr(struct dentry *entr= y, struct iattr *attr, >> * Only call invalidate_inode_pages2() after removing >> * FUSE_NOWRITE, otherwise fuse_launder_page() would deadlo= ck. >> */ >> - if (S_ISREG(inode->i_mode) && oldsize !=3D outarg.attr.size)= { >> + if ((is_truncate || !is_wb) && >> + S_ISREG(inode->i_mode) && oldsize !=3D outar= g.attr.size) { >> truncate_pagecache(inode, oldsize, outarg.attr.size= ); >> invalidate_inode_pages2(inode->i_mapping); >> } >> diff --git a/fs/fuse/file.c b/fs/fuse/file.c >> index b28be33..6b64e11 100644 >> --- a/fs/fuse/file.c >> +++ b/fs/fuse/file.c >> @@ -15,6 +15,7 @@ >> #include >> #include >> #include >> +#include >> >> static const struct file_operations fuse_direct_io_file_operations= ; >> >> @@ -543,9 +544,31 @@ static void fuse_short_read(struct fuse_req *re= q, struct inode *inode, >> u64 attr_ver) >> { >> size_t num_read =3D req->out.args[0].size; >> + struct fuse_conn *fc =3D get_fuse_conn(inode); >> + >> + if (fc->writeback_cache) { >> + /* >> + * A hole in a file. Some data after the hole are in= page cache, >> + * but have not reached the client fs yet. So, the h= ole is not >> + * present there. >> + */ >> + int i; >> + int start_idx =3D num_read >> PAGE_CACHE_SHIFT; >> + size_t off =3D num_read & (PAGE_CACHE_SIZE - 1); >> >> - loff_t pos =3D page_offset(req->pages[0]) + num_read; >> - fuse_read_update_size(inode, pos, attr_ver); >> + for (i =3D start_idx; i < req->num_pages; i++) { >> + struct page *page =3D req->pages[i]; >> + void *mapaddr =3D kmap_atomic(page); >> + >> + memset(mapaddr + off, 0, PAGE_CACHE_SIZE - o= ff); >> + >> + kunmap_atomic(mapaddr); >> + off =3D 0; >> + } >> + } else { >> + loff_t pos =3D page_offset(req->pages[0]) + num_read= ; >> + fuse_read_update_size(inode, pos, attr_ver); >> + } >> } >> >> static int fuse_readpage(struct file *file, struct page *page) >> @@ -2285,6 +2308,8 @@ static long fuse_file_fallocate(struct file *f= ile, int mode, loff_t offset, >> .mode =3D mode >> }; >> int err; >> + bool change_i_size =3D fc->writeback_cache && >> + !(mode & FALLOC_FL_KEEP_SIZE); >> >> if (fc->no_fallocate) >> return -EOPNOTSUPP; >> @@ -2293,6 +2318,11 @@ static long fuse_file_fallocate(struct file *= file, int mode, loff_t offset, >> if (IS_ERR(req)) >> return PTR_ERR(req); >> >> + if (change_i_size) { >> + struct inode *inode =3D file->f_mapping->host; >> + mutex_lock(&inode->i_mutex); >> + } >> + >> req->in.h.opcode =3D FUSE_FALLOCATE; >> req->in.h.nodeid =3D ff->nodeid; >> req->in.numargs =3D 1; >> @@ -2306,6 +2336,15 @@ static long fuse_file_fallocate(struct file *= file, int mode, loff_t offset, >> } >> fuse_put_request(fc, req); >> >> + if (change_i_size) { >> + struct inode *inode =3D file->f_mapping->host; >> + >> + if (!err) >> + fuse_write_update_size(inode, offset + lengt= h); >> + >> + mutex_unlock(&inode->i_mutex); >> + } >> + >> return err; >> } >> >> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c >> index 9d95a5a..7e07dbd 100644 >> --- a/fs/fuse/inode.c >> +++ b/fs/fuse/inode.c >> @@ -196,6 +196,7 @@ void fuse_change_attributes(struct inode *inode,= struct fuse_attr *attr, >> { >> struct fuse_conn *fc =3D get_fuse_conn(inode); >> struct fuse_inode *fi =3D get_fuse_inode(inode); >> + bool is_wb =3D fc->writeback_cache; >> loff_t oldsize; >> struct timespec old_mtime; >> >> @@ -209,10 +210,11 @@ void fuse_change_attributes(struct inode *inod= e, struct fuse_attr *attr, >> fuse_change_attributes_common(inode, attr, attr_valid); >> >> oldsize =3D inode->i_size; >> - i_size_write(inode, attr->size); >> + if (!is_wb || !S_ISREG(inode->i_mode)) >> + i_size_write(inode, attr->size); > Same as the previous comment. I think you simply need to omit these > checks. But if something *is* needed to special case the write cache > case, then it needs some comments to explain what and why. Are you OK about a comment like this: /* * In case of writeback_cache enabled, the cached writes beyond EOF * extend local i_size without keeping userspace server in sync. So, * attr->size coming from server can be stale. We cannot trust it. */ Thanks, Maxim > >> spin_unlock(&fc->lock); >> >> - if (S_ISREG(inode->i_mode)) { >> + if (!is_wb && S_ISREG(inode->i_mode)) { >> bool inval =3D false; >> >> if (oldsize !=3D attr->size) { >>