From: Maxim Patlasov <mpatlasov@parallels.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: <dev@parallels.com>, <xemul@parallels.com>,
<fuse-devel@lists.sourceforge.net>, <bfoster@redhat.com>,
<linux-kernel@vger.kernel.org>, <jbottomley@parallels.com>,
<linux-fsdevel@vger.kernel.org>, <akpm@linux-foundation.org>,
<fengguang.wu@intel.com>, <devel@openvz.org>
Subject: Re: [PATCH 05/11] fuse: Trust kernel i_mtime only -v2
Date: Thu, 26 Dec 2013 19:41:41 +0400 [thread overview]
Message-ID: <52BC4E35.8040503@parallels.com> (raw)
In-Reply-To: <20131112165251.GA10813@tucsk.piliscsaba.szeredi.hu>
Hi Miklos,
Sorry for delay, see please inline comments below.
On 11/12/2013 08:52 PM, Miklos Szeredi wrote:
> On Thu, Oct 10, 2013 at 05:10:56PM +0400, Maxim Patlasov wrote:
>> Let the kernel maintain i_mtime locally:
>> - clear S_NOCMTIME
>> - implement i_op->update_time()
>> - flush mtime on fsync and last close
>> - update i_mtime explicitly on truncate and fallocate
>>
>> Fuse inode flag FUSE_I_MTIME_DIRTY serves as indication that local i_mtime
>> should be flushed to the server eventually.
>>
>> Changed in v2 (thanks to Brian):
>> - renamed FUSE_I_MTIME_UPDATED to FUSE_I_MTIME_DIRTY
>> - simplified fuse_set_mtime_local()
>> - abandoned optimizations: clearing the flag on some operations (like
>> direct write) is doable, but may lead to unexpected changes of
>> user-visible mtime.
>>
>> Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
>> ---
>> fs/fuse/dir.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++++------
>> fs/fuse/file.c | 30 +++++++++++++--
>> fs/fuse/fuse_i.h | 6 ++-
>> fs/fuse/inode.c | 13 +++++-
>> 4 files changed, 138 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
>> index f022968..eda248b 100644
>> --- a/fs/fuse/dir.c
>> +++ b/fs/fuse/dir.c
>> @@ -857,8 +857,11 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
>> struct fuse_conn *fc = get_fuse_conn(inode);
>>
>> /* see the comment in fuse_change_attributes() */
>> - if (fc->writeback_cache && S_ISREG(inode->i_mode))
>> + if (fc->writeback_cache && S_ISREG(inode->i_mode)) {
>> attr->size = i_size_read(inode);
>> + attr->mtime = inode->i_mtime.tv_sec;
>> + attr->mtimensec = inode->i_mtime.tv_nsec;
>> + }
>>
>> stat->dev = inode->i_sb->s_dev;
>> stat->ino = attr->ino;
>> @@ -1582,6 +1585,82 @@ void fuse_release_nowrite(struct inode *inode)
>> spin_unlock(&fc->lock);
>> }
>>
>> +static void fuse_setattr_fill(struct fuse_conn *fc, struct fuse_req *req,
>> + struct inode *inode,
>> + struct fuse_setattr_in *inarg_p,
>> + struct fuse_attr_out *outarg_p)
>> +{
>> + req->in.h.opcode = FUSE_SETATTR;
>> + req->in.h.nodeid = get_node_id(inode);
>> + req->in.numargs = 1;
>> + req->in.args[0].size = sizeof(*inarg_p);
>> + req->in.args[0].value = inarg_p;
>> + req->out.numargs = 1;
>> + if (fc->minor < 9)
>> + req->out.args[0].size = FUSE_COMPAT_ATTR_OUT_SIZE;
>> + else
>> + req->out.args[0].size = sizeof(*outarg_p);
>> + req->out.args[0].value = outarg_p;
>> +}
>> +
>> +/*
>> + * Flush inode->i_mtime to the server
>> + */
>> +int fuse_flush_mtime(struct file *file, bool nofail)
>> +{
>> + struct inode *inode = file->f_mapping->host;
>> + struct fuse_inode *fi = get_fuse_inode(inode);
>> + struct fuse_conn *fc = get_fuse_conn(inode);
>> + struct fuse_req *req = NULL;
>> + struct fuse_setattr_in inarg;
>> + struct fuse_attr_out outarg;
>> + int err;
>> +
>> + if (nofail) {
>> + req = fuse_get_req_nofail_nopages(fc, file);
>> + } else {
>> + req = fuse_get_req_nopages(fc);
>> + if (IS_ERR(req))
>> + return PTR_ERR(req);
>> + }
>> +
>> + memset(&inarg, 0, sizeof(inarg));
>> + memset(&outarg, 0, sizeof(outarg));
>> +
>> + inarg.valid |= FATTR_MTIME;
>> + inarg.mtime = inode->i_mtime.tv_sec;
>> + inarg.mtimensec = inode->i_mtime.tv_nsec;
>> +
>> + fuse_setattr_fill(fc, req, inode, &inarg, &outarg);
>> + fuse_request_send(fc, req);
>> + err = req->out.h.error;
>> + fuse_put_request(fc, req);
>> +
>> + if (!err)
>> + clear_bit(FUSE_I_MTIME_DIRTY, &fi->state);
> Doing the test and the clear separately opens a huge race window when i_mtime
> modifications are bound to get lost.
No. Because the whole operation is protected by i_mutex (see
fuse_fsync_common()).
>
>> +
>> + return err;
>> +}
>> +
>> +/*
>> + * S_NOCMTIME is clear, so we need to update inode->i_mtime manually.
>> + */
>> +static void fuse_set_mtime_local(struct iattr *iattr, struct inode *inode)
>> +{
>> + unsigned ivalid = iattr->ia_valid;
>> + struct fuse_inode *fi = get_fuse_inode(inode);
>> +
>> + if ((ivalid & ATTR_MTIME) && (ivalid & ATTR_MTIME_SET)) {
>> + /* client fs has just set mtime to iattr->ia_mtime */
>> + inode->i_mtime = iattr->ia_mtime;
>> + clear_bit(FUSE_I_MTIME_DIRTY, &fi->state);
> This is protected by i_mutex, so it should be safe.
Yes.
>
>> + } else if ((ivalid & ATTR_MTIME) || (ivalid & ATTR_SIZE)) {
>> + /* client fs doesn't know that we're updating i_mtime */
> If so, why not tell the client fs to update mtime?
Looks reasonable. I'll resend updated patch soon.
>
>> + inode->i_mtime = current_fs_time(inode->i_sb);
>> + set_bit(FUSE_I_MTIME_DIRTY, &fi->state);
>> + }
>> +}
>> +
>> /*
>> * Set attributes, and at the same time refresh them.
>> *
>> @@ -1641,17 +1720,7 @@ int fuse_do_setattr(struct inode *inode, struct iattr *attr,
>> inarg.valid |= FATTR_LOCKOWNER;
>> inarg.lock_owner = fuse_lock_owner_id(fc, current->files);
>> }
>> - req->in.h.opcode = FUSE_SETATTR;
>> - req->in.h.nodeid = get_node_id(inode);
>> - req->in.numargs = 1;
>> - req->in.args[0].size = sizeof(inarg);
>> - req->in.args[0].value = &inarg;
>> - req->out.numargs = 1;
>> - if (fc->minor < 9)
>> - req->out.args[0].size = FUSE_COMPAT_ATTR_OUT_SIZE;
>> - else
>> - req->out.args[0].size = sizeof(outarg);
>> - req->out.args[0].value = &outarg;
>> + fuse_setattr_fill(fc, req, inode, &inarg, &outarg);
>> fuse_request_send(fc, req);
>> err = req->out.h.error;
>> fuse_put_request(fc, req);
>> @@ -1668,6 +1737,10 @@ int fuse_do_setattr(struct inode *inode, struct iattr *attr,
>> }
>>
>> spin_lock(&fc->lock);
>> + /* the kernel maintains i_mtime locally */
>> + if (fc->writeback_cache && S_ISREG(inode->i_mode))
>> + fuse_set_mtime_local(attr, inode);
>> +
>> fuse_change_attributes_common(inode, &outarg.attr,
>> attr_timeout(&outarg));
>> oldsize = inode->i_size;
>> @@ -1898,6 +1971,17 @@ static int fuse_removexattr(struct dentry *entry, const char *name)
>> return err;
>> }
>>
>> +static int fuse_update_time(struct inode *inode, struct timespec *now,
>> + int flags)
>> +{
>> + if (flags & S_MTIME) {
>> + inode->i_mtime = *now;
>> + set_bit(FUSE_I_MTIME_DIRTY, &get_fuse_inode(inode)->state);
>> + BUG_ON(!S_ISREG(inode->i_mode));
>> + }
>> + return 0;
>> +}
>> +
>> static const struct inode_operations fuse_dir_inode_operations = {
>> .lookup = fuse_lookup,
>> .mkdir = fuse_mkdir,
>> @@ -1937,6 +2021,7 @@ static const struct inode_operations fuse_common_inode_operations = {
>> .getxattr = fuse_getxattr,
>> .listxattr = fuse_listxattr,
>> .removexattr = fuse_removexattr,
>> + .update_time = fuse_update_time,
>> };
>>
>> static const struct inode_operations fuse_symlink_inode_operations = {
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index 333a764..eabe202 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -291,6 +291,9 @@ static int fuse_open(struct inode *inode, struct file *file)
>>
>> static int fuse_release(struct inode *inode, struct file *file)
>> {
>> + if (test_bit(FUSE_I_MTIME_DIRTY, &get_fuse_inode(inode)->state))
> So, why not make this test_and_clear_bit()?
To be uniform with fuse_fsync_common(). See please the next comment.
>
>
>> + fuse_flush_mtime(file, true);
>> +
>> fuse_release_common(file, FUSE_RELEASE);
>>
>> /* return value is ignored by VFS */
>> @@ -458,6 +461,12 @@ int fuse_fsync_common(struct file *file, loff_t start, loff_t end,
>>
>> fuse_sync_writes(inode);
>>
>> + if (test_bit(FUSE_I_MTIME_DIRTY, &get_fuse_inode(inode)->state)) {
> And this too.
If mtime flush fails, the bit is kept. Another way would be to use
test_and_clear_bit() as you suggested, but set the bit back in case of
failure. Technically they are equivalent (due to i_mutex protection),
but current implementation looks more natural.
Thanks,
Maxim
>
>> + int err = fuse_flush_mtime(file, false);
>> + if (err)
>> + goto out;
>> + }
>> +
>> req = fuse_get_req_nopages(fc);
>> if (IS_ERR(req)) {
>> err = PTR_ERR(req);
>> @@ -948,16 +957,21 @@ static size_t fuse_send_write(struct fuse_req *req, struct fuse_io_priv *io,
>> return req->misc.write.out.size;
>> }
>>
>> -void fuse_write_update_size(struct inode *inode, loff_t pos)
>> +bool fuse_write_update_size(struct inode *inode, loff_t pos)
>> {
>> struct fuse_conn *fc = get_fuse_conn(inode);
>> struct fuse_inode *fi = get_fuse_inode(inode);
>> + bool ret = false;
>>
>> spin_lock(&fc->lock);
>> fi->attr_version = ++fc->attr_version;
>> - if (pos > inode->i_size)
>> + if (pos > inode->i_size) {
>> i_size_write(inode, pos);
>> + ret = true;
>> + }
>> spin_unlock(&fc->lock);
>> +
>> + return ret;
>> }
>>
>> static size_t fuse_send_write_pages(struct fuse_req *req, struct file *file,
>> @@ -2862,8 +2876,16 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
>> goto out;
>>
>> /* we could have extended the file */
>> - if (!(mode & FALLOC_FL_KEEP_SIZE))
>> - fuse_write_update_size(inode, offset + length);
>> + if (!(mode & FALLOC_FL_KEEP_SIZE)) {
>> + bool changed = fuse_write_update_size(inode, offset + length);
>> +
>> + if (changed && fc->writeback_cache) {
>> + struct fuse_inode *fi = get_fuse_inode(inode);
>> +
>> + inode->i_mtime = current_fs_time(inode->i_sb);
>> + set_bit(FUSE_I_MTIME_DIRTY, &fi->state);
>> + }
>> + }
>>
>> if (mode & FALLOC_FL_PUNCH_HOLE)
>> truncate_pagecache_range(inode, offset, offset + length - 1);
>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
>> index a490ba3..3028b8a 100644
>> --- a/fs/fuse/fuse_i.h
>> +++ b/fs/fuse/fuse_i.h
>> @@ -117,6 +117,8 @@ enum {
>> FUSE_I_ADVISE_RDPLUS,
>> /** An operation changing file size is in progress */
>> FUSE_I_SIZE_UNSTABLE,
>> + /** i_mtime has been updated locally; a flush to userspace needed */
>> + FUSE_I_MTIME_DIRTY,
>> };
>>
>> struct fuse_conn;
>> @@ -870,7 +872,9 @@ long fuse_ioctl_common(struct file *file, unsigned int cmd,
>> unsigned fuse_file_poll(struct file *file, poll_table *wait);
>> int fuse_dev_release(struct inode *inode, struct file *file);
>>
>> -void fuse_write_update_size(struct inode *inode, loff_t pos);
>> +bool fuse_write_update_size(struct inode *inode, loff_t pos);
>> +
>> +int fuse_flush_mtime(struct file *file, bool nofail);
>>
>> int fuse_do_setattr(struct inode *inode, struct iattr *attr,
>> struct file *file);
>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>> index 63818ab..253701f 100644
>> --- a/fs/fuse/inode.c
>> +++ b/fs/fuse/inode.c
>> @@ -170,8 +170,11 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
>> inode->i_blocks = attr->blocks;
>> inode->i_atime.tv_sec = attr->atime;
>> inode->i_atime.tv_nsec = attr->atimensec;
>> - inode->i_mtime.tv_sec = attr->mtime;
>> - inode->i_mtime.tv_nsec = attr->mtimensec;
>> + /* mtime from server may be stale due to local buffered write */
>> + if (!fc->writeback_cache || !S_ISREG(inode->i_mode)) {
>> + inode->i_mtime.tv_sec = attr->mtime;
>> + inode->i_mtime.tv_nsec = attr->mtimensec;
>> + }
>> inode->i_ctime.tv_sec = attr->ctime;
>> inode->i_ctime.tv_nsec = attr->ctimensec;
>>
>> @@ -250,6 +253,8 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
>> {
>> inode->i_mode = attr->mode & S_IFMT;
>> inode->i_size = attr->size;
>> + inode->i_mtime.tv_sec = attr->mtime;
>> + inode->i_mtime.tv_nsec = attr->mtimensec;
>> if (S_ISREG(inode->i_mode)) {
>> fuse_init_common(inode);
>> fuse_init_file_inode(inode);
>> @@ -296,7 +301,9 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
>> return NULL;
>>
>> if ((inode->i_state & I_NEW)) {
>> - inode->i_flags |= S_NOATIME|S_NOCMTIME;
>> + inode->i_flags |= S_NOATIME;
>> + if (!fc->writeback_cache)
>> + inode->i_flags |= S_NOCMTIME;
>> inode->i_generation = generation;
>> inode->i_data.backing_dev_info = &fc->bdi;
>> fuse_init_inode(inode, attr);
>>
>
next prev parent reply other threads:[~2013-12-26 15:41 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-10 13:09 [PATCH v6 00/11] fuse: An attempt to implement a write-back cache policy Maxim Patlasov
2013-10-10 13:10 ` [PATCH 01/11] fuse: Linking file to inode helper Maxim Patlasov
2013-10-10 13:10 ` [PATCH 02/11] fuse: Prepare to handle short reads Maxim Patlasov
2013-10-10 13:10 ` [PATCH 03/11] fuse: Connection bit for enabling writeback Maxim Patlasov
2013-10-10 13:10 ` [PATCH 04/11] fuse: Trust kernel i_size only - v4 Maxim Patlasov
[not found] ` <20131010130718.10089.6736.stgit-opBMJL+S1+lZ2WjqXT1pg9yJl3JzOD/t@public.gmane.org>
2013-10-10 13:10 ` [PATCH 05/11] fuse: Trust kernel i_mtime only -v2 Maxim Patlasov
2013-10-10 13:10 ` Maxim Patlasov
2013-11-12 16:52 ` Miklos Szeredi
2013-12-26 15:41 ` Maxim Patlasov [this message]
2014-01-06 16:22 ` Miklos Szeredi
2014-01-20 11:33 ` Maxim Patlasov
2013-12-26 15:51 ` [PATCH 05/11] fuse: Trust kernel i_mtime only -v3 Maxim Patlasov
2013-10-10 13:11 ` [PATCH 06/11] fuse: Flush files on wb close -v2 Maxim Patlasov
[not found] ` <20131010131102.10089.51081.stgit-opBMJL+S1+lZ2WjqXT1pg9yJl3JzOD/t@public.gmane.org>
2013-10-10 13:19 ` [PATCH] " Maxim Patlasov
2013-10-10 13:19 ` Maxim Patlasov
2013-10-10 13:11 ` [PATCH 07/11] fuse: restructure fuse_readpage() Maxim Patlasov
2013-11-12 17:17 ` Miklos Szeredi
2013-12-20 14:54 ` Maxim Patlasov
2014-01-06 16:43 ` Miklos Szeredi
2014-01-20 11:46 ` Maxim Patlasov
2013-10-10 13:11 ` [PATCH 08/11] fuse: Implement write_begin/write_end callbacks -v2 Maxim Patlasov
2013-10-10 13:11 ` [PATCH 09/11] fuse: fuse_flush() should wait on writeback Maxim Patlasov
2013-10-10 13:12 ` [PATCH 10/11] fuse: Fix O_DIRECT operations vs cached writeback misorder - v2 Maxim Patlasov
2013-10-10 13:12 ` [PATCH 11/11] fuse: Turn writeback cache on Maxim Patlasov
[not found] ` <87li13pido.fsf@vostro.rath.org>
[not found] ` <CAJfpegvFFXPuEoXXXJCBLEQi+T3mx95g4X+MjxAbyg0rhjbfDA@mail.gmail.com>
[not found] ` <528321C3.20307@parallels.com>
[not found] ` <CAJfpegu_ScgdgHDVM_5GMQC86OFnpXLDsuArKbWP_cX-D8m8Jw@mail.gmail.com>
[not found] ` <52FA4ECD.2030608@parallels.com>
[not found] ` <52FC2DE5.9010008@rath.org>
[not found] ` <52FCB029.9040608@parallels.com>
[not found] ` <87wqgwdvi1.fsf@vostro.rath.org>
2014-02-27 15:33 ` fuse write performance Miklos Szeredi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52BC4E35.8040503@parallels.com \
--to=mpatlasov@parallels.com \
--cc=akpm@linux-foundation.org \
--cc=bfoster@redhat.com \
--cc=dev@parallels.com \
--cc=devel@openvz.org \
--cc=fengguang.wu@intel.com \
--cc=fuse-devel@lists.sourceforge.net \
--cc=jbottomley@parallels.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=xemul@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.