From: Jan Kara <jack@suse.cz>
To: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Jan Kara <jack@suse.cz>, Zach Brown <zab@zabbo.net>,
Thanos Makatos <thanos.makatos@citrix.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Ross Lagerwall <ross.lagerwall@citrix.com>,
Felipe Franciosi <felipe.franciosi@citrix.com>
Subject: Re: invalidate the buffer heads of a block device
Date: Wed, 1 Oct 2014 16:07:27 +0200 [thread overview]
Message-ID: <20141001140727.GE17405@quack.suse.cz> (raw)
In-Reply-To: <CAHQdGtQnjBxnKZL13xES1UhpA_3y8Px7vYJJBgfUMHRmi+tWWg@mail.gmail.com>
On Wed 01-10-14 07:50:21, Trond Myklebust wrote:
> On Wed, Oct 1, 2014 at 5:05 AM, Jan Kara <jack@suse.cz> wrote:
> > On Tue 30-09-14 17:13:19, Trond Myklebust wrote:
> >> On Tue, Sep 30, 2014 at 4:53 PM, Zach Brown <zab@zabbo.net> wrote:
> >> > On Tue, Sep 30, 2014 at 12:48:45PM +0200, Jan Kara wrote:
> >> >> On Tue 30-09-14 10:11:32, Thanos Makatos wrote:
> >> >> >
> >> >> > Regarding extending the ioctl to invalidate the page cache, do you have
> >> >> > any suggestions where I could start looking?
> >> >> You just need to call invalidate_inode_pages2(). That is going to do all
> >> >> you need.
> >> >>
> >> >> > Would such a new ioctl have any chance to be accepted upstream?
> >> >> I believe a possibility for a file to be fully flushed from page cache is
> >> >> useful at times and if you present well your usecase there are reasonable
> >> >> chances it will get accepted upstream.
> >> >
> >> > Agreed, this seems reasonable. How many times have we all dropped our
> >> > entire cache just 'cause we didn't have a more precise tool?
> >> >
> >> > $ grep -ri drop_caches xfstests/
> >> > xfstests/src/fsync-tester.c: if ((fd = open("/proc/sys/vm/drop_caches", O_WRONLY)) < 0) {
> >> > xfstests/src/stale_handle.c: system("echo 3 > /proc/sys/vm/drop_caches");
> >> > xfstests/common/quota: echo 3 > /proc/sys/vm/drop_caches
> >> >
> >> > The last one even says:
> >> >
> >> > # XXX: really need an ioctl instead of this big hammer
> >> > echo 3 > /proc/sys/vm/drop_caches
> >> >
> >> > :)
> >> >
> >>
> >> It would definitely be useful for NFS, however we'd want the option of
> >> clearing the cached metadata too (acls, mode bits, owner/group owner,
> >> etc.)
> > Hum, I can imagine you might be somehow successful in flushing ACLs but
> > how would you like to flush mode or owner? It's not like you can just
> > reload inode from disk and overwrite what you have in memory. What would
> > look sane is to push inode with all metadata it caches out of memory but
> > once somehow holds the inode (and ioctl() itself will have a file handle of
> > the inode), you just cannot. So I'm not sure if you meant something
> > different or if this was just a wish without deeper thought.
>
> You can and you _must_ reload the inode if it changes on the server.
> That's what makes distributed filesystems "interesting" as far as
> caching goes. What we do in the cases where we think the file metadata
> may have changed, is to mark the existing cached metadata as being
> stale so that we know that we need to revalidate before trying to use
> it.
Ah OK, I somehow thought about revalidation automatically happening for
any filesystem and found that impossible. I can imagine doing this
specifically for NFS which is designed to handle such things is possible :)
> So being able to tell in the ioctl() whether or not the application
> thinks the data or metadata (or both) may have changed on the remote
> disk is actually a very useful feature if you are trying to do things
> like distributed compiles.
>
> I've had an experimental NFS-only ioctl() based caching interface
> kicking around in my git repository for a couple of years now
> (http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/ioctl).
> If we are planning on doing something at the VFS level, then it would
> be nice to at least duplicate the functionality from the first 2
> patches.
Thanks for the pointer. The first two patches look useful and I think we
can design the ioctl interface to accommodate the NFS usecase (although most
filesystems will just return EOPNOTSUPP in case someone asks for metadata
invalidation).
The interface in your patches looks mostly OK, I'd just rather use 'flags'
than 'cmd'. So something like:
struct invalidate_arg {
u64 flags;
u64 start;
u64 len;
};
Where 'flags' can be:
INVAL_METADATA - invalidate inode itself, acls, etc. if fs supports it
(this can be more finegrained if that's useful. Is it?)
INVAL_DATA - invalidate data in range <start, start+len)
We can have a fs callback that gets called for this ioctl to do the work,
most filesystems would just leave it at NULL which would fall back to the
default function which just handles INVAL_DATA and returns EOPNOTSUPP if
INVAL_METADATA is set. NFS could then implement its own thing for
INVAL_METADATA and call the generic function for handling INVAL_DATA.
Thoughts?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2014-10-01 14:07 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-29 16:24 invalidate the buffer heads of a block device Thanos Makatos
2014-09-29 18:11 ` Jan Kara
2014-09-30 8:58 ` Thanos Makatos
2014-09-30 9:19 ` Jan Kara
2014-09-30 9:39 ` Thanos Makatos
2014-09-30 9:55 ` Jan Kara
2014-09-30 10:11 ` Thanos Makatos
2014-09-30 10:48 ` Jan Kara
2014-09-30 20:53 ` Zach Brown
2014-09-30 21:13 ` Trond Myklebust
2014-10-01 9:05 ` Jan Kara
2014-10-01 11:50 ` Trond Myklebust
2014-10-01 14:07 ` Jan Kara [this message]
2014-10-01 14:47 ` Trond Myklebust
2014-10-01 15:15 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141001140727.GE17405@quack.suse.cz \
--to=jack@suse.cz \
--cc=felipe.franciosi@citrix.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=ross.lagerwall@citrix.com \
--cc=thanos.makatos@citrix.com \
--cc=trond.myklebust@primarydata.com \
--cc=zab@zabbo.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).