From: Jan Kara <jack@suse.cz>
To: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Jan Kara <jack@suse.cz>, Zach Brown <zab@zabbo.net>,
Thanos Makatos <thanos.makatos@citrix.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Ross Lagerwall <ross.lagerwall@citrix.com>,
Felipe Franciosi <felipe.franciosi@citrix.com>
Subject: Re: invalidate the buffer heads of a block device
Date: Wed, 1 Oct 2014 17:15:59 +0200 [thread overview]
Message-ID: <20141001151559.GG17405@quack.suse.cz> (raw)
In-Reply-To: <CAHQdGtRe_opcKe-oVxzCPgWU+_DMRYsW4VLvWEpD7QJLT4vq9w@mail.gmail.com>
On Wed 01-10-14 10:47:05, Trond Myklebust wrote:
> On Wed, Oct 1, 2014 at 10:07 AM, Jan Kara <jack@suse.cz> wrote:
> > On Wed 01-10-14 07:50:21, Trond Myklebust wrote:
> >> On Wed, Oct 1, 2014 at 5:05 AM, Jan Kara <jack@suse.cz> wrote:
> >> > On Tue 30-09-14 17:13:19, Trond Myklebust wrote:
> >> >> On Tue, Sep 30, 2014 at 4:53 PM, Zach Brown <zab@zabbo.net> wrote:
> >> >> > On Tue, Sep 30, 2014 at 12:48:45PM +0200, Jan Kara wrote:
> >> >> >> On Tue 30-09-14 10:11:32, Thanos Makatos wrote:
> >> >> >> >
> >> >> >> > Regarding extending the ioctl to invalidate the page cache, do you have
> >> >> >> > any suggestions where I could start looking?
> >> >> >> You just need to call invalidate_inode_pages2(). That is going to do all
> >> >> >> you need.
> >> >> >>
> >> >> >> > Would such a new ioctl have any chance to be accepted upstream?
> >> >> >> I believe a possibility for a file to be fully flushed from page cache is
> >> >> >> useful at times and if you present well your usecase there are reasonable
> >> >> >> chances it will get accepted upstream.
> >> >> >
> >> >> > Agreed, this seems reasonable. How many times have we all dropped our
> >> >> > entire cache just 'cause we didn't have a more precise tool?
> >> >> >
> >> >> > $ grep -ri drop_caches xfstests/
> >> >> > xfstests/src/fsync-tester.c: if ((fd = open("/proc/sys/vm/drop_caches", O_WRONLY)) < 0) {
> >> >> > xfstests/src/stale_handle.c: system("echo 3 > /proc/sys/vm/drop_caches");
> >> >> > xfstests/common/quota: echo 3 > /proc/sys/vm/drop_caches
> >> >> >
> >> >> > The last one even says:
> >> >> >
> >> >> > # XXX: really need an ioctl instead of this big hammer
> >> >> > echo 3 > /proc/sys/vm/drop_caches
> >> >> >
> >> >> > :)
> >> >> >
> >> >>
> >> >> It would definitely be useful for NFS, however we'd want the option of
> >> >> clearing the cached metadata too (acls, mode bits, owner/group owner,
> >> >> etc.)
> >> > Hum, I can imagine you might be somehow successful in flushing ACLs but
> >> > how would you like to flush mode or owner? It's not like you can just
> >> > reload inode from disk and overwrite what you have in memory. What would
> >> > look sane is to push inode with all metadata it caches out of memory but
> >> > once somehow holds the inode (and ioctl() itself will have a file handle of
> >> > the inode), you just cannot. So I'm not sure if you meant something
> >> > different or if this was just a wish without deeper thought.
> >>
> >> You can and you _must_ reload the inode if it changes on the server.
> >> That's what makes distributed filesystems "interesting" as far as
> >> caching goes. What we do in the cases where we think the file metadata
> >> may have changed, is to mark the existing cached metadata as being
> >> stale so that we know that we need to revalidate before trying to use
> >> it.
> > Ah OK, I somehow thought about revalidation automatically happening for
> > any filesystem and found that impossible. I can imagine doing this
> > specifically for NFS which is designed to handle such things is possible :)
> >
> >> So being able to tell in the ioctl() whether or not the application
> >> thinks the data or metadata (or both) may have changed on the remote
> >> disk is actually a very useful feature if you are trying to do things
> >> like distributed compiles.
> >>
> >> I've had an experimental NFS-only ioctl() based caching interface
> >> kicking around in my git repository for a couple of years now
> >> (http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/ioctl).
> >> If we are planning on doing something at the VFS level, then it would
> >> be nice to at least duplicate the functionality from the first 2
> >> patches.
> > Thanks for the pointer. The first two patches look useful and I think we
> > can design the ioctl interface to accommodate the NFS usecase (although most
> > filesystems will just return EOPNOTSUPP in case someone asks for metadata
> > invalidation).
> >
> > The interface in your patches looks mostly OK, I'd just rather use 'flags'
> > than 'cmd'. So something like:
> > struct invalidate_arg {
> > u64 flags;
> > u64 start;
> > u64 len;
> > };
> >
> > Where 'flags' can be:
> > INVAL_METADATA - invalidate inode itself, acls, etc. if fs supports it
> > (this can be more finegrained if that's useful. Is it?)
> > INVAL_DATA - invalidate data in range <start, start+len)
>
> If these are flags rather than a command, then is the intention that
> they can be ORed together so that you can force a metadata+data
> revalidate in a single function call?
Yes, that is my intention.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
prev parent reply other threads:[~2014-10-01 15:16 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-29 16:24 invalidate the buffer heads of a block device Thanos Makatos
2014-09-29 18:11 ` Jan Kara
2014-09-30 8:58 ` Thanos Makatos
2014-09-30 9:19 ` Jan Kara
2014-09-30 9:39 ` Thanos Makatos
2014-09-30 9:55 ` Jan Kara
2014-09-30 10:11 ` Thanos Makatos
2014-09-30 10:48 ` Jan Kara
2014-09-30 20:53 ` Zach Brown
2014-09-30 21:13 ` Trond Myklebust
2014-10-01 9:05 ` Jan Kara
2014-10-01 11:50 ` Trond Myklebust
2014-10-01 14:07 ` Jan Kara
2014-10-01 14:47 ` Trond Myklebust
2014-10-01 15:15 ` Jan Kara [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141001151559.GG17405@quack.suse.cz \
--to=jack@suse.cz \
--cc=felipe.franciosi@citrix.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=ross.lagerwall@citrix.com \
--cc=thanos.makatos@citrix.com \
--cc=trond.myklebust@primarydata.com \
--cc=zab@zabbo.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).