From: Matthew Wilcox <matthew@wil.cx>
To: Rob Ross <rross@mcs.anl.gov>
Cc: Christoph Hellwig <hch@infradead.org>,
Trond Myklebust <trond.myklebust@fys.uio.no>,
Andreas Dilger <adilger@clusterfs.com>,
Sage Weil <sage@newdream.net>, Brad Boyer <flar@allandria.com>,
Anton Altaparmakov <aia21@cam.ac.uk>,
Gary Grider <ggrider@lanl.gov>,
linux-fsdevel@vger.kernel.org
Subject: Re: NFSv4/pNFS possible POSIX I/O API standards
Date: Wed, 6 Dec 2006 08:44:26 -0700
Message-ID: <20061206154426.GU3013@parisc-linux.org>
In-Reply-To: <4576DBE0.9090305@mcs.anl.gov>
On Wed, Dec 06, 2006 at 09:04:00AM -0600, Rob Ross wrote:
> The openg() solution has the following advantages to what you propose.
> First, it places the burden of the communication of the file handle on
> the application process, not the file system. That means less work for
> the file system. Second, it does not require that clients respond to
> unexpected network traffic. Third, the network traffic is deterministic
> -- one client interacts with the file system and then explicitly
> performs the broadcast. Fourth, it does not require that the file system
> store additional state on clients.
You didn't address the disadvantages I pointed out on December 1st in a
mail to the posix mailing list:
: I now understand this not so much as a replacement for dup() but in
: terms of being able to open by NFS filehandle, or inode number. The
: fh_t is presumably generated by the underlying cluster filesystem, and
: is a handle that has meaning on all nodes that are members of the
: cluster.
:
: I think we need to consider security issues (that have also come up
: when open-by-inode-number was proposed). For example, how long is the
: fh_t intended to be valid for? Forever? Until the cluster is rebooted?
: Could the fh_t be used by any user, or only those with credentials to
: access the file? What happens if we revoke() the original fd?
:
: I'm a little concerned about the generation of a suitable fh_t.
: In the implementation of sutoc(), how does the kernel know which
: filesystem to ask to translate it? It's not impossible (though it is
: implausible) that an fh_t could be meaningful to more than one
: filesystem.
:
: One possibility of fixing this could be to use a magic number at the
: beginning of the fh_t to distinguish which filesystem this belongs
: to (a list of currently-used magic numbers in Linux can be found at
: http://git.parisc-linux.org/?p=linux-2.6.git;a=blob;f=include/linux/magic.h)
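To make the magic-number idea concrete, here is a rough sketch of how a dispatcher might pick the owning filesystem from the first word of a handle. Everything in it is an assumption for illustration: the struct fh_example layout, the fh_owner table, fh_dispatch() and the stand-in translators are hypothetical and are not taken from the openg()/sutoc() proposal. Only the two magic values (EXT2_SUPER_MAGIC and NFS_SUPER_MAGIC) are the real constants from linux/magic.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handle layout: NOT the proposed fh_t, just an illustration. */
#define FH_PAYLOAD_MAX 60

struct fh_example {
    uint32_t magic;                    /* filesystem magic, as in linux/magic.h */
    uint8_t  payload[FH_PAYLOAD_MAX];  /* opaque, filesystem-private bytes */
};

/* Each filesystem would register a translator for its own handles. */
typedef int (*fh_translate_fn)(const uint8_t *payload, size_t len);

struct fh_owner {
    uint32_t        magic;
    fh_translate_fn translate;
};

/* Stand-in translators; a real one would look up the file and check access. */
static int ext2_translate(const uint8_t *p, size_t len) { (void)p; (void)len; return 1; }
static int nfs_translate(const uint8_t *p, size_t len)  { (void)p; (void)len; return 2; }

static const struct fh_owner fh_owners[] = {
    { 0xEF53, ext2_translate },  /* EXT2_SUPER_MAGIC */
    { 0x6969, nfs_translate },   /* NFS_SUPER_MAGIC */
};

/* Returns the translator's result, or -1 if no filesystem claims the magic. */
static int fh_dispatch(const struct fh_example *fh)
{
    size_t i;

    for (i = 0; i < sizeof(fh_owners) / sizeof(fh_owners[0]); i++)
        if (fh_owners[i].magic == fh->magic)
            return fh_owners[i].translate(fh->payload, sizeof(fh->payload));
    return -1;
}
```

Note that this only settles which filesystem gets asked. It does nothing for the lifetime and credential questions above, which the filesystem's translator would still have to answer.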
Christoph has also touched on some of these points, and added some I
missed.
> In the O_CLUSTER_WIDE approach, a naive implementation (everyone passing
> the flag) would likely cause a storm of network traffic if clients were
> closely synchronized (which they are likely to be).
I think you're referring to a naive application, rather than a naive
cluster filesystem, right? There are several ways to fix that problem,
including throttling broadcasts of information, having nodes ask their
immediate neighbours if they have a cache of the information, and having
the server not respond (wait for a retransmit) if it's recently sent out
a broadcast.
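As a sketch of the last option only (the server staying quiet if it has recently broadcast), something along these lines would do. The names bcast_state and should_answer(), and the idea of a fixed quiet window in milliseconds, are my assumptions for illustration and do not describe any existing cluster filesystem.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-object record of when the server last broadcast. */
struct bcast_state {
    uint64_t last_bcast_ms;  /* time of the most recent broadcast */
    bool     ever_sent;      /* false until the first broadcast goes out */
};

/*
 * Decide whether to answer a request at time now_ms.  If a broadcast went
 * out less than quiet_ms ago, stay silent and let the client retransmit;
 * by then it has probably seen the broadcast.  Otherwise record a new
 * broadcast time and answer.
 */
static bool should_answer(struct bcast_state *s, uint64_t now_ms,
                          uint64_t quiet_ms)
{
    if (s->ever_sent && now_ms - s->last_bcast_ms < quiet_ms)
        return false;

    s->last_bcast_ms = now_ms;
    s->ever_sent = true;
    return true;
}
```

With a 50ms window, a burst of closely synchronized requests costs one broadcast rather than one reply per client.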
> However, the application change issue is actually moot; we will make
> whatever changes inside our MPI-IO implementation, and many users will
> get the benefits for free.
That's good.
> The readdirplus(), readx()/writex(), and openg()/openfh() were all
> designed to allow our applications to explain exactly what they wanted
> and to allow for explicit communication. I understand that there is a
> tendency toward solutions where the FS guesses what the app is going to
> do or is passed a hint (e.g. fadvise) about what is going to happen,
> because these things don't require interface changes. But these
> solutions just aren't as effective as actually spelling out what the
> application wants.
Sure, but I think you're emphasising "these interfaces let us get our
job done" over the legitimate concerns that we have. I haven't really
looked at the readdirplus() or readx()/writex() interfaces, but the
security problems with openg() make me think you haven't really looked
at it from the "what could go wrong" perspective. I'd be interested in
reviewing the readx()/writex() interfaces, but still don't see a document
for them anywhere.