From: Rob Ross <rross@mcs.anl.gov>
To: Latchesar Ionkov <lionkov@lanl.gov>
Cc: Christoph Hellwig <hch@infradead.org>,
Matthew Wilcox <matthew@wil.cx>, Gary Grider <ggrider@lanl.gov>,
linux-fsdevel@vger.kernel.org
Subject: Re: NFSv4/pNFS possible POSIX I/O API standards
Date: Tue, 05 Dec 2006 15:44:31 -0600
Message-ID: <4575E83F.8090501@mcs.anl.gov>
In-Reply-To: <f158dc670612050847y197c232bp6f967934c5769bf3@mail.gmail.com>
Latchesar Ionkov wrote:
> On 12/5/06, Rob Ross <rross@mcs.anl.gov> wrote:
>>
>> I agree that it is not feasible to add new system calls every time
>> somebody has a problem, and we don't take adding system calls lightly.
>> However, in this case we're talking about an entire *community* of
>> people (high-end computing), not just one or two people. Of course it
>> may still be the case that that community is not important enough to
>> justify the addition of system calls; that's obviously not my call to
>> make!
>
> I have the feeling that the openg stuff is being rushed without looking
> into all the solutions that don't require changes to the current
> interface. I don't see any numbers showing where exactly the time is
> spent. Is opening too slow because of the number of requests that the
> file server suddenly has to respond to? Would an operation that looks
> up multiple names instead of a single name be good enough? How much
> time is spent on opening the file once you have resolved the name?
Thanks for looking at the graph.
To clarify the workload, we do not expect that application processes
will be opening a large number of files all at once; that was just how
the test was run to get a reasonable average value. So I don't think
that something that looked up multiple file names would help for this case.
I unfortunately don't have data to show exactly where the time was
spent, but it's a good guess that it is all the network traffic in the
open() case.
>> I'm sure that you meant more than just to rename openg() to lookup(),
>> but I don't understand what you are proposing. We still need a second
>> call to take the results of the lookup (by whatever name) and convert
>> that into a file descriptor. That's all the openfh() (previously named
>> sutoc()) is for.
>
> The idea is that lookup doesn't open the file, it just does the name
> resolution. The actual opening is done by openfh (or whatever you call
> it next :). I don't think it is a good idea to introduce another way
> of addressing files on the file system at all, but if you still decide
> to do it, it makes more sense to separate the name resolution from the
> operations (at the moment only the open operation, but who knows what
> somebody will think of next ;) you want to do on the file.
I really think that we're saying the same thing here?
I think of the open() call as doing two (maybe three) things. First, it
performs name resolution and permission checking. Second, it creates the
file descriptor that allows the user process to do subsequent I/O.
Third, it creates a context for access, if the FS keeps track of "open"
files (not all do).
The openg() really just does the lookup and permission checking. The
openfh() creates the file descriptor and starts that context if the
particular FS tracks that sort of thing.
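To make the split concrete, here is a minimal user-space sketch of the two
steps. It is only a model of the interface, not an implementation: the
names openg_sketch(), openfh_sketch(), and struct fh_sketch are
hypothetical, and because plain POSIX offers no way to open a file from an
opaque handle, the sketch keeps the path inside the handle and resolves it
again in the second step. A real file system would presumably store its own
object reference there and skip that second resolution.

```c
/* Hypothetical model of the openg()/openfh() split discussed above.
 * Neither call exists in POSIX or Linux; every name and signature
 * below is an assumption made only for illustration. */
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

struct fh_sketch {
    dev_t dev;         /* file identity recorded at lookup time */
    ino_t ino;
    int   flags;       /* access mode requested at lookup time */
    char  path[4096];  /* stand-in for an FS-specific object reference */
};

/* Step 1: name resolution and permission checking only.
 * No file descriptor is created. Returns 0 on success, -1 on failure. */
int openg_sketch(const char *path, int flags, struct fh_sketch *fh)
{
    struct stat st;
    int acc = flags & O_ACCMODE;
    int mode = (acc == O_RDONLY) ? R_OK
             : (acc == O_WRONLY) ? W_OK
             : (R_OK | W_OK);

    if (strlen(path) >= sizeof(fh->path))
        return -1;
    if (stat(path, &st) != 0 || access(path, mode) != 0)
        return -1;

    fh->dev = st.st_dev;
    fh->ino = st.st_ino;
    fh->flags = acc;
    strcpy(fh->path, path);
    return 0;
}

/* Step 2: convert a handle into a file descriptor.
 * Returns the descriptor, or -1 if the handle no longer refers to
 * the file that was looked up (a stale handle). */
int openfh_sketch(const struct fh_sketch *fh)
{
    struct stat st;
    int fd = open(fh->path, fh->flags);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_dev != fh->dev || st.st_ino != fh->ino) {
        close(fd);
        return -1;
    }
    return fd;
}
```

In the intended use, one process would call the first step once and pass
the handle to the others, and each of them would then call the second step
locally. How the handle travels between processes is outside this sketch.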
>> I think the subject line might be a little misleading; we're not just
>> talking about NFS here. There are a number of different file systems
>> that might benefit from these enhancements (e.g. GPFS, Lustre, PVFS,
>> PanFS, etc.).
>
> I think that the main problem is that all these file systems resolve a
> path name one directory at a time, bringing the server to its knees
> with the huge number of requests. I would like to see what the
> performance is if you a) cache the last few hundred lookups on the
> server side, and b) modify VFS and the file systems to support
> multi-name lookups. Just assume for a moment that there is no way to
> get these new operations in (which is probably going to be true
> anyway :). What other solutions can you think of? :)
Well you've caught me. I don't want to cache the values, because I
fundamentally believe that sharing state between clients and servers is
braindead (to use Christoph's phrase) in systems of this scale
(thousands to tens of thousands of clients). So I don't want locks, so I
can't keep the cache consistent, ... So someone else will have to run
the tests you propose :)...
Also, to address Christoph's snipe while we're here; I don't care one
way or another whether the Linux community wants to help GPFS or not. I
do care that I'm arguing for something that is useful to more than just
my own pet project, and that was the point that I was trying to make.
I'll be sure not to mention GPFS again.
What's the etiquette on changing subject lines here? It might be useful
to separate the openg() etc. discussion from the readdirplus() etc.
discussion.
Thanks again for the comments,
Rob