* v9fs writepage
@ 2005-04-30 17:17 Eric Van Hensbergen
2005-04-30 17:58 ` Miklos Szeredi
0 siblings, 1 reply; 13+ messages in thread
From: Eric Van Hensbergen @ 2005-04-30 17:17 UTC
To: v9fs-developer; +Cc: kernel-mentors, linux-fsdevel, Al Viro, Christoph Hellwig
One critique I got back (from Chris Hellwig) on the v9fs
implementation was the way that I went about implementing writepage.
I'll agree that my existing solution is imperfect, so I'd like to
solicit ideas from the community.
Essentially, the 9P protocol requires that I associate each transaction
with a FID (which can be thought of as a file descriptor known both to
the client and the server, associated with a particular
thread/user). In most areas of the v9fs driver it is fairly trivial
to map an access back to a particular FID (by looking at the process
context in which the system call was executed). However, in the
address_space code (to support mmap), I'm not given an easy handle (in
most cases it's the dentry or the file structure) to resolve a FID
against. In order to try to guess the right FID to use for the
writepage transaction, I wrote the following function, which
essentially scans through the dentries of an inode and tries to find
the right FID.
/**
 * v9fs_find_file - find a file pointer based on page
 * @page: page whose backing file should be found
 *
 * Returns the file of the first dentry alias of the page's inode for
 * which v9fs_fid_lookup() resolves a FID, or NULL if none is found.
 */
static struct file *v9fs_find_file(struct page *page)
{
	struct address_space *mapping = page->mapping;
	struct inode *inode = NULL;
	struct dentry *dentry = NULL;
	struct v9fs_fid *fid = NULL;
	struct list_head *p, *temp;

	dprintk(DEBUG_VFS, " page: %p\n", page);

	if (!mapping) {
		dprintk(DEBUG_ERROR, "No mapping\n");
		return NULL;
	}

	inode = mapping->host;
	if (!inode) {
		dprintk(DEBUG_ERROR, "No inode\n");
		return NULL;
	}

	/* Try every dentry alias of the inode and return the file of the
	 * first one for which a FID can be resolved.  The lookup itself
	 * matches against the calling task (current). */
	list_for_each_safe(p, temp, &inode->i_dentry) {
		dentry = list_entry(p, struct dentry, d_alias);
		fid = v9fs_fid_lookup(dentry, FID_OP);
		if (fid)
			return fid->filp;
	}

	return NULL;
}
The problem is that v9fs_fid_lookup() tries to find the right fid
based on information in current (and therefore assumes it is running
in the same context as the thread that initiated the transaction).
This isn't always the case in writepage, although I try to force this
by always calling writepage from my dirtypage method (which I believe
is always called in the context of the process that is "dirtying" the
page). This seems to work for simple cases (like running fsx), but
hch pointed out that he believes it won't work in certain scenarios.
Can anyone suggest a methodology using the address_space_operations to
be able to associate memory writes with a particular thread/user?
If you would like to see more context, the v9fs code is available in
several flavors:
tarballs: http://v9fs.sf.net
CVSweb: http://cvs.sourceforge.net/viewcvs.py/v9fs/linux-9p/
CVS: :pserver:anonymous@cvs.sourceforge.net:/cvsroot/v9fs/linux-9p
BitKeeper: bk://linux-v9fs.bkbits.net
Thanks for your help.
-eric
* Re: v9fs writepage
2005-04-30 17:17 v9fs writepage Eric Van Hensbergen
@ 2005-04-30 17:58 ` Miklos Szeredi
2005-04-30 18:10 ` Eric Van Hensbergen
0 siblings, 1 reply; 13+ messages in thread
From: Miklos Szeredi @ 2005-04-30 17:58 UTC
To: ericvh; +Cc: kernel-mentors, v9fs-developer, linux-fsdevel, hch, viro
> I'm not given an easy handle (in most cases its the dentry or the
> file structure) to resolve a FID against.
While it had a writepage(), FUSE had a similar problem. The solution
was to store a list of files suitable for writing in the inode (the
private inode data). The list was updated in the open() and release()
methods. And in writepage() the first entry on the list was taken.
No searching was required.
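To make the per-inode list concrete, here is a minimal userspace model of it. This is a sketch under stated assumptions, not FUSE or v9fs code: the names (open_file, inode_private, on_open(), on_release(), writepage_file()) are hypothetical, the list is a plain singly linked list rather than a kernel list_head, and the locking a real implementation would need is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an open file carrying the FID it was opened with. */
struct open_file {
    int fid;                  /* FID obtained at open() time */
    int writable;             /* nonzero if opened with write access */
    struct open_file *next;   /* link in the per-inode list */
};

/* Hypothetical private inode data: the writable files currently open. */
struct inode_private {
    struct open_file *write_files;
};

/* open(): remember a writable file so writeback can use it later. */
void on_open(struct inode_private *ip, struct open_file *f)
{
    assert(ip != NULL && f != NULL);
    if (!f->writable)
        return;
    f->next = ip->write_files;
    ip->write_files = f;
}

/* release(): unlink the file before it goes away. */
void on_release(struct inode_private *ip, struct open_file *f)
{
    struct open_file **pp;

    for (pp = &ip->write_files; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == f) {
            *pp = f->next;
            f->next = NULL;
            return;
        }
    }
}

/* writepage(): no dentry walk and no reliance on the calling task;
 * simply take the first entry (NULL if nothing writable is open). */
struct open_file *writepage_file(const struct inode_private *ip)
{
    return ip->write_files;
}
```

The design choice is to pay a little bookkeeping in open() and release() so that writepage() becomes a constant-time lookup that does not care which task it runs in.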
You're probably wondering what happened to writepage() in FUSE. Well,
it was removed because it turned out to be deadlock-prone in OOM
situations. That is actually not peculiar to FUSE: most network
filesystems are vulnerable when the server is running on the same
machine as the client, and in other cases as well, though to a
lesser extent.
The reason is that if the server needs to allocate memory to fulfill
the writeback request, this allocation will block (since the system is
just trying to free up memory) resulting in a deadlock.
You should keep this in mind for v9fs too, if you want to allow
non-privileged users to mount their filesystems.
Miklos
* Re: v9fs writepage
2005-04-30 17:58 ` Miklos Szeredi
@ 2005-04-30 18:10 ` Eric Van Hensbergen
2005-04-30 18:36 ` Al Viro
2005-04-30 19:53 ` Miklos Szeredi
0 siblings, 2 replies; 13+ messages in thread
From: Eric Van Hensbergen @ 2005-04-30 18:10 UTC
To: Miklos Szeredi; +Cc: kernel-mentors, v9fs-developer, linux-fsdevel, hch, viro
On 4/30/05, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > I'm not given an easy handle (in most cases its the dentry or the
> > file structure) to resolve a FID against.
>
> While it had a writepage(), FUSE had a similar problem. The solution
> was to store a list of files suitable for writing in the inode (the
> private inode data). The list was updated in the open() and release()
> methods. And in writepage() the first entry on the list was taken.
> No searching was required.
>
> You're probably wondering what happened to writepage() in FUSE. Well
> it was removed because it turned out to be deadlock prone in OOM
> situations. That is actually not a speciality of FUSE, most network
> filesystems are vulnerable in case the server is running on the same
> machine as the client. And in other cases as well, but only to a
> lesser extent.
>
> The reason is that if the server needs to allocate memory to fulfill
> the writeback request, this allocation will block (since the system is
> just trying to free up memory) resulting in a deadlock.
>
> You should keep this in mind for v9fs too, if you want to allow
> non-privileged users to mount their filesystems.
>
I remember seeing some of this thread on LKML; wasn't this one of
Linus' big complaints about FUSE? I'm fine with leaving writepage out
of v9fs, particularly for user-mounts and/or synthetic file systems.
Is there an established guideline for its use? (I suppose as long as
we know the file system is remote then we can allow it -- or is it
only user-space file systems/user-mounts that are a concern?)
It would seem a relaxed check in my v9fs_fid_locate would provide the
same service as keeping open files in the inode. We had looked at
doing something similar, but I hated the idea of having to keep the
extra book-keeping if there was another path (particularly since we
are already caching fids in dentries). Basically, I could add a flag
to v9fs_fid_locate which says that if you can't find an appropriate
context (a fid owned by the same user, pid, or pgrp as the context
performing the writepage), you choose any open fid matching the file.
I may be able to get closer by flagging fids which have been mmapped --
but that seems like overkill and doesn't necessarily provide any
additional accuracy for this corner case.
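A sketch of the relaxed lookup described above, assuming a flat array of FIDs per file and a hypothetical toy_fid_locate() rather than the real v9fs_fid_locate(). The pid, then pgrp, then uid preference order is an assumption; the message only lists the three criteria.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified FID record (not the real v9fs structure). */
struct toy_fid {
    uid_t uid;
    pid_t pid;
    pid_t pgrp;
    int   open;   /* nonzero if this FID has been opened for I/O */
};

/* Hypothetical stand-in for what a lookup would read from current. */
struct toy_ctx {
    uid_t uid;
    pid_t pid;
    pid_t pgrp;
};

/*
 * Prefer an open FID owned by the calling pid, then pgrp, then uid.
 * If 'relaxed' is set and none match (e.g. writeback running in some
 * other task's context), fall back to any open FID on the file.
 */
struct toy_fid *toy_fid_locate(struct toy_fid *fids, size_t n,
                               const struct toy_ctx *ctx, int relaxed)
{
    size_t i;

    assert(ctx != NULL && (fids != NULL || n == 0));
    for (i = 0; i < n; i++)
        if (fids[i].open && fids[i].pid == ctx->pid)
            return &fids[i];
    for (i = 0; i < n; i++)
        if (fids[i].open && fids[i].pgrp == ctx->pgrp)
            return &fids[i];
    for (i = 0; i < n; i++)
        if (fids[i].open && fids[i].uid == ctx->uid)
            return &fids[i];
    if (relaxed)
        for (i = 0; i < n; i++)
            if (fids[i].open)
                return &fids[i];
    return NULL;
}
```

With relaxed set, a writeback from an unrelated context still gets some open FID, which is the same guarantee the FUSE-style list gives, at the cost of possibly issuing the write under another user's FID.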
Thanks Miklos.
-eric
* Re: v9fs writepage
2005-04-30 18:10 ` Eric Van Hensbergen
@ 2005-04-30 18:36 ` Al Viro
2005-04-30 18:47 ` Eric Van Hensbergen
2005-04-30 19:53 ` Miklos Szeredi
1 sibling, 1 reply; 13+ messages in thread
From: Al Viro @ 2005-04-30 18:36 UTC
To: Eric Van Hensbergen
Cc: Miklos Szeredi, v9fs-developer, kernel-mentors, linux-fsdevel,
hch
On Sat, Apr 30, 2005 at 01:10:39PM -0500, Eric Van Hensbergen wrote:
> It would seem a relaxed check in my v9fs_fid_locate would provide the
> same service as keeping open files in the inode. We had looked at
> doing something similar, but I hated the idea of having to keep the
> extra book-keeping if there was another path (particularly since we
> are already cacheing fids in dentries).
a) 9P server is entirely within its rights to serve different contents
depending on the implied uid and even on phase of moon during the processing
of auth. So client-side file contents cache is not kosher.
b) 9P server is allowed (and often *does*, for RPC-style applications)
to serve absolutely different contents for IO on different open() on
the same file by the same user.
c) caches belong on the server side in that model. The server knows whether
it's allowed to cache a given file; the client does not, and I don't see any
provision in the protocol for transferring that information. FWIW, I would
expect something like a kernel-side caching fs that would be explicitly told
to cache a given tree.
Keep in mind that 9P closely matches the Plan 9 VFS API, but it's not the
same thing. The kernel makes direct calls into the filesystem; it's just that
mnt happens to map them onto 9P exchanges. The natural place for a client-side
cache would be between VFS and mnt, very likely as a separate fs driver.
FWIW, I would suggest that the folks involved in that fun take a good look
at http://cm.bell-labs.com/magic/man2html/2/auth and around it, especially
when it comes to recreating namespaces, etc.
Putting the information about caching into 9P would be interesting, but
that doesn't belong here - such stuff should be taken to Plan 9 maillist.
* Re: v9fs writepage
2005-04-30 18:36 ` Al Viro
@ 2005-04-30 18:47 ` Eric Van Hensbergen
2005-04-30 19:01 ` Al Viro
0 siblings, 1 reply; 13+ messages in thread
From: Eric Van Hensbergen @ 2005-04-30 18:47 UTC
To: Al Viro; +Cc: Miklos Szeredi, v9fs-developer, kernel-mentors, linux-fsdevel,
hch
On 4/30/05, Al Viro <viro@parcelfarce.linux.theplanet.co.uk> wrote:
> On Sat, Apr 30, 2005 at 01:10:39PM -0500, Eric Van Hensbergen wrote:
>
> > It would seem a relaxed check in my v9fs_fid_locate would provide the
> > same service as keeping open files in the inode. We had looked at
> > doing something similar, but I hated the idea of having to keep the
> > extra book-keeping if there was another path (particularly since we
> > are already cacheing fids in dentries).
>
> a) 9P server is entirely within its rights to serve different contents
> depending on the implied uid and even on phase of moon during the processing
> of auth. So client-side file contents cache is not kosher.
>
Yes. I agree. I actually explicitly took all the caching behavior
out of v9fs when I started getting involved a year ago. I only put
the address_space operations back in recently in order to support mmap
(which isn't supported at all under Plan 9). Would you rather I
remove the vfs_addr.c support in order to stay more consistent with Plan 9?
I figured I'd have a better chance of getting LKML acceptance by
providing as many "linux semantics" as I could -- specifically, we
really wanted to pass fsx.
>
> Putting the information about caching into 9P would be interesting, but
> that doesn't belong here - such stuff should be taken to Plan 9 maillist.
>
There is some precedent for this in the cfs(4) file system under Plan
9, which provides something similar to the new Linux CacheFS for
non-synthetic file systems (an unofficial rule of thumb was that
anything that had a zero modification time was most likely synthetic
and therefore unsuitable to cache). My plan has been to start by
integrating this sort of layer on top of the existing v9fs base and
then do what I can to remove any performance penalty from implementing
it as a stackable layer. I was also planning on talking to the Linux
CacheFS guys to see if their stuff would be suitable for this
application. It's on the roadmap, but I was planning to start after we
get the initial core accepted into the kernel (trying to keep things as
simple as possible). It's very likely we could just provide all mmap
support within the cache layer.
-eric
* Re: v9fs writepage
2005-04-30 18:47 ` Eric Van Hensbergen
@ 2005-04-30 19:01 ` Al Viro
2005-04-30 20:03 ` Miklos Szeredi
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Al Viro @ 2005-04-30 19:01 UTC
To: Eric Van Hensbergen
Cc: Miklos Szeredi, v9fs-developer, kernel-mentors, linux-fsdevel,
hch
On Sat, Apr 30, 2005 at 01:47:35PM -0500, Eric Van Hensbergen wrote:
> Yes. I agree. I actually explicitly took all the cacheing behavior
> out of v9fs when I started getting involved a year ago. I only put
> the address_space operations back in recently in order to support mmap
> (which isn't supported at all under Plan 9). Would you rather I
> remove the vfs_addr.c support in order to stay more Plan 9 consistent?
> I figured I'd have a better chance of getting LKML acceptance by
> providing as many "linux semantics" as I could -- specifically we
> really wanted to pass fsx.
Well, let me put it that way:
a) more complexity does not help to get new code merged.
b) 9P has very obvious RPC-style uses. We have nothing really
suitable in that area, and an implementation of a non-caching 9P client
has immediate applications. And _that_ is far more interesting than the
ability to import from a Plan 9 fileserver or u9fs running on a Unix box.
c) mmap() is and ever had been optional. If userland code breaks
due to mmap() not working for some file - it's already fs-dependent.
* Re: v9fs writepage
2005-04-30 18:10 ` Eric Van Hensbergen
2005-04-30 18:36 ` Al Viro
@ 2005-04-30 19:53 ` Miklos Szeredi
1 sibling, 0 replies; 13+ messages in thread
From: Miklos Szeredi @ 2005-04-30 19:53 UTC
To: ericvh; +Cc: v9fs-developer, kernel-mentors, linux-fsdevel, viro, hch
> I remember seeing some of this thread on LKML, wasn't this one of
> Linus' big complaints about FUSE?
Yup.
> I'm fine with leaving writepage out of v9fs, particularly for
> user-mounts and/or synthetic file systems. Is there an established
> guideline for its use? (I suppose as long as we know the file system
> is remote then we can allow it -- or is it only user-space file
> systems/user-mounts that are a concern?)
Page writeback is problematic even for normal operation of network
filesystems, because allocation of socket buffers for the reply may
not be possible. I think people are thinking hard about the problem,
but it's not something easily triggered.
But when you have the server on the same box as the client, it's
definitely a problem. And if the user controls a server, it's a very
easily exploited DoS. So these two cases are the ones to look out for.
Miklos
* Re: v9fs writepage
2005-04-30 19:01 ` Al Viro
@ 2005-04-30 20:03 ` Miklos Szeredi
2005-04-30 20:09 ` Al Viro
2005-05-01 1:10 ` Eric Van Hensbergen
2005-05-01 15:29 ` Ronald G. Minnich
2 siblings, 1 reply; 13+ messages in thread
From: Miklos Szeredi @ 2005-04-30 20:03 UTC
To: viro; +Cc: v9fs-developer, linux-fsdevel, hch, kernel-mentors
> c) mmap() is and ever had been optional. If userland code breaks
> due to mmap() not working for some file - it's already fs-dependent.
Well, read only mapping is used by everything. Can ld.so get along
without it?
There's only one documented case of an app (svn) breaking because FUSE
doesn't support writable mmap, so that is pretty rare.
Miklos
* Re: v9fs writepage
2005-04-30 20:03 ` Miklos Szeredi
@ 2005-04-30 20:09 ` Al Viro
2005-04-30 21:44 ` Miklos Szeredi
0 siblings, 1 reply; 13+ messages in thread
From: Al Viro @ 2005-04-30 20:09 UTC
To: Miklos Szeredi; +Cc: v9fs-developer, linux-fsdevel, hch, kernel-mentors
On Sat, Apr 30, 2005 at 10:03:37PM +0200, Miklos Szeredi wrote:
> > c) mmap() is and ever had been optional. If userland code breaks
> > due to mmap() not working for some file - it's already fs-dependent.
>
> Well, read only mapping is used by everything. Can ld.so get along
> without it?
Yes, it can.
> There's only one documented case of an app (svn) breaking because FUSE
> doesn't support writable mmap, so that is pretty rare.
SVN duhvelopers show their usual clue level, film at 11...
* Re: v9fs writepage
2005-04-30 20:09 ` Al Viro
@ 2005-04-30 21:44 ` Miklos Szeredi
0 siblings, 0 replies; 13+ messages in thread
From: Miklos Szeredi @ 2005-04-30 21:44 UTC
To: viro; +Cc: ericvh, v9fs-developer, kernel-mentors, linux-fsdevel, hch
> > > c) mmap() is and ever had been optional. If userland code breaks
> > > due to mmap() not working for some file - it's already fs-dependent.
> >
> > Well, read only mapping is used by everything. Can ld.so get along
> > without it?
>
> Yes, it can.
Maybe, but even before that the ELF loader seems to fall over.
So either the kernel needs to be fixed, or read-only mmap _is_ needed
to be able to execute programs.
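The disagreement is about whether a filesystem without mmap support can still run programs. The sketch below uses the standard POSIX open() and mmap() calls to request a read-only, private file mapping of the kind a program loader typically sets up for program text. On Linux, mmap() fails with ENODEV when the underlying filesystem does not support memory mapping. The helper name try_readonly_map() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the first 'len' bytes of 'path' read-only and private.
 * Returns 0 on success or the errno value on failure; ENODEV means
 * the filesystem holding the file does not support mmap.
 */
int try_readonly_map(const char *path, size_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;

    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    int err = (p == MAP_FAILED) ? errno : 0;  /* save before close() */
    close(fd);                                /* a mapping outlives its fd */
    if (p != MAP_FAILED)
        munmap(p, len);
    return err;
}
```

Running this against a file on the filesystem in question shows directly whether such a mapping is possible there, independently of what ld.so or the kernel's loader would do on failure.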
Miklos
* Re: v9fs writepage
2005-04-30 19:01 ` Al Viro
2005-04-30 20:03 ` Miklos Szeredi
@ 2005-05-01 1:10 ` Eric Van Hensbergen
2005-05-01 15:36 ` [V9fs-developer] " Ronald G. Minnich
2005-05-01 15:29 ` Ronald G. Minnich
2 siblings, 1 reply; 13+ messages in thread
From: Eric Van Hensbergen @ 2005-05-01 1:10 UTC
To: Al Viro; +Cc: kernel-mentors, v9fs-developer, linux-fsdevel, hch,
Miklos Szeredi
On 4/30/05, Al Viro <viro@parcelfarce.linux.theplanet.co.uk> wrote:
> On Sat, Apr 30, 2005 at 01:47:35PM -0500, Eric Van Hensbergen wrote:
> > Yes. I agree. I actually explicitly took all the cacheing behavior
> > out of v9fs when I started getting involved a year ago.
>
> Well, let me put it that way:
> a) more complexity does not help to get new code merged.
> b) 9P has very obvious RPC-style uses. We have nothing really
> suitable in that area and implementation on non-caching 9P client has
> immediate applications. And _that_ is far more interesting than ability
> to import from Plan 9 fileserver or u9fs running on a Unix box.
> c) mmap() is and ever had been optional. If userland code breaks
> due to mmap() not working for some file - it's already fs-dependent.
>
It is easy enough to remove, mostly in one file except for assigning
the address_space_operations in vfs_inode. I'll take it out before
RC3 and provide it as a separate optional patch for folks wishing to
use v9fs for root file systems and such (the original motivation
behind supporting mmap was that LANL was using it for root file systems
and was running into trouble exec'ing things without mmap).
Sorry for making assumptions about what would or wouldn't be accepted,
Viro. I should have talked to you about things sooner, but I wanted to
do my best to clean up the existing code base and provide a level of
functionality that would meet what I thought were Linux users'
expectations of a distributed file system.
RC3 already has a lot of cleanups in it (mostly from hch's comments
and running it through sparse). I should be sending something out
sometime tomorrow or Monday morning at the latest.
Al - I'd really like your feedback on the driver as a whole once
you've had a chance to go over the new set of patches - I'm very happy
to make whatever changes are necessary to make it more
acceptable/useful to the linux community.
-eric
* Re: [V9fs-developer] Re: v9fs writepage
2005-04-30 19:01 ` Al Viro
2005-04-30 20:03 ` Miklos Szeredi
2005-05-01 1:10 ` Eric Van Hensbergen
@ 2005-05-01 15:29 ` Ronald G. Minnich
2 siblings, 0 replies; 13+ messages in thread
From: Ronald G. Minnich @ 2005-05-01 15:29 UTC
To: Al Viro
Cc: Eric Van Hensbergen, Miklos Szeredi, v9fs-developer,
kernel-mentors, linux-fsdevel, hch
eric, seems like taking out writepage and mmap support might be the thing
to do ...
ron
* Re: [V9fs-developer] Re: v9fs writepage
2005-05-01 1:10 ` Eric Van Hensbergen
@ 2005-05-01 15:36 ` Ronald G. Minnich
0 siblings, 0 replies; 13+ messages in thread
From: Ronald G. Minnich @ 2005-05-01 15:36 UTC
To: Eric Van Hensbergen
Cc: Al Viro, Miklos Szeredi, v9fs-developer, kernel-mentors,
linux-fsdevel, hch
On Sat, 30 Apr 2005, Eric Van Hensbergen wrote:
> It is easy enough to remove, mostly in one file except for asssigning
> the address_space_operations in vfs_inode. I'll take it out before RC3
> and provide it as a separate optional-patch for folks wishing to use
> v9fs for root file systems and such (the original motivation behind
> supporting mmap was LANL was using it for root file systems and were
> running into trouble exec'ing things without mmap).
ah! not us! the only reason I pushed on mmap was that I feared problems
with the applications guys and their wacky python libraries and their
penchant for dlopen() and such, which I (mis?)-understood would require
mmap. We don't do root file systems over the network -- does not scale
well to 1700 machines.
yank mmap out if it's making things ugly or unreliable. I'd rather see
writepage go away at this point as it is getting messy.
ron
end of thread
Thread overview: 13+ messages
2005-04-30 17:17 v9fs writepage Eric Van Hensbergen
2005-04-30 17:58 ` Miklos Szeredi
2005-04-30 18:10 ` Eric Van Hensbergen
2005-04-30 18:36 ` Al Viro
2005-04-30 18:47 ` Eric Van Hensbergen
2005-04-30 19:01 ` Al Viro
2005-04-30 20:03 ` Miklos Szeredi
2005-04-30 20:09 ` Al Viro
2005-04-30 21:44 ` Miklos Szeredi
2005-05-01 1:10 ` Eric Van Hensbergen
2005-05-01 15:36 ` [V9fs-developer] " Ronald G. Minnich
2005-05-01 15:29 ` Ronald G. Minnich
2005-04-30 19:53 ` Miklos Szeredi