* Re: + ecryptfs-dont-muck-with-the-existing-nameidata-structures.patch added to -mm tree [not found] ` <20060707120004.GA6101@lst.de> @ 2006-07-07 23:24 ` Michael Halcrow 2006-07-08 16:04 ` [Unionfs] " Erez Zadok 0 siblings, 1 reply; 2+ messages in thread From: Michael Halcrow @ 2006-07-07 23:24 UTC (permalink / raw) To: Christoph Hellwig Cc: akpm, mm-commits, linux-kernel, Steven M. French, David J. Kleikamp, ezk, fistgen, jsipek, unionfs On Fri, Jul 07, 2006 at 02:00:04PM +0200, Christoph Hellwig wrote: > On Mon, Jun 19, 2006 at 04:38:55PM -0700, akpm@osdl.org wrote: > > The patch titled > > > > ecryptfs: don't muck with the existing nameidata structures > > > > has been added to the -mm tree. Its filename is > > > > ecryptfs-dont-muck-with-the-existing-nameidata-structures.patch > > > > See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find > > out what to do about this > > I think the only way to fix this is to stop passing down the whole > nameidata. Pass the intent only and do something smart with it in > the interposing filesystem. Not sure what though as the important > part isn't just doing something smart with what we get, but rather > the more important part is to do pass down the right thing to the > underlying filesystem. > > Have you tested ecryptfs with nfsv4? I've tested with the older nfs, but not v4 yet. I don't think it will work, for this reason. do_filp_open() open_namei() path_lookup_open() __path_lookup_intent_open() do_path_lookup() link_path_walk() __link_path_walk() do_lookup() do_revalidate() d_revalidate() d_revalidate is a pointer to ecryptfs_d_revalidate(), which passes its own temporary nameidata to nfs_open_revalidate(). nfs_open_revalidate() nfs4_open_revalidate() nfs4_intent_set_file() lookup_instantiate_filp() In lookup_instantiate_filp(), nd->intent.open.file = __dentry_open(...) This is where it gets tricky. In my humble understanding of VFS internals, my first inclination is to say that perhaps ecryptfs_d_revalidate() could detect that intent.open.file was set, then generate a corresponding lookup intent dentry and interpose the two; it's not obvious to me why the bookkeeping has to be terribly complicated. I do fear that some nameidata stacking infrastructure may need to be introduced into nameidata structure itself (nameidata.private pointer?) for this approach to work, but I will have to poke around and think about it a bit more. In any case, right now, ecryptfs_d_revalidate() drops the temporary nameidata on the floor, leaving nd->intent.opt.file dangling. Then we find our way back to do_filp_open(), which returns nameidata_to_filp(), which returns an uninitialized nd->intent.opt.file. If my observations are coherent, then this will certainly need to be fixed for nfsv4. The original form of ecryptfs_d_revalidate(), which just replaced the dentry and vfsmount in the nd, has the obvious inconsistency problem with the dcache. I'll bring this up with the FiST folks to get their view on the issue. Thanks, Mike ^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [Unionfs] Re: + ecryptfs-dont-muck-with-the-existing-nameidata-structures.patch added to -mm tree 2006-07-07 23:24 ` + ecryptfs-dont-muck-with-the-existing-nameidata-structures.patch added to -mm tree Michael Halcrow @ 2006-07-08 16:04 ` Erez Zadok 0 siblings, 0 replies; 2+ messages in thread From: Erez Zadok @ 2006-07-08 16:04 UTC (permalink / raw) To: Michael Halcrow Cc: Christoph Hellwig, akpm, Steven M. French, unionfs, mm-commits, jsipek, David J. Kleikamp, fistgen, ezk, linux-kernel Mike, and everyone who wants to consider stacking: If you want a stackable system to work well (instead of giving up and "ripping it out" of your VFS), then you have to allow every major object that is involved in the file system's implementation, to be stacked upon. You have to think about a making things work not just for one-to-one stack, but a stack of N levels (N > 2); or even fan-out stacks (ala Unionfs). This means that, at the VERY least, every such object should have a way of connecting it to its corresponding lower one(s). More specifically, every object needs to have some void* that is at the file system's control, into which we can stuff info about lower objects, etc. In all of the stackable file systems we've developed, we found it relatively straightforward to handle struct inode, dentry, file, and superblock -- b/c they have such a void* (or equivalent). Things got more difficult when we had to deal w/ struct page, partly b/c there's no easy way to have one page point to a lower page. Struct page also has the annoying problem that it back-refs *one* inode, via the mapping->host (this is one major reason for the "double caching" problem). When we ported the stackable file systems to 2.6, we noticed this new nameidata structure being passed around to file systems. Our initial desire was to stack on it -- b/c that's the safest way avoid mixing objects b/t layers. But nameidata has no void* that one can use to stack on another. So, instead, we looked at how precisely it was used by lower file systems. We found out that, at least back when we did the 2.6 port, that it was ok to just pass the nd we got from above to lower file systems, and that some file systems didn't use it at all, so we could even just pass a null. That strategy may no longer work (b/c of nfsv4, it appears). For years I've worked around similar VFS limitations, in my strong desire not to require users to modify their kernel to use my stackable file systems. And this strategy was successful: stackable file systems had become more and more popular (cf. Halcrow/IBM and www.unionfs.org :-) But, if we we're now going to consider stacking seriously in linux, then I think it would be best to do it cleanly. This means that some changes to the VFS are going to be unfortunately unavoidable. Finally, note that just adding a void* to some structures won't automatically make the code cleanly stackable. Some VFS changes may be needed, such as what Al did in 2.3: various VFS functions were split into private VFS functions, and exported, reentrant vfs_* methods; these vfs_* methods were very useful to us. Sincerely, Erez. ^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2006-07-08 16:05 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200606192335.k5JNZWo6017881@shell0.pdx.osdl.net>
[not found] ` <20060707120004.GA6101@lst.de>
2006-07-07 23:24 ` + ecryptfs-dont-muck-with-the-existing-nameidata-structures.patch added to -mm tree Michael Halcrow
2006-07-08 16:04 ` [Unionfs] " Erez Zadok
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox