* Re: + ecryptfs-dont-muck-with-the-existing-nameidata-structures.patch added to -mm tree
[not found] ` <20060707120004.GA6101@lst.de>
@ 2006-07-07 23:24 ` Michael Halcrow
2006-07-08 16:04 ` [Unionfs] " Erez Zadok
0 siblings, 1 reply; 2+ messages in thread
From: Michael Halcrow @ 2006-07-07 23:24 UTC (permalink / raw)
To: Christoph Hellwig
Cc: akpm, mm-commits, linux-kernel, Steven M. French,
David J. Kleikamp, ezk, fistgen, jsipek, unionfs
On Fri, Jul 07, 2006 at 02:00:04PM +0200, Christoph Hellwig wrote:
> On Mon, Jun 19, 2006 at 04:38:55PM -0700, akpm@osdl.org wrote:
> > The patch titled
> >
> > ecryptfs: don't muck with the existing nameidata structures
> >
> > has been added to the -mm tree. Its filename is
> >
> > ecryptfs-dont-muck-with-the-existing-nameidata-structures.patch
> >
> > See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
> > out what to do about this
>
> I think the only way to fix this is to stop passing down the whole
> nameidata. Pass the intent only and do something smart with it in
> the interposing filesystem. Not sure what though as the important
> part isn't just doing something smart with what we get, but rather
> the more important part is to do pass down the right thing to the
> underlying filesystem.
>
> Have you tested ecryptfs with nfsv4?
I've tested with the older nfs, but not v4 yet. I don't think it will
work, for this reason.
do_filp_open()
open_namei()
path_lookup_open()
__path_lookup_intent_open()
do_path_lookup()
link_path_walk()
__link_path_walk()
do_lookup()
do_revalidate()
d_revalidate()
d_revalidate is a pointer to ecryptfs_d_revalidate(), which passes its
own temporary nameidata to nfs_open_revalidate().
nfs_open_revalidate()
nfs4_open_revalidate()
nfs4_intent_set_file()
lookup_instantiate_filp()
In lookup_instantiate_filp(), nd->intent.open.file = __dentry_open(...)
This is where it gets tricky. In my humble understanding of VFS
internals, my first inclination is to say that perhaps
ecryptfs_d_revalidate() could detect that intent.open.file was set,
then generate a corresponding lookup intent dentry and interpose the
two; it's not obvious to me why the bookkeeping has to be terribly
complicated. I do fear that some nameidata stacking infrastructure may
need to be introduced into nameidata structure itself
(nameidata.private pointer?) for this approach to work, but I will
have to poke around and think about it a bit more.
In any case, right now, ecryptfs_d_revalidate() drops the temporary
nameidata on the floor, leaving nd->intent.opt.file dangling. Then we
find our way back to do_filp_open(), which returns
nameidata_to_filp(), which returns an uninitialized
nd->intent.opt.file. If my observations are coherent, then this will
certainly need to be fixed for nfsv4.
The original form of ecryptfs_d_revalidate(), which just replaced the
dentry and vfsmount in the nd, has the obvious inconsistency problem
with the dcache.
I'll bring this up with the FiST folks to get their view on the issue.
Thanks,
Mike
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [Unionfs] Re: + ecryptfs-dont-muck-with-the-existing-nameidata-structures.patch added to -mm tree
2006-07-07 23:24 ` + ecryptfs-dont-muck-with-the-existing-nameidata-structures.patch added to -mm tree Michael Halcrow
@ 2006-07-08 16:04 ` Erez Zadok
0 siblings, 0 replies; 2+ messages in thread
From: Erez Zadok @ 2006-07-08 16:04 UTC (permalink / raw)
To: Michael Halcrow
Cc: Christoph Hellwig, akpm, Steven M. French, unionfs, mm-commits,
jsipek, David J. Kleikamp, fistgen, ezk, linux-kernel
Mike, and everyone who wants to consider stacking:
If you want a stackable system to work well (instead of giving up and
"ripping it out" of your VFS), then you have to allow every major object
that is involved in the file system's implementation, to be stacked upon.
You have to think about a making things work not just for one-to-one stack,
but a stack of N levels (N > 2); or even fan-out stacks (ala Unionfs).
This means that, at the VERY least, every such object should have a way of
connecting it to its corresponding lower one(s). More specifically, every
object needs to have some void* that is at the file system's control, into
which we can stuff info about lower objects, etc.
In all of the stackable file systems we've developed, we found it relatively
straightforward to handle struct inode, dentry, file, and superblock -- b/c
they have such a void* (or equivalent).
Things got more difficult when we had to deal w/ struct page, partly b/c
there's no easy way to have one page point to a lower page. Struct page
also has the annoying problem that it back-refs *one* inode, via the
mapping->host (this is one major reason for the "double caching" problem).
When we ported the stackable file systems to 2.6, we noticed this new
nameidata structure being passed around to file systems. Our initial desire
was to stack on it -- b/c that's the safest way avoid mixing objects b/t
layers. But nameidata has no void* that one can use to stack on another.
So, instead, we looked at how precisely it was used by lower file systems.
We found out that, at least back when we did the 2.6 port, that it was ok to
just pass the nd we got from above to lower file systems, and that some file
systems didn't use it at all, so we could even just pass a null. That
strategy may no longer work (b/c of nfsv4, it appears).
For years I've worked around similar VFS limitations, in my strong desire
not to require users to modify their kernel to use my stackable file
systems. And this strategy was successful: stackable file systems had
become more and more popular (cf. Halcrow/IBM and www.unionfs.org :-)
But, if we we're now going to consider stacking seriously in linux, then I
think it would be best to do it cleanly. This means that some changes to
the VFS are going to be unfortunately unavoidable.
Finally, note that just adding a void* to some structures won't
automatically make the code cleanly stackable. Some VFS changes may be
needed, such as what Al did in 2.3: various VFS functions were split into
private VFS functions, and exported, reentrant vfs_* methods; these vfs_*
methods were very useful to us.
Sincerely,
Erez.
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2006-07-08 16:05 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200606192335.k5JNZWo6017881@shell0.pdx.osdl.net>
[not found] ` <20060707120004.GA6101@lst.de>
2006-07-07 23:24 ` + ecryptfs-dont-muck-with-the-existing-nameidata-structures.patch added to -mm tree Michael Halcrow
2006-07-08 16:04 ` [Unionfs] " Erez Zadok
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox