Sick VFS question

linux-fsdevel.vger.kernel.org archive mirror

* Sick VFS question
@ 2003-02-25  9:48 H. Peter Anvin
  2003-02-25 16:19 ` Ion Badulescu
  0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2003-02-25  9:48 UTC
  To: linux-fsdevel

Hi everyone,

I'm considering new ideas for autofs v5, and I have a sick question:
would the VFS barf completely if autofs would take a dentry belonging
to another filesystem and replace the inode pointer with a pointer to
an inode belonging to itself (obviously incrementing the counter on
that inode appropriately)?  This would in particular be applicable to
"mount pads", i.e. let's say /auto is an autofs, and /auto/foo is an
autofs mount key, containing two filesystems: /auto/foo and
/auto/foo/bar. After mounting /auto/foo, autofs would access the "bar"
directory in the "foo" filesystem, and replace that inode with an
internal autofs inode.
This internal inode would have a follow_link(!) method which would
cause the /auto/foo/bar mount to happen and then redirect the search
to the topmost directory.

Ideally, this should not impede a umount of the /auto/foo directory if
the /auto/foo/bar directory is not mounted.

Using follow_link this way solves a whole lot of problems in autofs,
both in terms of usability and atomicity.  It solves the same problem
the dentry traps were supposed to solve, but also avoids the "mount
storm" problem by allowing lstat() without triggering a mount.
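
As a rough illustration of the scheme described above, here is a
user-space C model (not kernel code). The struct layouts, the
follow_link signature, and the names trap_dentry(), walk_follow(),
walk_nofollow(), bar_fs_root and mounts_triggered are all assumptions
invented for this sketch. It only shows the intended behaviour: the
foreign dentry points at an autofs inode, an lstat()-style walk leaves
it alone, and a following walk triggers the mount once.

```c
/* User-space model only: these structs mimic, but are not, the kernel's. */
#include <assert.h>
#include <stddef.h>

struct dentry;

struct inode_operations {
    /* Hypothetical signature: returns the dentry the walk continues at. */
    struct dentry *(*follow_link)(struct dentry *d);
};

struct inode {
    int i_count;                          /* reference count */
    const struct inode_operations *i_op;
};

struct dentry {
    struct inode *d_inode;
    struct dentry *d_mounted_root;        /* model-only field: root mounted here */
};

struct dentry bar_fs_root;                /* stands in for the /auto/foo/bar root */
int mounts_triggered;                     /* bookkeeping for the demo */

/* The autofs "trap": mount on first use, then redirect to the new root. */
struct dentry *autofs_follow_link(struct dentry *d)
{
    if (!d->d_mounted_root) {
        mounts_triggered++;               /* a real autofs would call the daemon here */
        d->d_mounted_root = &bar_fs_root;
    }
    return d->d_mounted_root;
}

const struct inode_operations autofs_trap_iops = {
    .follow_link = autofs_follow_link,
};

/* Point a foreign dentry at the autofs inode and take a reference on it.
 * The displaced inode is returned; its reference passes to the caller. */
struct inode *trap_dentry(struct dentry *d, struct inode *trap)
{
    struct inode *old = d->d_inode;

    assert(trap && trap->i_op && trap->i_op->follow_link);
    trap->i_count++;
    d->d_inode = trap;
    return old;
}

/* lstat()-style: inspect the final component without following it. */
struct dentry *walk_nofollow(struct dentry *d)
{
    return d;
}

/* stat()-style: follow the final component if it has follow_link. */
struct dentry *walk_follow(struct dentry *d)
{
    const struct inode_operations *ops = d->d_inode->i_op;

    return (ops && ops->follow_link) ? ops->follow_link(d) : d;
}
```

In this model, calling walk_nofollow() on the trapped "bar" dentry
leaves mounts_triggered at zero, while the first walk_follow() bumps it
to one and later calls reuse the existing mount.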

I would appreciate comments...

    -hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: cris ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64


* Re: Sick VFS question
  2003-02-25  9:48 Sick VFS question H. Peter Anvin
@ 2003-02-25 16:19 ` Ion Badulescu
  2003-02-25 17:30   ` Charles P. Wright
  0 siblings, 1 reply; 13+ messages in thread
From: Ion Badulescu @ 2003-02-25 16:19 UTC
  To: H. Peter Anvin; +Cc: linux-fsdevel, ezk

On 25 Feb 2003 01:48:39 -0800, H. Peter Anvin <hpa@zytor.com> wrote:
> Hi everyone,
> 
> I'm considering new ideas for autofs v5, and I have a sick question:
> would the VFS barf completely if autofs would take a dentry belonging
> to another filesystem and replace the inode pointer with a pointer to
> an inode belonging to itself (obviously incrementing the counter on
> that inode appropriately)?

This is precisely what we're doing in FiST-lite, so the answer is "it's 
OK in 2.4, at the very least".

There are some tricky things involved and you need to be extra careful 
about refcounts and all, but it's doable.

> This internal inode would have a follow_link(!) method which would
> cause the /auto/foo/bar mount to happen and then redirect the search
> to the topmost directory.
> 
> Ideally, this should not impede a umount of the /auto/foo directory if
> the /auto/foo/bar directory is not mounted.

Hmm. That's a bit difficult. In order to keep control of that internal 
inode, you need to lock it in memory => unmountable fs. Otherwise, since 
/auto/foo is a separate, non-autofs f/s, autofs won't even see when 
/auto/foo is being looked up, so it can't redo the inode substitution 
trick on each lookup if/when the dentry is evicted...

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.


* Re: Sick VFS question
  2003-02-25 16:19 ` Ion Badulescu
@ 2003-02-25 17:30   ` Charles P. Wright
  2003-02-25 17:57     ` H. Peter Anvin
  2003-02-25 18:46     ` Ion Badulescu
  0 siblings, 2 replies; 13+ messages in thread
From: Charles P. Wright @ 2003-02-25 17:30 UTC
  To: Ion Badulescu; +Cc: H. Peter Anvin, linux-fsdevel, ezk

On Tue, 25 Feb 2003, Ion Badulescu wrote:
> On 25 Feb 2003 01:48:39 -0800, H. Peter Anvin <hpa@zytor.com> wrote:
> > Hi everyone,
> > 
> > I'm considering new ideas for autofs v5, and I have a sick question:
> > would the VFS barf completely if autofs would take a dentry belonging
> > to another filesystem and replace the inode pointer with a pointer to
> > an inode belonging to itself (obviously incrementing the counter on
> > that inode appropriately)?
> 
> This is precisely what we're doing in FiST-lite, so the answer is "it's 
> OK in 2.4, at the very least".
I don't believe this is the case.

I think what was suggested is that an EXT2 dentry (or one from another
"real" fs) points to an autofs inode.  I'm not sure whether this would
work, but I don't think it is the same as FiST-lite.

AFAIK, In FiST-lite what happens is the upper level (wrapfs) inode has its
address space operations set to the operations of the lower level (e.g.,
EXT2) inode.  A quick look at the code seemed to confirm this.

The EXT2 dentry still points to the inode of ext2, and the wrapfs dentry
still points to a wrapfs inode.  The change is wrapfs inode's
i_mapping->a_ops points to the EXT2 inode's i_mapping->a_ops.
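
A minimal user-space sketch of that arrangement, assuming simplified
stand-ins for the kernel structures. The readpage signature and the
names inode_init(), wrapfs_interpose_lite(), ext2_readpage_model and
ext2_aops_model are hypothetical and exist only for this illustration;
this is not the FiST-lite source.

```c
/* User-space model only: simplified stand-ins for kernel structures. */
#include <assert.h>
#include <stddef.h>

struct page;                                  /* opaque in this model */

struct address_space_operations {
    int (*readpage)(struct page *pg);         /* simplified, hypothetical signature */
};

struct address_space {
    const struct address_space_operations *a_ops;
};

struct inode {
    struct address_space *i_mapping;          /* normally points at i_data */
    struct address_space  i_data;             /* the inode's own mapping */
};

int ext2_readpage_model(struct page *pg)
{
    (void)pg;
    return 0;                                 /* pretend the read succeeded */
}

const struct address_space_operations ext2_aops_model = {
    .readpage = ext2_readpage_model,
};

void inode_init(struct inode *ino, const struct address_space_operations *aops)
{
    ino->i_data.a_ops = aops;
    ino->i_mapping = &ino->i_data;
}

/* The upper (wrapfs) inode keeps its own mapping but borrows the lower
 * inode's address space operations. */
void wrapfs_interpose_lite(struct inode *upper, const struct inode *lower)
{
    assert(upper && lower && upper->i_mapping && lower->i_mapping);
    upper->i_mapping->a_ops = lower->i_mapping->a_ops;
}
```

After wrapfs_interpose_lite(), both inodes still have distinct
i_mapping pointers, and each dentry would still point at its own
inode; only the a_ops table is shared.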

Charles


* Re: Sick VFS question
  2003-02-25 17:30   ` Charles P. Wright
@ 2003-02-25 17:57     ` H. Peter Anvin
  2003-02-25 18:46     ` Ion Badulescu
  1 sibling, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2003-02-25 17:57 UTC
  To: Charles P. Wright; +Cc: Ion Badulescu, linux-fsdevel, ezk

Charles P. Wright wrote:
> 
> I don't believe this is the case.
> 
> I think what was suggested is that an EXT2 dentry (or one from another
> "real" fs) points to an autofs inode.  I'm not sure whether this would
> work, but I don't think it is the same as FiST-lite.
> 

That is, indeed, what I'm trying to accomplish.

> AFAIK, In FiST-lite what happens is the upper level (wrapfs) inode has its
> address space operations set to the operations of the lower level (e.g.,
> EXT2) inode.  A quick look at the code seemed to confirm this.
> 
> The EXT2 dentry still points to the inode of ext2, and the wrapfs dentry
> still points to a wrapfs inode.  The change is wrapfs inode's
> i_mapping->a_ops points to the EXT2 inode's i_mapping->a_ops.

This sounds like it takes a file and "maps" it on top of another file -- 
something that would probably make a viable implementation of cachefs if 
we'd ever get around to implementing that...

	-hpa



* Re: Sick VFS question
  2003-02-25 17:30   ` Charles P. Wright
  2003-02-25 17:57     ` H. Peter Anvin
@ 2003-02-25 18:46     ` Ion Badulescu
  2003-02-25 19:53       ` H. Peter Anvin
  2003-03-06 16:53       ` David Chow
  1 sibling, 2 replies; 13+ messages in thread
From: Ion Badulescu @ 2003-02-25 18:46 UTC
  To: Charles P. Wright; +Cc: H. Peter Anvin, linux-fsdevel, ezk

On Tue, 25 Feb 2003, Charles P. Wright wrote:

> AFAIK, In FiST-lite what happens is the upper level (wrapfs) inode has its
> address space operations set to the operations of the lower level (e.g.,
> EXT2) inode.  A quick look at the code seemed to confirm this.

Hmm. You're probably right (you've been hacking that code more recently
than I have...). Anyway, what I said initially was definitely the original
approach for FiST-lite, and I guess my memory is getting fuzzy on recent
details, even though I actively worked on them. :-)

Anyway, in that case, why couldn't autofs do the same kind of sick thing --
even sicker: make a private copy of ext2's (or whatever) i_op table
for that inode and replace only the follow_link method inside it with its
own?

There are some memory management issues there, but they look solvable.
E.g. you only need to free up those private i_op structures when
destroying the autofs superblock. Or perhaps increment the i_count in the
inode so it doesn't get destroyed, and periodically go through the list of
inodes, freeing those whose i_count has dropped to 1. There
could be other creative solutions as well... but I'm seriously wondering
if any of this has a snowball's chance in hell of getting accepted by Linus.
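
To make the idea concrete, here is a user-space C sketch of the
private i_op copy plus the i_count-based cleanup. It is a model under
stated assumptions, not kernel code: the method signatures are
simplified, and struct hijack, hijack_inode(), reap_hijacks() and the
*_model names are hypothetical helpers invented for the example.

```c
/* User-space model only: simplified stand-ins for kernel structures. */
#include <assert.h>
#include <stdlib.h>

struct inode;

struct inode_operations {
    int (*lookup)(struct inode *dir);         /* simplified signatures */
    int (*follow_link)(struct inode *ino);
};

struct inode {
    int i_count;                              /* reference count */
    const struct inode_operations *i_op;
};

int ext2_lookup_model(struct inode *dir)       { (void)dir; return 1; }
int autofs_follow_link_model(struct inode *in) { (void)in;  return 2; }

const struct inode_operations ext2_dir_iops_model = {
    .lookup = ext2_lookup_model,              /* no follow_link of its own */
};

/* Hypothetical bookkeeping entry for one inode autofs has taken over. */
struct hijack {
    struct inode *inode;
    const struct inode_operations *saved;     /* the filesystem's own table */
    struct inode_operations *priv;            /* autofs's private copy */
};

/* Copy the inode's i_op table, override follow_link only, pin the inode. */
int hijack_inode(struct hijack *h, struct inode *ino)
{
    struct inode_operations *priv = malloc(sizeof(*priv));

    if (!priv)
        return -1;
    assert(ino->i_op);
    *priv = *ino->i_op;                       /* keep every other method */
    priv->follow_link = autofs_follow_link_model;
    h->inode = ino;
    h->saved = ino->i_op;
    h->priv = priv;
    ino->i_op = priv;
    ino->i_count++;                           /* our pin keeps the inode alive */
    return 0;
}

/* Periodic scan: undo entries that only our own pin keeps alive.
 * Returns the number of entries released. */
int reap_hijacks(struct hijack *tab, int n)
{
    int freed = 0;

    for (int i = 0; i < n; i++) {
        struct hijack *h = &tab[i];

        if (h->inode && h->inode->i_count == 1) {
            h->inode->i_op = h->saved;        /* restore the original table */
            h->inode->i_count--;              /* drop our pin */
            free(h->priv);
            h->inode = NULL;
            h->priv = NULL;
            freed++;
        }
    }
    return freed;
}
```

Note that the copy is a snapshot: if the filesystem later switches the
inode to a different i_op table, this model would neither notice nor
restore the right one.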

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.


* Re: Sick VFS question
  2003-02-25 18:46     ` Ion Badulescu
@ 2003-02-25 19:53       ` H. Peter Anvin
  2003-02-25 20:39         ` Ion Badulescu
  2003-03-06 16:53       ` David Chow
  1 sibling, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2003-02-25 19:53 UTC
  To: Ion Badulescu; +Cc: Charles P. Wright, linux-fsdevel, ezk

Ion Badulescu wrote:
> 
> Anyway, in that case, why couldn't autofs do the same kind of sick thing --
> even sicker: make a private copy of ext2's (or whatever) i_op table
> for that inode and replace only the follow_link method inside it with its
> own?
> 

It seems a lot uglier than what I proposed, though.  I would prefer not
to have to create such "hack inodes", especially since some filesystems
legitimately change their i_ops around with time.

	-hpa



* Re: Sick VFS question
  2003-02-25 19:53       ` H. Peter Anvin
@ 2003-02-25 20:39         ` Ion Badulescu
  2003-02-25 21:06           ` Ion Badulescu
  2003-02-25 21:55           ` H. Peter Anvin
  0 siblings, 2 replies; 13+ messages in thread
From: Ion Badulescu @ 2003-02-25 20:39 UTC
  To: H. Peter Anvin; +Cc: Charles P. Wright, linux-fsdevel, ezk

On Tue, 25 Feb 2003, H. Peter Anvin wrote:

> It seems a lot uglier than what I proposed, though.  I would prefer not
> to have to create such "hack inodes", especially since some filesystems
> legitimately change their i_ops around with time.

Unfortunately, you might not be able to do what you originally wanted. The 
more control you give to the "slave" filesystem, the harder it gets to 
keep things sane, and I'd guesstimate that it becomes next to impossible 
if the slave has the upper hand (i.e. the dentry).

Direct mounts would solve all these issues in a much more elegant way. We 
already have overlay mounts, we already have bind mounts, all the elements 
are in place to support direct mounts without much effort.

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.


* Re: Sick VFS question
  2003-02-25 20:39         ` Ion Badulescu
@ 2003-02-25 21:06           ` Ion Badulescu
  2003-02-25 21:55           ` H. Peter Anvin
  1 sibling, 0 replies; 13+ messages in thread
From: Ion Badulescu @ 2003-02-25 21:06 UTC
  To: H. Peter Anvin; +Cc: Charles P. Wright, linux-fsdevel, ezk

On Tue, 25 Feb 2003, Ion Badulescu wrote:

> Direct mounts would solve all these issues in a much more elegant way. We 
> already have overlay mounts, we already have bind mounts, all the elements 
> are in place to support direct mounts without much effort.

Looking at the VFS code, it seems that all you need is a follow_link() 
autofs method which makes an upcall and then calls follow_mount(). This 
also has the nice property that the follow_link() code is completely 
bypassed on the lookup path, after a successful mount.
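
Roughly, in user-space C (a toy model, not the VFS code): the struct
dentry layout, the per-dentry follow_link hook, and the names
follow_mount_model(), autofs_follow_link_model(), walk_component() and
direct_mount_root are assumptions made up for this sketch. In the
kernel the method would hang off the inode's i_op and the upcall would
wait for the automount daemon.

```c
/* User-space model only: a toy lookup step, not the kernel's path walk. */
#include <assert.h>
#include <stddef.h>

struct dentry {
    struct dentry *mounted;                   /* root mounted on this dentry, or NULL */
    struct dentry *(*follow_link)(struct dentry *d);  /* model-only hook */
    int upcalls;                              /* how often the daemon was asked */
};

struct dentry direct_mount_root;              /* stands in for the new fs root */

/* Cross into whatever is mounted here; loops to handle stacked mounts. */
struct dentry *follow_mount_model(struct dentry *d)
{
    while (d->mounted)
        d = d->mounted;
    return d;
}

/* autofs method: make the upcall, then step onto the fresh mount. */
struct dentry *autofs_follow_link_model(struct dentry *d)
{
    d->upcalls++;                             /* real code would block on the daemon */
    d->mounted = &direct_mount_root;          /* the daemon's mount has appeared */
    return follow_mount_model(d);
}

/* One step of the walk: an existing mount wins, so follow_link is only
 * reached while nothing is mounted yet. */
struct dentry *walk_component(struct dentry *d)
{
    assert(d);
    if (d->mounted)
        return follow_mount_model(d);
    if (d->follow_link)
        return d->follow_link(d);
    return d;
}
```

Walking the same dentry twice in this model makes exactly one upcall;
the second walk takes the mounted branch and never calls follow_link.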

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.



* Re: Sick VFS question
  2003-02-25 20:39         ` Ion Badulescu
  2003-02-25 21:06           ` Ion Badulescu
@ 2003-02-25 21:55           ` H. Peter Anvin
  2003-02-26 15:37             ` Erez Zadok
  1 sibling, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2003-02-25 21:55 UTC
  To: Ion Badulescu; +Cc: Charles P. Wright, linux-fsdevel, ezk

Ion Badulescu wrote:
> 
> Direct mounts would solve all these issues in a much more elegant way. We 
> already have overlay mounts, we already have bind mounts, all the elements 
> are in place to support direct mounts without much effort.
> 
Yes, except for atomicity concerns.  Anyway, I had a talk with Linus
about this about an hour ago, and he suggested adding a notification
mechanism to the namespace (mount) mechanism and do things at that
level.  It entails more kernel core hacking, but it would be cleaner in
a whole lot of ways.  I will look into it this evening.

	-hpa




* Re: Sick VFS question
  2003-02-25 21:55           ` H. Peter Anvin
@ 2003-02-26 15:37             ` Erez Zadok
  2003-02-26 15:45               ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: Erez Zadok @ 2003-02-26 15:37 UTC
  To: H. Peter Anvin; +Cc: Ion Badulescu, Charles P. Wright, linux-fsdevel, ezk

In message <3E5BE646.80507@zytor.com>, "H. Peter Anvin" writes:
> Ion Badulescu wrote:
> > 
> > Direct mounts would solve all these issues in a much more elegant way. We 
> > already have overlay mounts, we already have bind mounts, all the elements 
> > are in place to support direct mounts without much effort.
> > 
> Yes, except for atomicity concerns.  Anyway, I had a talk with Linus
> about this about an hour ago, and he suggested adding a notification
> mechanism to the namespace (mount) mechanism and do things at that
> level.  It entails more kernel core hacking, but it would be cleaner in
> a whole lot of ways.  I will look into it this evening.

If you cannot change the core 2.4 kernel, then maybe you'd consider a
FiST-lite based template as a starting point for v5.  Although it takes over
more ops than you may need, it is a fully standalone f/s that's reasonably
stable and has <1% overhead.

BTW HPA, you do recall Ion's "how to design a good autofs that plays well
with automounters" from some time ago, right? :-)

> 	-hpa

Erez.


* Re: Sick VFS question
  2003-02-26 15:37             ` Erez Zadok
@ 2003-02-26 15:45               ` H. Peter Anvin
  0 siblings, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2003-02-26 15:45 UTC
  To: Erez Zadok; +Cc: Ion Badulescu, Charles P. Wright, linux-fsdevel

Erez Zadok wrote:
> 
> BTW HPA, you do recall Ion's "how to design a good autofs that plays well
> with automounters" from some time ago, right? :-)
> 

No, I don't believe I have seen it.

	-hpa




* Re: Sick VFS question
  2003-02-25 18:46     ` Ion Badulescu
  2003-02-25 19:53       ` H. Peter Anvin
@ 2003-03-06 16:53       ` David Chow
  2003-03-06 17:18         ` Charles P. Wright
  1 sibling, 1 reply; 13+ messages in thread
From: David Chow @ 2003-03-06 16:53 UTC
  To: Ion Badulescu; +Cc: Charles P. Wright, H. Peter Anvin, linux-fsdevel, ezk

Ion Badulescu wrote:

> On Tue, 25 Feb 2003, Charles P. Wright wrote:
>
> > AFAIK, In FiST-lite what happens is the upper level (wrapfs) inode has its
> > address space operations set to the operations of the lower level (e.g.,
> > EXT2) inode.  A quick look at the code seemed to confirm this.
>
> Hmm. You're probably right (you've been hacking that code more recently
> than I have...). Anyway, what I said initially was definitely the original
> approach for FiST-lite, and I guess my memory is getting fuzzy on recent
> details, even though I actively worked on them. :-)

From what I know about FiST, this is not quite true. The aops of the wrapfs
inode do not point to the lower-level fs's aops. Most of the address
space operations take a struct page, and a struct page has its
mapping, whose host refers to the inode. The lower-level inode is referenced
by the wrapfs inode's private data. Wrapfs has its own page cache and looks
like a separate fs to the VFS in all respects.

David



* Re: Sick VFS question
  2003-03-06 16:53       ` David Chow
@ 2003-03-06 17:18         ` Charles P. Wright
  0 siblings, 0 replies; 13+ messages in thread
From: Charles P. Wright @ 2003-03-06 17:18 UTC
  To: David Chow; +Cc: H. Peter Anvin, linux-fsdevel, Ion Badulescu, ezk

On Fri, 7 Mar 2003, David Chow wrote:
> Ion Badulescu wrote:
> > On Tue, 25 Feb 2003, Charles P. Wright wrote:
> > > AFAIK, In FiST-lite what happens is the upper level (wrapfs) inode has its
> > > address space operations set to the operations of the lower level (e.g.,
> > > EXT2) inode.  A quick look at the code seemed to confirm this.
> >
> > Hmm. You're probably right (you've been hacking that code more recently
> > than I have...). Anyway, what I said initially was definitely the original
> > approach for FiST-lite, and I guess my memory is getting fuzzy on recent
> > details, even though I actively worked on them. :-)
>
> From what I know about FiST, this is not quite true. The aops of the wrapfs
> inode do not point to the lower-level fs's aops. Most of the address
> space operations take a struct page, and a struct page has its
> mapping, whose host refers to the inode. The lower-level inode is referenced
> by the wrapfs inode's private data. Wrapfs has its own page cache and looks
> like a separate fs to the VFS in all respects.
David,

We've recently improved the FiST code, such that if you specify filter
data, this is indeed the case and a separate page cache is kept.  
However, if data is not filtered then FiST-lite is used (the method
described above).  This decreases performance overhead to about 1% over a
native file system (compared to full-blown FiST which is about 5%).
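
The two modes can be pictured with a small user-space C model. This is
a sketch under assumptions, not the FiST source: the method table is
reduced to a label, the i_private field name, the filter_data flag,
and wrapfs_interpose_model() are hypothetical, and recording the lower
inode in both modes is a choice made for the example.

```c
/* User-space model only: simplified stand-ins for kernel structures. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct address_space_operations {
    const char *owner;                        /* method table reduced to a label */
};

struct address_space {
    const struct address_space_operations *a_ops;
};

struct inode {
    struct address_space i_data;              /* the inode's own mapping */
    void *i_private;                          /* hypothetical per-fs private data */
};

const struct address_space_operations ext2_aops_model   = { "ext2" };
const struct address_space_operations wrapfs_aops_model = { "wrapfs" };

/* Stack `upper` on `lower`. With data filtering the upper inode uses its
 * own methods (and so its own page cache); without it, the upper inode
 * borrows the lower inode's methods, FiST-lite style. */
void wrapfs_interpose_model(struct inode *upper, struct inode *lower,
                            bool filter_data)
{
    assert(upper && lower);
    upper->i_private = lower;                 /* remember the lower inode */
    upper->i_data.a_ops = filter_data
        ? &wrapfs_aops_model                  /* full FiST */
        : lower->i_data.a_ops;                /* FiST-lite */
}
```

Only the filter_data argument differs between the two calls; the
overhead figures quoted above come from the message, not from this
model.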

Charles


