mountpoint-crossing

* mountpoint-crossing
@ 2009-12-13 21:39 J. Bruce Fields
  2009-12-13 22:33 ` mountpoint-crossing Trond Myklebust
  0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2009-12-13 21:39 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On a recent kernel:

	# mount -tnfs4 pearlet1:/ /mnt/
	# find /mnt/
	/mnt/
	find: File system loop detected; `/mnt/DIR' is part of the same
	file system loop as `/mnt/'.

Here /mnt/DIR is a server-side mountpoint, hence has a different fsid
than /mnt/.  Wireshark confirms that the server is returning a different
fsid.  However, 'strace -v find /mnt/' shows stat returning
st_dev=makedev(0, 22) for both /mnt and /mnt/DIR.

If I then do a 'ls /mnt/DIR', followed by another find, the error goes
away, and this time an strace shows that stat is returning (0, 23) for
/mnt/DIR.
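
For anyone wanting to reproduce the comparison without strace, here is a
minimal Python sketch. It assumes find's default no-follow behaviour
(hence os.lstat); the helper names dev_of and same_device are made up
for illustration.

```python
import os


def dev_of(path):
    """Return (major, minor) of st_dev for path, not following a final symlink."""
    st = os.lstat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)


def same_device(parent, child):
    """True if both paths report the same st_dev."""
    return dev_of(parent) == dev_of(child)
```

On the setup above, same_device("/mnt", "/mnt/DIR") would be expected to
be true before the 'ls /mnt/DIR' and false afterwards.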

I don't see any obvious problem with the network trace, so it looks to
me like the client is failing to recognize the mountpoint when it
should?

--b.

* Re: mountpoint-crossing
  2009-12-13 21:39 mountpoint-crossing J. Bruce Fields
@ 2009-12-13 22:33 ` Trond Myklebust
  2009-12-14 13:38   ` mountpoint-crossing Jeff Layton
  0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2009-12-13 22:33 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Sun, 2009-12-13 at 16:39 -0500, J. Bruce Fields wrote: 
> On a recent kernel:
> 
> 	# mount -tnfs4 pearlet1:/ /mnt/
> 	# find /mnt/
> 	/mnt/
> 	find: File system loop detected; `/mnt/DIR' is part of the same
> 	file system loop as `/mnt/'.
> 
> Here /mnt/DIR is a server-side mountpoint, hence has a different fsid
> than /mnt/.  Wireshark confirms that the server is returning a different
> fsid.  However, 'strace -v find /mnt/' shows stat returning
> st_dev=makedev(0, 22) for both /mnt and /mnt/DIR.
> 
> If I then do a 'ls /mnt/DIR', followed by another find, the error goes
> away, and this time an strace shows that stat is returning (0, 23) for
> /mnt/DIR.
> 
> I don't see any obvious problem with the network trace, so it looks to
> me like the client is failing to recognize the mountpoint when it
> should?

This is a known consequence of the way we treat submounts (and
referrals); we're basically treating them as a special kind of symlink.
The problem then arises when syscalls such as stat() fail to set the
LOOKUP_FOLLOW flag, and so the user is granted a temporary peek of the
underlying inode.
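
As a userspace analogy only (LOOKUP_FOLLOW itself is kernel-internal),
the difference between a following and a non-following lookup can be
seen with an ordinary symlink. The sketch below uses a hypothetical
helper, follow_vs_nofollow, and a throwaway temporary directory; it does
not touch NFS at all.

```python
import os
import stat
import tempfile


def follow_vs_nofollow():
    """Return (nofollow_sees_link, follow_sees_dir) for a symlink to a directory."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        os.mkdir(target)
        os.symlink(target, link)
        nofollow = os.lstat(link)  # describes the link object itself
        follow = os.stat(link)     # describes what the link points at
        return stat.S_ISLNK(nofollow.st_mode), stat.S_ISDIR(follow.st_mode)
```

The non-following call reports the object sitting in the directory, not
what lies behind it, which is the "peek" described above.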

I'm not sure how we should treat this. I suppose we could change the
test in __link_path_walk() so that it always calls follow_link() if the
inode is not a symlink...

Cheers
  Trond

* Re: mountpoint-crossing
  2009-12-13 22:33 ` mountpoint-crossing Trond Myklebust
@ 2009-12-14 13:38   ` Jeff Layton
       [not found]     ` <20091214083843.5e6e73f5-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2009-12-14 13:38 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: J. Bruce Fields, linux-nfs

On Sun, 13 Dec 2009 17:33:15 -0500
Trond Myklebust <Trond.Myklebust@netapp.com> wrote:

> On Sun, 2009-12-13 at 16:39 -0500, J. Bruce Fields wrote: 
> > On a recent kernel:
> > 
> > 	# mount -tnfs4 pearlet1:/ /mnt/
> > 	# find /mnt/
> > 	/mnt/
> > 	find: File system loop detected; `/mnt/DIR' is part of the same
> > 	file system loop as `/mnt/'.
> > 
> > Here /mnt/DIR is a server-side mountpoint, hence has a different fsid
> > than /mnt/.  Wireshark confirms that the server is returning a different
> > fsid.  However, 'strace -v find /mnt/' shows stat returning
> > st_dev=makedev(0, 22) for both /mnt and /mnt/DIR.
> > 
> > If I then do a 'ls /mnt/DIR', followed by another find, the error goes
> > away, and this time an strace shows that stat is returning (0, 23) for
> > /mnt/DIR.
> > 
> > I don't see any obvious problem with the network trace, so it looks to
> > me like the client is failing to recognize the mountpoint when it
> > should?
> 
> This is a known consequence of the way we treat submounts (and
> referrals); we're basically treating them as a special kind of symlink.
> The problem then arises when syscalls such as stat() fail to set the
> LOOKUP_FOLLOW flag, and so the user is granted a temporary peek of the
> underlying inode.
> 
> I'm not sure how we should treat this. I suppose we could change the
> test in __link_path_walk() so that it always calls follow_link() if the
> inode is not a symlink...
> 

I looked at this problem recently based on a request by some of our
coreutils folks. A bit of the discussion is here:

    https://bugzilla.redhat.com/show_bug.cgi?id=533569

...and earlier:

    https://bugzilla.redhat.com/show_bug.cgi?id=501848

Jim Meyering also brought this up on LKML:

    http://lkml.org/lkml/2009/11/4/451

I'm a little leery of triggering a mount for any server-side mountpoint
that we just happen to have a peek at. That seems like it might get
expensive. Suppose you had 1000 filesystems mounted under the root
share here?

One idea in the mailing list discussion is to flag these inodes with
some sort of "I'm actually a mountpoint" flag and teach utilities that
care about inode numbers to deal with that. Not a great solution but it
wouldn't incur extra overhead.
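
To make concrete why utilities care: a rough sketch of the kind of
(st_dev, st_ino) identity check a tree walker can use to detect loops.
This is not find's actual code. The inode number 2 is an assumption
(both directories being filesystem roots that report the same inode
number); the device numbers are the ones from the original report.

```python
import os


def looks_like_loop(entry, ancestors):
    """entry and ancestors hold (st_dev, st_ino) pairs.

    True if entry has the same identity as a directory already being walked.
    """
    return entry in ancestors


ROOT_INO = 2  # assumption: both filesystem roots report this inode number

mnt = (os.makedev(0, 22), ROOT_INO)
dir_before_mount = (os.makedev(0, 22), ROOT_INO)  # submount not yet triggered
dir_after_mount = (os.makedev(0, 23), ROOT_INO)   # submount has its own st_dev
```

Under those assumptions, looks_like_loop(dir_before_mount, {mnt}) is
true and looks_like_loop(dir_after_mount, {mnt}) is false, which matches
the behaviour seen with find.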

-- 
Jeff Layton <jlayton@redhat.com>

* Re: mountpoint-crossing
       [not found]     ` <20091214083843.5e6e73f5-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
@ 2009-12-14 15:24       ` J. Bruce Fields
  2009-12-14 15:52         ` mountpoint-crossing Jeff Layton
  0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2009-12-14 15:24 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Trond Myklebust, linux-nfs

On Mon, Dec 14, 2009 at 08:38:43AM -0500, Jeff Layton wrote:
> On Sun, 13 Dec 2009 17:33:15 -0500
> Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> 
> > On Sun, 2009-12-13 at 16:39 -0500, J. Bruce Fields wrote: 
> > > On a recent kernel:
> > > 
> > > 	# mount -tnfs4 pearlet1:/ /mnt/
> > > 	# find /mnt/
> > > 	/mnt/
> > > 	find: File system loop detected; `/mnt/DIR' is part of the same
> > > 	file system loop as `/mnt/'.
> > > 
> > > Here /mnt/DIR is a server-side mountpoint, hence has a different fsid
> > > than /mnt/.  Wireshark confirms that the server is returning a different
> > > fsid.  However, 'strace -v find /mnt/' shows stat returning
> > > st_dev=makedev(0, 22) for both /mnt and /mnt/DIR.
> > > 
> > > If I then do a 'ls /mnt/DIR', followed by another find, the error goes
> > > away, and this time an strace shows that stat is returning (0, 23) for
> > > /mnt/DIR.
> > > 
> > > I don't see any obvious problem with the network trace, so it looks to
> > > me like the client is failing to recognize the mountpoint when it
> > > should?
> > 
> > This is a known consequence of the way we treat submounts (and
> > referrals); we're basically treating them as a special kind of symlink.
> > The problem then arises when syscalls such as stat() fail to set the
> > LOOKUP_FOLLOW flag, and so the user is granted a temporary peek of the
> > underlying inode.
> > 
> > I'm not sure how we should treat this. I suppose we could change the
> > test in __link_path_walk() so that it always calls follow_link() if the
> > inode is not a symlink...
> > 
> 
> I looked at this problem recently based on a request by some of our
> coreutils folks. A bit of the discussion is here:
> 
>     https://bugzilla.redhat.com/show_bug.cgi?id=533569
> 
> ...and earlier:
> 
>     https://bugzilla.redhat.com/show_bug.cgi?id=501848
> 
> Jim Meyering also brought this up on LKML:
> 
>     http://lkml.org/lkml/2009/11/4/451
> 
> I'm a little leery of triggering a mount for any server-side mountpoint
> that we just happen to have a peek at. That seems like it might get
> expensive. Suppose you had 1000 filesystems mounted under the root
> share here?

For what it's worth, I'll admit that I ran across this just in
artificial testing--I'm not claiming it was causing me a real problem.

--b.

> 
> One idea in the mailing list discussion is to flag these inodes with
> some sort of "I'm actually a mountpoint" flag and teach utilities that
> care about inode numbers to deal with that. Not a great solution but it
> wouldn't incur extra overhead.
> 
> -- 
> Jeff Layton <jlayton@redhat.com>

* Re: mountpoint-crossing
  2009-12-14 15:24       ` mountpoint-crossing J. Bruce Fields
@ 2009-12-14 15:52         ` Jeff Layton
       [not found]           ` <20091214105214.64714867-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2009-12-14 15:52 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Trond Myklebust, linux-nfs

On Mon, 14 Dec 2009 10:24:18 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Mon, Dec 14, 2009 at 08:38:43AM -0500, Jeff Layton wrote:
> > On Sun, 13 Dec 2009 17:33:15 -0500
> > Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> > 
> > > On Sun, 2009-12-13 at 16:39 -0500, J. Bruce Fields wrote: 
> > > > On a recent kernel:
> > > > 
> > > > 	# mount -tnfs4 pearlet1:/ /mnt/
> > > > 	# find /mnt/
> > > > 	/mnt/
> > > > 	find: File system loop detected; `/mnt/DIR' is part of the same
> > > > 	file system loop as `/mnt/'.
> > > > 
> > > > Here /mnt/DIR is a server-side mountpoint, hence has a different fsid
> > > > than /mnt/.  Wireshark confirms that the server is returning a different
> > > > fsid.  However, 'strace -v find /mnt/' shows stat returning
> > > > st_dev=makedev(0, 22) for both /mnt and /mnt/DIR.
> > > > 
> > > > If I then do a 'ls /mnt/DIR', followed by another find, the error goes
> > > > away, and this time an strace shows that stat is returning (0, 23) for
> > > > /mnt/DIR.
> > > > 
> > > > I don't see any obvious problem with the network trace, so it looks to
> > > > me like the client is failing to recognize the mountpoint when it
> > > > should?
> > > 
> > > This is a known consequence of the way we treat submounts (and
> > > referrals); we're basically treating them as a special kind of symlink.
> > > The problem then arises when syscalls such as stat() fail to set the
> > > LOOKUP_FOLLOW flag, and so the user is granted a temporary peek of the
> > > underlying inode.
> > > 
> > > I'm not sure how we should treat this. I suppose we could change the
> > > test in __link_path_walk() so that it always calls follow_link() if the
> > > inode is not a symlink...
> > > 
> > 
> > I looked at this problem recently based on a request by some of our
> > coreutils folks. A bit of the discussion is here:
> > 
> >     https://bugzilla.redhat.com/show_bug.cgi?id=533569
> > 
> > ...and earlier:
> > 
> >     https://bugzilla.redhat.com/show_bug.cgi?id=501848
> > 
> > Jim Meyering also brought this up on LKML:
> > 
> >     http://lkml.org/lkml/2009/11/4/451
> > 
> > I'm a little leery of triggering a mount for any server-side mountpoint
> > that we just happen to have a peek at. That seems like it might get
> > expensive. Suppose you had 1000 filesystems mounted under the root
> > share here?
> 
> For what it's worth, I'll admit that I ran across this just in
> artificial testing--I'm not claiming it was causing me a real problem.
> 

Understood. It's a bit of a dilemma...

Clearly though, it's going to be a problem for some programs that need
to deal with mountpoints (stuff like backup programs in particular).
The problem though is that I don't think we want to trigger a bunch of
submounts just because someone does a "ls -l" in a directory that holds
a bunch of server-side mountpoints.

The real problem I think is that we allocate new dev minor numbers at
mount time. The ideal thing might be to have the client somehow
pre-determine what the dev number of that mount would be without
actually doing the mount. Then we could just present that device number
in the stat call.
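
Purely to illustrate the idea, and not how the kernel allocates device
numbers: a toy table that reserves a minor the first time a given server
fsid is seen, so that later stat calls could report it before any mount
happens. AnonDevTable, reserve and release are hypothetical names, and
the starting minor of 22 just mirrors the example in this thread.

```python
import itertools


class AnonDevTable:
    """Hypothetical map from server fsid to a reserved anonymous minor."""

    def __init__(self, first_minor=22):
        self._next = itertools.count(first_minor)
        self._by_fsid = {}

    def reserve(self, fsid):
        """Return the minor reserved for fsid, allocating one on first sight."""
        if fsid not in self._by_fsid:
            self._by_fsid[fsid] = next(self._next)
        return self._by_fsid[fsid]

    def release(self, fsid):
        """Forget a reservation; this toy version never reuses the minor."""
        self._by_fsid.pop(fsid, None)
```

Repeated reserve() calls for the same fsid return the same minor, which
is the property stat would need. When a reservation may safely be
released is left open here.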

-- 
Jeff Layton <jlayton@redhat.com>

* Re: mountpoint-crossing
       [not found]           ` <20091214105214.64714867-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
@ 2009-12-14 16:04             ` J. Bruce Fields
  2009-12-14 16:44               ` mountpoint-crossing Jeff Layton
  0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2009-12-14 16:04 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Trond Myklebust, linux-nfs

On Mon, Dec 14, 2009 at 10:52:14AM -0500, Jeff Layton wrote:
> On Mon, 14 Dec 2009 10:24:18 -0500
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Mon, Dec 14, 2009 at 08:38:43AM -0500, Jeff Layton wrote:
> > > I looked at this problem recently based on a request by some of our
> > > coreutils folks. A bit of the discussion is here:
> > > 
> > >     https://bugzilla.redhat.com/show_bug.cgi?id=533569
> > > 
> > > ...and earlier:
> > > 
> > >     https://bugzilla.redhat.com/show_bug.cgi?id=501848
> > > 
> > > Jim Meyering also brought this up on LKML:
> > > 
> > >     http://lkml.org/lkml/2009/11/4/451
> > > 
> > > I'm a little leery of triggering a mount for any server-side mountpoint
> > > that we just happen to have a peek at. That seems like it might get
> > > expensive. Suppose you had 1000 filesystems mounted under the root
> > > share here?
> > 
> > For what it's worth, I'll admit that I ran across this just in
> > artificial testing--I'm not claiming it was causing me a real problem.
> > 
> 
> Understood. It's a bit of a dilemma...
> 
> Clearly though, it's going to be a problem for some programs that need
> to deal with mountpoints (stuff like backup programs in particular).
> The problem though is that I don't think we want to trigger a bunch of
> submounts just because someone does a "ls -l" in a directory that holds
> a bunch of server-side mountpoints.
> 
> The real problem I think is that we allocate new dev minor numbers at
> mount time.

So you're not saying that the minor number allocation is the expensive
part, you're saying that it's cheap and something that we could do
before we do the rest of the mount?

> The ideal thing might be to have the client somehow
> pre-determine what the dev number of that mount would be without
> actually doing the mount. Then we could just present that device number
> in the stat call.

We also need the inode number, for example, which may require an RPC
call.

So what is the most expensive part of a mount?

For a directory full of referral points, there's the problem that you
don't want to have to wait on stat calls from a lot of different
servers.  But maybe that should be handled as a special case.

--b.

* Re: mountpoint-crossing
  2009-12-14 16:04             ` mountpoint-crossing J. Bruce Fields
@ 2009-12-14 16:44               ` Jeff Layton
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2009-12-14 16:44 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Trond Myklebust, linux-nfs

On Mon, 14 Dec 2009 11:04:01 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Mon, Dec 14, 2009 at 10:52:14AM -0500, Jeff Layton wrote:
> > On Mon, 14 Dec 2009 10:24:18 -0500
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > 
> > > On Mon, Dec 14, 2009 at 08:38:43AM -0500, Jeff Layton wrote:
> > > > I looked at this problem recently based on a request by some of our
> > > > coreutils folks. A bit of the discussion is here:
> > > > 
> > > >     https://bugzilla.redhat.com/show_bug.cgi?id=533569
> > > > 
> > > > ...and earlier:
> > > > 
> > > >     https://bugzilla.redhat.com/show_bug.cgi?id=501848
> > > > 
> > > > Jim Meyering also brought this up on LKML:
> > > > 
> > > >     http://lkml.org/lkml/2009/11/4/451
> > > > 
> > > > I'm a little leery of triggering a mount for any server-side mountpoint
> > > > that we just happen to have a peek at. That seems like it might get
> > > > expensive. Suppose you had 1000 filesystems mounted under the root
> > > > share here?
> > > 
> > > For what it's worth, I'll admit that I ran across this just in
> > > artificial testing--I'm not claiming it was causing me a real problem.
> > > 
> > 
> > Understood. It's a bit of a dilemma...
> > 
> > Clearly though, it's going to be a problem for some programs that need
> > to deal with mountpoints (stuff like backup programs in particular).
> > The problem though is that I don't think we want to trigger a bunch of
> > submounts just because someone does a "ls -l" in a directory that holds
> > a bunch of server-side mountpoints.
> > 
> > The real problem I think is that we allocate new dev minor numbers at
> > mount time.
> 
> So you're not saying that the minor number allocation is the expensive
> part, you're saying that it's cheap and something that we could do
> before we do the rest of the mount?
> 

Yeah, minor number allocation is fairly cheap (it's just IDA hash calls
I think). If we wanted to try and preallocate them then we have to
consider how long to cache them too. The hassle of doing that may
outweigh the expense of triggering the mount.

I am making an assumption that mounts are somewhat expensive to do.
Maybe you can convince me otherwise. :)


> > The ideal thing might be to have the client somehow
> > pre-determine what the dev number of that mount would be without
> > actually doing the mount. Then we could just present that device number
> > in the stat call.
> 
> We also need the inode number, for example, which may require an RPC
> call.
> 

We already have that, right? We've done a GETATTR (or equivalent) and
noticed that the inode has a different fsid. We even send back the
"real" inode number in the statbuf (but obviously w/o the right
device info since the mount hasn't been triggered yet).

> So what is the most expensive part of a mount?
> 
> For a directory full of referral points, there's the problem that you
> don't want to have to wait on stat calls from a lot of different
> servers.  But maybe that should be handled as a special case.
> 

Good question. I suppose I was making an assumption here that
triggering a mount would mean at least some RPCs and that might make
that "ls -l" stall for a while if we have to talk to a bunch of
different servers.

Maybe I'm blowing the performance hit out of proportion though? That
said, I'm not crazy about altering the generic VFS to fix this. I
wonder if there's another way to do it?

-- 
Jeff Layton <jlayton@redhat.com>
