* rpc.mountd --manage-gids breaks on UID differences
@ 2009-11-17 15:39 Sander Smeenk
[not found] ` <20091117153928.GA12493-N0d2glMUd7m2/GFIlvLUCQ@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Sander Smeenk @ 2009-11-17 15:39 UTC (permalink / raw)
To: linux-nfs
Hi,
I ran into this problem the other day which i'd like to report as a
possible bug or at least an inconvenience. Please read and spill your
thoughts if you want.
In this situation i had a x86_64 Linux machine running Ubuntu 8.04.3 LTS
with kernel 2.6.24-25-generic and nfs-utils 1:1.1.2-2ubuntu2.
/usr/sbin/rpc.mountd identifies itself as 'kmountd 1.1.2'.
This machine acts as the server and has '--manage-gids' specified in
/etc/default/nfs-kernel-server's RPCMOUNTDOPTS. It exports one share
to a /24 subnet with (rw,sync,no_subtree_check,no_root_squash) options.
I have another machine acting as client, same OS-release, same kernel,
this machine only has nfs-common utilities installed and thus does not
run rpc.mountd. It mounts the NFS-server with 'rw,addr=172.17.145.210'
mountoptions. Nothing fancy there either.
There's no firewall between the machines.
On the server i have a user 'foo' with UID '1000' and GID '65000',
On the client i have a user 'foo' with UID '1001' and GID '65000'.
Username is identical as is the GID, the UID isn't.
If user 'foo' on the nfs-client tries to reach the mountpoint where the
nfs-server is mounted the shell of this user will freeze. For example,
user 'foo' on the nfs-client types 'cd /mnt/nfs/ap<tabtab>' and stalls.
The clientmachine then begins to log 'nfs: server 172.17.145.210 not
responding, still trying' and this user's shell doesn't recover in any
way until forcefully killed.
Funny thing is, a user 'bar' with the same UID on the server- and
client machine can still access the NFS-mount on the client even when
the session for user 'foo' is stalling. Also, running 'rpcinfo -p' or
'rpcinfo -u <host> <progid>' on the server reports all normal values.
The server logs nothing.
Aparently this has to do with UIDs being different on the nfs-server and
the nfs-client as i discovered debugging this problem in two VMs. It's
quite easy to reproduce and also seems to happen if the user on the
clientmachine is nonexistant on the servermachine.
Disabling '--manage-gids' and remouting, restarting or rebooting
completely fixes the problem. Reintroducing '--manage-gids' breaks it
again.
It appears to me '--manage-gids' is completely broken in this setup, or
i misunderstand what --manage-gids does, completely. I haven't tried
this on newer kernels (2.6.31-14-generic f.e.).
I do have kernel debug logs during broken and working NFS transactions
which i got by echoing the correct bitmask to /proc/sys/sunrpc/*debug.
http://www.freshdot.net/tmp/client-broken-syslog
http://www.freshdot.net/tmp/server-broken-syslog
vs
http://www.freshdot.net/tmp/client-working-syslog
http://www.freshdot.net/tmp/server-working-syslog
Is this behaviour intended?
Hope to hear from you!
With regards,
Sander Smeenk.
--
| When you've seen one shopping center you've seen a mall.
| 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7 FBD6 F3A9 9442 20CC 6CD2
^ permalink raw reply [flat|nested] 6+ messages in thread[parent not found: <20091117153928.GA12493-N0d2glMUd7m2/GFIlvLUCQ@public.gmane.org>]
* Re: rpc.mountd --manage-gids breaks on UID differences [not found] ` <20091117153928.GA12493-N0d2glMUd7m2/GFIlvLUCQ@public.gmane.org> @ 2009-11-17 20:08 ` J. Bruce Fields 2009-11-17 20:43 ` Sander Smeenk 2009-11-26 9:16 ` Sander Smeenk 0 siblings, 2 replies; 6+ messages in thread From: J. Bruce Fields @ 2009-11-17 20:08 UTC (permalink / raw) To: Sander Smeenk; +Cc: linux-nfs On Tue, Nov 17, 2009 at 04:39:28PM +0100, Sander Smeenk wrote: > Aparently this has to do with UIDs being different on the nfs-server and > the nfs-client as i discovered debugging this problem in two VMs. It's > quite easy to reproduce and also seems to happen if the user on the > clientmachine is nonexistant on the servermachine. > > Disabling '--manage-gids' and remouting, restarting or rebooting > completely fixes the problem. Reintroducing '--manage-gids' breaks it > again. Just from those symptoms it sounds to me like rpc.mountd isn't responding to upcalls when asked about a uid for which there's no account on the server. It should be returning a negative response immediately. (I assume your server isn't using ldap or nis or something that could cause lookups of a uid to take a long time?) --b. > It appears to me '--manage-gids' is completely broken in this setup, or > i misunderstand what --manage-gids does, completely. I haven't tried > this on newer kernels (2.6.31-14-generic f.e.). > > I do have kernel debug logs during broken and working NFS transactions > which i got by echoing the correct bitmask to /proc/sys/sunrpc/*debug. > > http://www.freshdot.net/tmp/client-broken-syslog > http://www.freshdot.net/tmp/server-broken-syslog > vs > http://www.freshdot.net/tmp/client-working-syslog > http://www.freshdot.net/tmp/server-working-syslog > > Is this behaviour intended? > Hope to hear from you! > > With regards, > Sander Smeenk. > -- > | When you've seen one shopping center you've seen a mall. > | 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7 FBD6 F3A9 9442 20CC 6CD2 > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: rpc.mountd --manage-gids breaks on UID differences 2009-11-17 20:08 ` J. Bruce Fields @ 2009-11-17 20:43 ` Sander Smeenk 2009-11-18 21:31 ` J. Bruce Fields 2009-11-26 9:16 ` Sander Smeenk 1 sibling, 1 reply; 6+ messages in thread From: Sander Smeenk @ 2009-11-17 20:43 UTC (permalink / raw) To: linux-nfs Quoting J. Bruce Fields (bfields@fieldses.org): > > Disabling '--manage-gids' and remouting, restarting or rebooting > > completely fixes the problem. Reintroducing '--manage-gids' breaks it > > again. > Just from those symptoms it sounds to me like rpc.mountd isn't > responding to upcalls when asked about a uid for which there's no > account on the server. It should be returning a negative response > immediately. Something like that sounds right. Aparently the client is waiting for the server to respond, that's what i can tell from the debug logs[1] timestamped 14243124.709876 through 14243184.647892. Which eventually leads to the kernel logging the 'not responding' message at 14243306.507319. [1] http://www.freshdot.net/tmp/client-broken-syslog > (I assume your server isn't using ldap or nis or something that could > cause lookups of a uid to take a long time?) Correct. My client and server are using UNIX auth from shadow files. The timeframe described above matches the lines from the beginning up to 3732517.859721 in the server debug log[2]. I'd have to dig in the kernel code to find out what lines 3732513.221898 through 3732513.221913 exactly tell me. Is anyone on this list an RPC-code ninja? And what exactly is the 'NFSD laundromat service' :) [2] http://www.freshdot.net/tmp/server-broken-syslog Thanks, Sander. -- | A box without hinges, key, or lid, yet golden treasure inside is hid. | 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7 FBD6 F3A9 9442 20CC 6CD2 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: rpc.mountd --manage-gids breaks on UID differences 2009-11-17 20:43 ` Sander Smeenk @ 2009-11-18 21:31 ` J. Bruce Fields 0 siblings, 0 replies; 6+ messages in thread From: J. Bruce Fields @ 2009-11-18 21:31 UTC (permalink / raw) To: linux-nfs On Tue, Nov 17, 2009 at 09:43:33PM +0100, Sander Smeenk wrote: > Quoting J. Bruce Fields (bfields@fieldses.org): > > > > Disabling '--manage-gids' and remouting, restarting or rebooting > > > completely fixes the problem. Reintroducing '--manage-gids' breaks it > > > again. > > Just from those symptoms it sounds to me like rpc.mountd isn't > > responding to upcalls when asked about a uid for which there's no > > account on the server. It should be returning a negative response > > immediately. > > Something like that sounds right. Aparently the client is waiting for > the server to respond, that's what i can tell from the debug logs[1] > timestamped 14243124.709876 through 14243184.647892. Which eventually > leads to the kernel logging the 'not responding' message at > 14243306.507319. > > [1] http://www.freshdot.net/tmp/client-broken-syslog > > > (I assume your server isn't using ldap or nis or something that could > > cause lookups of a uid to take a long time?) > > Correct. My client and server are using UNIX auth from shadow files. > > The timeframe described above matches the lines from the beginning up to > 3732517.859721 in the server debug log[2]. I'd have to dig in the kernel > code to find out what lines 3732513.221898 through 3732513.221913 > exactly tell me. Is anyone on this list an RPC-code ninja? I don't think there's anything interesting in there. If you do: date +%s >/proc/net/rpc/auth.unix.gid/flush strace -e trace=read,write -s4096 -p`pidof rpc.mountd` then do whatever you do the client to reproduce the problem, the resulting strace output might be interesting. I can't reproduce the problem with nfs-utils from karmic (1:1.1.4-1ubuntu1) and a recent kernel. --b. > > And what exactly is the 'NFSD laundromat service' :) > > [2] http://www.freshdot.net/tmp/server-broken-syslog > > Thanks, > Sander. > -- > | A box without hinges, key, or lid, yet golden treasure inside is hid. > | 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7 FBD6 F3A9 9442 20CC 6CD2 > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: rpc.mountd --manage-gids breaks on UID differences 2009-11-17 20:08 ` J. Bruce Fields 2009-11-17 20:43 ` Sander Smeenk @ 2009-11-26 9:16 ` Sander Smeenk [not found] ` <20091126091657.GL15295-N0d2glMUd7m2/GFIlvLUCQ@public.gmane.org> 1 sibling, 1 reply; 6+ messages in thread From: Sander Smeenk @ 2009-11-26 9:16 UTC (permalink / raw) To: linux-nfs ** Sorry for messing up the thread - My mailconfig started rejecting mail from vger.kernel.org for which i am eternally sorry. This message i'm replying to was copied from marc.info ** Quoting J. Bruce Fields (bfields@fieldses.org): > > The timeframe described above matches the lines from the beginning > > up to 3732517.859721 in the server debug log[2]. I'd have to dig in > > the kernel code to find out what lines 3732513.221898 through > > 3732513.221913 exactly tell me. > > Is anyone on this list an RPC-code ninja? > > I don't think there's anything interesting in there. > If you do: > > date +%s >/proc/net/rpc/auth.unix.gid/flush > strace -e trace=read,write -s4096 -p`pidof rpc.mountd` > > then do whatever you do the client to reproduce the problem, the > resulting strace output might be interesting. This is the result of said strace. Server's auth.unix.gid was flushed, client reboots and auto-mounts the NFS-share: | [ .. ] | read(12, "172.17.145.222:/mnt/data/exports/application:0x00000009\n", 4096) = 56 | write(12, "172.17.145.222:/mnt/data/exports/application:0x0000000a\n", 56) = 56 | read(4, "0\n", 2048) = 2 | write(4, "0 1259227707 1 0 \n", 18) = 18 Again i flushed auth.unix.gid and directly accessed a file as root from the client: | read(4, "0\n", 2048) = 2 | write(4, "0 1259227903 1 0 \n", 18) = 18 This works as expected, file contents returned. Again i flushed auth.unix.gid and switched to the user with the mismatching uid on the server & client, accessed the exact same file directly: | read(4, "1002\n", 2048) = 5 | write(4, "1002 1259227918 \n", 17) = -1 EINVAL (Invalid argument) These two lines repeat at a very slow interval while the client retries: | read(4, "1002\n", 2048) = 5 | write(4, "1002 1259228098 \n", 17) = -1 EINVAL (Invalid argument) | read(4, "1002\n", 2048) = 5 | write(4, "1002 1259228278 \n", 17) = -1 EINVAL (Invalid argument) Filedescriptor 4 for the rpc.mountd process is connected to /proc/net/rpc/auth.unix.gid/channel. After this user has tried to access the file, NFS-operation stalls for all users on the client. > I can't reproduce the problem with nfs-utils from karmic > (1:1.1.4-1ubuntu1) and a recent kernel. Well, i've never ever seen this before either. It seems like it has been introduced with some of the more recent updates to either the kernel or the nfs-utils contained in Ubuntu Hardy 8.04.3 LTS. HTH, with regards, -Sander. -- | When everything is coming your way, you're in the wrong lane. | 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7 FBD6 F3A9 9442 20CC 6CD2 ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <20091126091657.GL15295-N0d2glMUd7m2/GFIlvLUCQ@public.gmane.org>]
* Re: rpc.mountd --manage-gids breaks on UID differences [not found] ` <20091126091657.GL15295-N0d2glMUd7m2/GFIlvLUCQ@public.gmane.org> @ 2009-11-27 19:05 ` J. Bruce Fields 0 siblings, 0 replies; 6+ messages in thread From: J. Bruce Fields @ 2009-11-27 19:05 UTC (permalink / raw) To: linux-nfs On Thu, Nov 26, 2009 at 10:16:58AM +0100, Sander Smeenk wrote: > ** Sorry for messing up the thread - My mailconfig started rejecting > mail from vger.kernel.org for which i am eternally sorry. This message > i'm replying to was copied from marc.info ** > > Quoting J. Bruce Fields (bfields@fieldses.org): > > > > The timeframe described above matches the lines from the beginning > > > up to 3732517.859721 in the server debug log[2]. I'd have to dig in > > > the kernel code to find out what lines 3732513.221898 through > > > 3732513.221913 exactly tell me. > > > Is anyone on this list an RPC-code ninja? > > > > I don't think there's anything interesting in there. > > If you do: > > > > date +%s >/proc/net/rpc/auth.unix.gid/flush > > strace -e trace=read,write -s4096 -p`pidof rpc.mountd` > > > > then do whatever you do the client to reproduce the problem, the > > resulting strace output might be interesting. > > This is the result of said strace. Server's auth.unix.gid was flushed, > client reboots and auto-mounts the NFS-share: > > | [ .. ] > | read(12, "172.17.145.222:/mnt/data/exports/application:0x00000009\n", 4096) = 56 > | write(12, "172.17.145.222:/mnt/data/exports/application:0x0000000a\n", 56) = 56 > | read(4, "0\n", 2048) = 2 > | write(4, "0 1259227707 1 0 \n", 18) = 18 > > Again i flushed auth.unix.gid and directly accessed a file as root from > the client: > > | read(4, "0\n", 2048) = 2 > | write(4, "0 1259227903 1 0 \n", 18) = 18 > > This works as expected, file contents returned. Again i flushed > auth.unix.gid and switched to the user with the mismatching uid on the > server & client, accessed the exact same file directly: > > | read(4, "1002\n", 2048) = 5 > | write(4, "1002 1259227918 \n", 17) = -1 EINVAL (Invalid argument) > > These two lines repeat at a very slow interval while the client retries: OK, thanks. Looking through the git logs.... Looks like this problem was addressed recently in nfs-utils, by making mountd pass down a zero-length list of gid's instead of just passing down a negative response. The patch went in between 1.1.3 and 1.1.4. (Arguably maybe the kernel should also be modified to interpret a negative response as a zero-length list. I'd accept a patch.) --b. commit 86c3a79a108091fe08869a887438cc2d4e1126ed Author: Neil Brown <neilb@suse.de> Date: Wed Aug 27 16:30:19 2008 -0400 mount issue with Mac OSX and --manage-gids, client hangs Make sure are zero len group list is sent down to the kernel when the gids do not exist on the server. Tested-by: Alex Samad <alex-SGFoFqf0RKf0CCvOHzKKcA@public.gmane.org> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Steve Dickson <steved@redhat.com> diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c index f555dcc..609c6e3 100644 --- a/utils/mountd/cache.c +++ b/utils/mountd/cache.c @@ -158,8 +158,10 @@ void auth_unix_gid(FILE *f) qword_printint(f, ngroups); for (i=0; i<ngroups; i++) qword_printint(f, groups[i]); - } + } else + qword_printint(f, 0); qword_eol(f); + if (groups != glist) free(groups); } ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-11-27 19:04 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-17 15:39 rpc.mountd --manage-gids breaks on UID differences Sander Smeenk
[not found] ` <20091117153928.GA12493-N0d2glMUd7m2/GFIlvLUCQ@public.gmane.org>
2009-11-17 20:08 ` J. Bruce Fields
2009-11-17 20:43 ` Sander Smeenk
2009-11-18 21:31 ` J. Bruce Fields
2009-11-26 9:16 ` Sander Smeenk
[not found] ` <20091126091657.GL15295-N0d2glMUd7m2/GFIlvLUCQ@public.gmane.org>
2009-11-27 19:05 ` J. Bruce Fields
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox