* BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
@ 2005-01-11 21:22 David Meleedy
  2005-01-12  5:38 ` Ian Kent
                   ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: David Meleedy @ 2005-01-11 21:22 UTC (permalink / raw)
  To: autofs


Hi Ian & Jeff,
	I am trying to track down an autofs issue that has been
plaguing us.  It seems to be caused by the interaction of autofs version
4 with a Network Appliance server, and cd'ing to /net directories
on the Netapp server.

A similar issue was seen at Analog Devices under Redhat 8, and apparently
the problem was worked around by Dwight Marzolf with Ian Kent's help.
Following what Dwight did, I have been trying to recreate the fix
for Redhat Enterprise 3 update 3, so far without success.

THE PROBLEM DESCRIPTION:

Autofs hangs and refuses to mount any directories for a period of time
after cd'ing to /net/<Netapp>/vol/vol[0-3] and waiting a while.
The only way to clear this is to reboot the client.

Initially we started using the following software (Redhat Enterprise 3 update 3):
autofs 4.1.3-12
kernel 2.4.21-20
nfs-utils 1.0.6-31EL

WHAT HAS BEEN TRIED SO FAR:

Mike Waychison, after seeing the messages from our log file said,

"These messages are due to starvation for reserved ports (< 1024).
Specifically, the kernel will only use ports < 800.  Currently, the
kernel uses one port per nfs filesystem.  If you mount filesystems very
fast, then you can also run out of reserved ports as the local (mountd
iirc?) will close tcp sessions and each must wait 2 minutes before being
released.

One solution is to try out the patch I posted last week that allows nfs
mounts to share tcp/udp connections:

http://marc.theaimsgroup.com/?l=linux-nfs&m=110261671705396&w=2
"

The problem is that we are running a 2.4 kernel, and his patch was
for the 2.6 kernel.  Also, although his patch might increase the
number of ports available, I think it does not really solve the
problem; it just gives more breathing room.
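
To see how close a client is to the starvation Mike describes, the
reserved local ports currently held can be counted from /proc/net/tcp
(/proc/net/udp has the same layout).  The following is only a minimal
Python sketch, assuming the usual procfs layout in which the second
column is the local address as HEX_IP:HEX_PORT and the fourth column is
the socket state (06 being TIME_WAIT).  The function name and the
default limit of 1024 are illustrative choices, not part of autofs or
the kernel.

```python
def count_reserved_ports(proc_net_text, limit=1024):
    """Count distinct local ports below `limit` in /proc/net/tcp-style text.

    Returns (ports_in_use, ports_in_time_wait) as two sorted lists.
    Assumes column 2 is HEX_IP:HEX_PORT and column 4 is the hex state.
    """
    in_use = set()
    time_wait = set()
    for line in proc_net_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore short or blank lines
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if port == 0 or port >= limit:
            continue  # unbound or not a reserved port
        in_use.add(port)
        if fields[3] == "06":  # 06 is TIME_WAIT
            time_wait.add(port)
    return in_use and sorted(in_use) or [], sorted(time_wait)
```

On a live client it could be fed the contents of /proc/net/tcp, for
example count_reserved_ports(open("/proc/net/tcp").read()), before and
after the automounter expires the /net mounts.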

After talking with Jeff Moyer about the issue, I updated autofs to 
autofs-4.1.3-67.  This was supposed to incorporate a patch that fixes
the port leak problem.

This did not solve the problem, but it did seem to improve things a bit.

After looking at Dwight Marzolf's document on his workaround I found
the following information (this is exactly the same sort of thing we
are seeing too):

"
we quickly found that if you did a cd via /net to one of our Network
Appliance filers (all our other netapp filers worked correctly when
unmounting /net mounts), the port release issue still existed.  In
fact, the mountpoints actively took more ports.  This meant that if you
mounted this filer with /net, your workstation could be rendered
useless in less than 24 hours.  It also became evident that this active
taking of ports by this filer was not limited to just autofs-4.1.3-28
but also earlier versions of autofs  ...  Further
research revealed the ports were being taken at the point of automount
timeout.  When the automounter had declared these mountpoints to be
timed out and ready to be unmounted and attempted to umount them, in
fact, it ended up remounting them, using new ports for the remount ...
"

HOW TO REPRODUCE THE PROBLEM:

Actually, in our case we can render a machine useless in about an
hour or two, and this happens with all of our Netapp filers.  The procedure
is reproducible:

1) cd to a /net directory on the filer.
2) Leave the shell in that /net directory for 15 to 30 minutes
and watch the "BUG" messages in the /var/log/messages file.
3) Log out (so the automounter tries to unmount everything that was mounted).
4) Log in again after 30 minutes; by then you won't be able to
mount anything anymore.

You can replace steps 3 and 4 with "init 6".  When the automounter process
is stopped by init, you will see the port messages scroll up the console
screen.

EXAMPLE OF REPRODUCING THE PROBLEM:

codered-51: cd /net/aflac/vol/vol2
( I can't help but wonder if this BUG message that shows up once a minute
is indicative of a problem )

codered-52: tail -f /var/log/messages
Jan 11 15:32:37 codered automount[6214]: attempting to mount entry /net/aflac
Jan 11 15:33:41 codered automount[7915]: BUG: /net/aflac/vol/vol2 already mounted
Jan 11 15:34:42 codered automount[8049]: BUG: /net/aflac/vol/vol2 already mounted
Jan 11 15:36:42 codered automount[8311]: BUG: /net/aflac/vol/vol2 already mounted
Jan 11 15:37:43 codered automount[8441]: BUG: /net/aflac/vol/vol2 already mounted
 ... (continues once a minute to print out this bug) ...
codered-53: sudo init 6
(after reboot log in to see error messages)
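
The once-a-minute BUG messages can be tallied per mount point to see
how often the automounter retries a mount that is already in place.
This is a minimal Python sketch, assuming each syslog message sits on a
single line (not wrapped by a mail client).  The regular expression and
the function name are illustrative and only match the message format
shown in the log above.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Jan 11 15:33:41 codered automount[7915]: BUG: /net/aflac/vol/vol2 already mounted
BUG_RE = re.compile(r"automount\[\d+\]: BUG: (\S+) already mounted")


def count_already_mounted(lines):
    """Return a Counter mapping mount point -> number of BUG messages."""
    counts = Counter()
    for line in lines:
        match = BUG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It could be run over the open log file, e.g.
count_already_mounted(open("/var/log/messages")).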

THE REALLY WEIRD PART:
Now the interesting thing here is that the machine is rebooting, so
there is no program requesting additional mounts, yet in the log
files you can see that mounts are attempted for almost every subdirectory
of /vol/vol1, /vol/vol2 and /vol/vol3, even though the only
thing that should be happening is an unmount of the directory aflac:/vol/vol2.

jetcar-189: cd /net/aflac/vol/vol3
jetcar-190: ls
ad1983/      cad_archive/ emerald/     layout_old/  ta/          
archive/     design/      is_013std/   lx3/  
jetcar-191: cd ../vol2
jetcar-192: ls
9xcores/         danube/          nwd_layout/      ulc3/
DSPS_Finance/    gpdsp_PLD/       nwd_testmgr/     win2k/
WWM/             gpdsp_marketing/ pc_backups/      
bitpower/        india_mirror/    sh/              
bluetooth/       nile/            spitfire/        
jetcar-194: cd ../vol1
jetcar-195: ls
IssueManager/ diablo/       is_013std/    ras/          tigersharc/
admin/        ed/           jordan/       soft/         
archive/      fsp/          nwd_fsp@      teton_lite/   
cpd/          herc_eval/    pe_workspace/ thor/         


codered-54: less /var/log/messages
Jan 11 15:51:14 codered automount[6214]: can't shutdown: filesystem /net still busy
Jan 11 15:51:17 codered autofs: automount -USR2 succeeded
Jan 11 15:51:19 codered automount[6214]: can't shutdown: filesystem /net still busy
Jan 11 15:51:20 codered autofs: automount -USR2 succeeded
Jan 11 15:51:23 codered autofs: automount -USR2 succeeded
Jan 11 15:51:26 codered autofs: automount -USR2 succeeded
Jan 11 15:51:26 codered automount[6214]: can't shutdown: filesystem /net still busy
Jan 11 15:51:28 codered automount[14708]: >> mount: wrong fs type, bad option, bad superblock on aflac:/vol/vol2/spitfire,
Jan 11 15:51:28 codered automount[14708]: >>        or too many mounted file systems
Jan 11 15:51:28 codered automount[14708]: mount(nfs): nfs: mount failure aflac:/vol/vol2/spitfire on /net/aflac/vol/vol2/spitfire
Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
Jan 11 15:51:28 codered kernel: nfs_read_super: get root inode failed
Jan 11 15:51:28 codered kernel: nfs warning: mount version older than kernel
Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
Jan 11 15:51:28 codered kernel: nfs_read_super: get root inode failed
Jan 11 15:51:28 codered automount[14708]: >> mount: wrong fs type, bad option, bad superblock on aflac:/vol/vol2/ulc3,
Jan 11 15:51:28 codered automount[14708]: >>        or too many mounted file systems
Jan 11 15:51:28 codered automount[14708]: mount(nfs): nfs: mount failure aflac:/vol/vol2/ulc3 on /net/aflac/vol/vol2/ulc3
...
This same pattern of error messages repeats for (in this order)
aflac:/vol/vol2/win2k
aflac:/vol/vol3/ad1983
aflac:/vol/vol3/archive
aflac:/vol/vol3/cad_archive
aflac:/vol/vol3/design
aflac:/vol/vol3/emerald
aflac:/vol/vol3
aflac:/vol/vol3/is_013std
aflac:/vol/vol3/layout_old
aflac:/vol/vol3/lx3
aflac:/vol/vol3/ta
aflac:/vol/vol2/DSPS_Finance
aflac:/vol/vol2
aflac:/vol/vol2/gpdsp_marketing
aflac:/vol/vol2/gpdsp_PLD
aflac:/vol/vol2/india_mirror
aflac:/vol/vol2/nile
aflac:/vol/vol2/nwd_layout
aflac:/vol/vol2/nwd_testmgr
aflac:/vol/vol2/pc_backups
aflac:/vol/vol2/sh

aflac:/vol/vol2/spitfire (repeats the whole thing again)
eventually gets to vol1:
...
aflac:/vol/vol3/ta
aflac:/vol/vol1/pe_workspace
aflac:/vol/vol1/ras
aflac:/vol/vol1/soft
aflac:/vol/vol1/teton_lite
aflac:/vol/vol1/thor
aflac:/vol/vol1/tigersharc
aflac:/vol/vol2/9xcores
aflac:/vol/vol2/bitpower
aflac:/vol/vol2/bluetooth
aflac:/vol/vol2/danube
aflac:/vol/vol2/DSPS_Finance
... (repeats the whole thing again)...
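
A list like the one above can be pulled out of the log mechanically
rather than by eye.  This is a minimal Python sketch, assuming each
syslog message sits on a single line and uses the exact wording shown
in the excerpt.  The function and pattern names are illustrative.  It
returns the failed exports in the order they appear, together with the
number of "Can't bind to reserved port" messages seen.

```python
import re

# automount's report of a failed NFS mount: "<export> on <mount point>"
FAIL_RE = re.compile(r"mount\(nfs\): nfs: mount failure (\S+) on (\S+)")
# kernel RPC layer running out of reserved ports
PORT_RE = re.compile(r"RPC: Can't bind to reserved port")


def summarize_failures(lines):
    """Return (failed_exports_in_order, reserved_port_error_count)."""
    failed = []
    port_errors = 0
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            failed.append(match.group(1))
        elif PORT_RE.search(line):
            port_errors += 1
    return failed, port_errors
```

Comparing the returned export list with what was actually visited in
the shell shows which mounts were requested by nobody.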

Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol3/ta 
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol3/lx3 
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol3/layout_old
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol3/is_013std
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol3
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2/win2k
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2/ulc3
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2/spitfire
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2/sh
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2/pc_backups
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2/nwd_testmgr
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2/nwd_layout
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2/nile
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2/india_mirror
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2/gpdsp_marketing
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2/gpdsp_PLD
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/tigersharc
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/thor
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/teton_lite
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/soft
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/ras
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/pe_workspace
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/jordan
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/is_013std
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/herc_eval
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/fsp
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/IssueManager
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1 
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol0 
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol 
Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac 
Jan 11 15:51:37 codered automount[15971]: expired /net/aflac
Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac/vol/vol3 
Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac/vol/vol2 
Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac/vol/vol1 
Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac/vol 
Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac 
Jan 11 15:51:37 codered automount[15974]: expired /net/aflac
Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac/vol/vol3 
Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac/vol/vol2 
Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac/vol/vol1 
Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac/vol 
Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac 
Jan 11 15:51:37 codered automount[15975]: expired /net/aflac
Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac/vol/vol3 
Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac/vol/vol2 
Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac/vol/vol1 
Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac/vol 
Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac 
Jan 11 15:51:37 codered automount[15976]: expired /net/aflac
Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac/vol/vol3 
Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac/vol/vol2 
Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac/vol/vol1 
Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac/vol 
Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac 
Jan 11 15:51:37 codered automount[15977]: expired /net/aflac
Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac/vol/vol3 
Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac/vol/vol2 
Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac/vol/vol1 
Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac/vol 
Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac 
Jan 11 15:51:38 codered automount[15978]: expired /net/aflac
Jan 11 15:51:38 codered autofs: automount -USR2 succeeded
Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac/vol/vol3 
Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac/vol/vol2 
Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac/vol/vol1 
Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac/vol 
Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac 
Jan 11 15:51:38 codered automount[15986]: expired /net/aflac
Jan 11 15:51:39 codered automount[6214]: can't shutdown: filesystem /net still busy
.... (keeps repeating) ....
Jan 11 15:51:45 codered automount[6214]: can't shutdown: filesystem /net still busy
Jan 11 15:51:47 codered autofs: automount shutdown failed



HOW IT WAS FIXED IN REDHAT 8:

Dwight had implemented his fix in 3 steps for Redhat 8:
1) He updated his autofs to autofs-4.1.3-28 which had the port leak fix
2) He patched his kernel with the autofs4-2.4.20-20040508.patch
(Is some equivalent patch needed for Redhat Enterprise 3, which uses
kernel 2.4.21-20?)
3) He changed the way he exported filesystems from the Netapp:

"The last issue was the matter of how /vol/vol0 is exported from a
Network Appliance filer.  We found that the following exports broke
autofs4:

/vol/vol0     -root=node1:node2:node3:node4
/vol/vol0     -rw,root=node1:node2:node3
/vol/vol0     -anon=0

The export syntax that worked was:

/vol/vol0       -rw=node1:node2,root=node1,node2
"

WHAT HAPPENED WHEN I TRIED THE REDHAT 8 WORKAROUND:

Now when I tried to do something similar, I found that if you weren't
on node1 or node2, the filesystem was read-only, so I had to do this:

/vol/vol1	-rw=node1:node2,root=node1,node2
/vol/vol1/foo1	-root=node1:node2
/vol/vol1/foo2  -root=node1:node2

This way if you cd /net/filer/vol/vol1 it was read-only for most machines
but if you cd'd to /net/filer/vol/vol1/foo1 it was read-write.  

So the Netapp export workaround that fixed the Redhat 8 autofs4 problem,
combined with autofs-4.1.3-67, has not yet solved the problem for our
Redhat Enterprise 3 clients.

CONCLUSION:

I hope this is enough info to track down this problem.  It appears
as though the interaction of using /net with a Netapp is causing
spurious mounts, and unmounting is not working.  I will assist with
any patch tests that you require, so let me know, and I will be able
to verify any fixes.

Thanks,

-Dave

________________________________________________________________________
David Meleedy				Analog Devices, Inc.
David.Meleedy@analog.com		Three Technology Way
Phone: 781 461 3494			Norwood, MA  02062-9106  USA


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-01-11 21:22 BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems David Meleedy
@ 2005-01-12  5:38 ` Ian Kent
  2005-01-12 16:55   ` Mike Waychison
  2005-01-12 14:50 ` raven
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 37+ messages in thread
From: Ian Kent @ 2005-01-12  5:38 UTC (permalink / raw)
  To: David Meleedy; +Cc: autofs


Just a quick note before we get deep into this.

Can you check something for me?
Get the source rpm for util-linux.
Check whether a patch is applied to it to probe for services during
mount (it was a patch in FC). If it is, rebuild the rpm without it and test
again.

On Tue, 11 Jan 2005, David Meleedy wrote:

> 
> Hi Ian & Jeff,
> 	I am trying to track down an autofs issue that has been
> plaguing us.  It seems to be caused by the interaction of autofs version
> 4 with a Network Appliance server, and cd'ing to /net directories
> on the Netapp server.
> 
> A similar issue was seen in Analog Devices in Redhat 8, and apparently
> the problem was worked around by Dwight Marzolf working with Ian Kent's
> help.  So following what Dwight did I have been trying to recreate the fix
> for Redhat Enterprise 3 update 3, and so far have not met with success.
> 
> THE PROBLEM DESCRIPTION:
> 
> Autofs hangs and refuses to mount any directories for a period of time
> after cd'ing to /net/<Netapp>/vol/vol[0-3] and waiting a while.
> The only way to clear this is to reboot the client.
> 
> Initially we started using the following software (Redhat Enterprise 3 update 
> 3)
> autofs 4.1.3-12
> kernel 2.4.21-20
> nfs-utils 1.0.6-31EL
> 
> WHAT HAS BEEN TRIED SO FAR:
> 
> Mike Waychison, after seeing the messages from our log file said,
> 
> "These messages are due to starvation for reserved ports (< 1024).
> Specifically, the kernel will only use ports < 800.  Currently, the
> kernel uses one port per nfs filesystem.  If you mount filesystems very
> fast, then you can also run out of reserved ports as the local (mountd
> iirc?) will close tcp sessions and each must wait 2 minutes before being
> released.
> 
> One solution is to try out the patch I posted last week that allows nfs
> mounts to share tcp/udp connections:
> 
> http://marc.theaimsgroup.com/?l=linux-nfs&m=110261671705396&w=2
> "
> 
> The problem is we are using a different version of the kernel 2.4,
> and his patch was for the 2.6 kernel.  Also, although his patch
> might make the number of ports available increase, I think it does
> not really solve the problem, it just gives more breathing room.
> 
> After talking with Jeff Moyer about the issue, I updated autofs to 
> autofs-4.1.3-67.  This was supposed to incorporate a patch that fixes
> the port leak problem.
> 
> This did not solve the problem, but it did seem to improve things a bit.
> 
> After looking at Dwight Marzolf's document on his workaround I found
> the following information (this is exactly the same sort of thing we
> are seeing too):
> 
> "
> we quickly found that if you did a cd via /net to one of our Network
> Appliance filers (all our other netapp filers worked correctly when
> unmounting /net mounts), the port release issue still existed.  In
> fact, the mountpoints actively took more ports.  This meant that if you
> mounted this filer with /net, your workstation could be rendered
> useless in less than 24 hours.  It also became evident that this active
> taking of ports by this filer was not limited to just autofs-4.1.3-28
> but also earlier versions of autofs  ...  Further
> research revealed the ports were being taken at the point of automount
> timeout.  When the automounter had declared these mountpoints to be
> timed out and ready to be unmounted and attempted to umount them, in
> fact, it ended up remounting them, using new ports for the remount ...
> "
> 
> HOW TO REPRODUCE THE PROBLEM:
> 
> Actually in our case we can render a machine useless in just about an
> hour or two, and this happens for all of our Netapp filers.  The procedure
> to do this is reproducible.
> 
> 1) You cd to a /net directory on the filer.
> 2) Leave the shell in that /net directory for about 15 minutes-> 1/2 an hour.
> and watch the "BUG" messages in the /var/log/messages file.
> 
> 3) Log out. (so the automounter tries to unmount everything that was mounted).
> 4) Log in again, after 30 minutes and by then you won't be about to 
> mount anything anymore
> 
> You can replace steps 3 and 4 with "init 6".  When the automounter process
> is stopped by init, you will see the port messages scroll up the console
> screen.
> 
> EXAMPLE OF REPRODUCING THE PROBLEM:
> 
> codered-51: cd /net/aflac/vol/vol2
> ( I can't help but wonder if this BUG message that shows up once a minute
> is indicative of a problem )
> 
> codered-52: tail -f /var/log/messages
> Jan 11 15:32:37 codered automount[6214]: attempting to mount entry /net/aflac
> Jan 11 15:33:41 codered automount[7915]: BUG: /net/aflac/vol/vol2 already 
> mounted
> Jan 11 15:34:42 codered automount[8049]: BUG: /net/aflac/vol/vol2 already 
> mounted
> Jan 11 15:36:42 codered automount[8311]: BUG: /net/aflac/vol/vol2 already 
> mounted
> Jan 11 15:37:43 codered automount[8441]: BUG: /net/aflac/vol/vol2 already 
> mounted
>  ... (continues once a minute to print out this bug) ...
> codered-53: sudo init 6
> (after reboot log in to see error messages)
> 
> THE REALLY WEIRD PART:
> Now the interesting thing here is that the machine is rebooting, so
> there is no program requesting additional mounts, yet here in the log
> files you can see that almost every subdirectory of /vol/vol2, /vol/vol3
> and /vol/vol3 are attempted to be mounted, even though the only
> thing that should be happening is an unmount of the directory aflac:/vol/vol2
> 
> jetcar-189: cd /net/aflac/vol/vol3
> jetcar-190: ls
> ad1983/      cad_archive/ emerald/     layout_old/  ta/          
> archive/     design/      is_013std/   lx3/  
> jetcar-191: cd ../vol2
> jetcar-192: ls
> 9xcores/         danube/          nwd_layout/      ulc3/
> DSPS_Finance/    gpdsp_PLD/       nwd_testmgr/     win2k/
> WWM/             gpdsp_marketing/ pc_backups/      
> bitpower/        india_mirror/    sh/              
> bluetooth/       nile/            spitfire/        
> jetcar-194: cd ../vol1
> etcar-195: ls
> IssueManager/ diablo/       is_013std/    ras/          tigersharc/
> admin/        ed/           jordan/       soft/         
> archive/      fsp/          nwd_fsp@      teton_lite/   
> cpd/          herc_eval/    pe_workspace/ thor/         
> 
> 
> codered-54: less /var/log/messages
> Jan 11 15:51:14 codered automount[6214]: can't shutdown: filesystem /net still 
> busy
> Jan 11 15:51:17 codered autofs: automount -USR2 succeeded
> Jan 11 15:51:19 codered automount[6214]: can't shutdown: filesystem /net still 
> busy
> Jan 11 15:51:20 codered autofs: automount -USR2 succeeded
> Jan 11 15:51:23 codered autofs: automount -USR2 succeeded
> Jan 11 15:51:26 codered autofs: automount -USR2 succeeded
> Jan 11 15:51:26 codered automount[6214]: can't shutdown: filesystem /net still 
> busy
> Jan 11 15:51:28 codered automount[14708]: >> mount: wrong fs type, bad option, 
> bad superblock on aflac:/vol/vol2/spitfire,
> Jan 11 15:51:28 codered automount[14708]: >>        or too many mounted file 
> sys
> tems
> Jan 11 15:51:28 codered automount[14708]: mount(nfs): nfs: mount failure 
> aflac:/
> vol/vol2/spitfire on /net/aflac/vol/vol2/spitfire
> Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
> Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
> Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
> Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
> Jan 11 15:51:28 codered kernel: nfs_read_super: get root inode failed
> Jan 11 15:51:28 codered kernel: nfs warning: mount version older than kernel
> Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
> Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
> Jan 11 15:51:28 codered kernel: nfs_read_super: get root inode failed
> Jan 11 15:51:28 codered automount[14708]: >> mount: wrong fs type, bad option, 
> bad superblock on aflac:/vol/vol2/ulc3,
> Jan 11 15:51:28 codered automount[14708]: >>        or too many mounted file 
> systems
> Jan 11 15:51:28 codered automount[14708]: mount(nfs): nfs: mount failure 
> aflac:/vol/vol2/ulc3 on /net/aflac/vol/vol2/ulc3
> ...
> This same pattern of error messages repeats for (in this order)
> aflac:/vol/vol2/win2k
> aflac:/vol/vol3/ad1983
> aflac:/vol/vol3/archive
> aflac:/vol/vol3/cad_archive
> aflac:/vol/vol3/design
> aflac:/vol/vol3/emerald
> aflac:/vol/vol3
> aflac:/vol/vol3/is_013std
> aflac:/vol/vol3/layout_old
> aflac:/vol/vol3/lx3
> aflac:/vol/vol3/ta
> aflac:/vol/vol2/DSPS_Finance
> aflac:/vol/vol2
> aflac:/vol/vol2/gpdsp_marketing
> aflac:/vol/vol2/gpdsp_PLD
> aflac:/vol/vol2/india_mirror
> aflac:/vol/vol2/nile
> aflac:/vol/vol2/nwd_layout
> aflac:/vol/vol2/nwd_testmgr
> aflac:/vol/vol2/pc_backups
> aflac:/vol/vol2/sh
> 
> aflac:/vol/vol2/spitfire (repeats the whole thing again)
> eventually gets to vol1:
> ...
> aflac:/vol/vol3/ta
> aflac:/vol/vol1/pe_workspace
> aflac:/vol/vol1/ras
> aflac:/vol/vol1/soft
> aflac:/vol/vol1/teton_lite
> aflac:/vol/vol1/thor
> aflac:/vol/vol1/tigersharc
> aflac:/vol/vol2/9xcores
> aflac:/vol/vol2/bitpower
> aflac:/vol/vol2/bluetooth
> aflac:/vol/vol2/danube
> aflac:/vol/vol2/DSPS_Finance
> ... (repeats the whole thing again)...
> 
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol3/ta 
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol3/lx3 
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol3/layout_old
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol3/is_013std
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol3 
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol2/win2k
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol2/ulc3
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol2/spitfire
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2/sh 
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol2/pc_backups
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol2/nwd_testmgr
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol2/nwd_layout
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol2/nile
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol2/india_mirror
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol2/gpdsp_marketing
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol2/gpdsp_PLD
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2 
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol1/tigersharc
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol1/thor
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol1/teton_lite
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol1/soft
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/ras 
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol1/pe_workspace
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol1/jordan
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol1/is_013std
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol1/herc_eval
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/fsp 
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
> /net/aflac/vol/vol1/IssueManager
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1 
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol0 
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol 
> Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac 
> Jan 11 15:51:37 codered automount[15971]: expired /net/aflac
> Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac/vol/vol3 
> Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac/vol/vol2 
> Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac/vol/vol1 
> Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac/vol 
> Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac 
> Jan 11 15:51:37 codered automount[15974]: expired /net/aflac
> Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac/vol/vol3 
> Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac/vol/vol2 
> Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac/vol/vol1 
> Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac/vol 
> Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac 
> Jan 11 15:51:37 codered automount[15975]: expired /net/aflac
> Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac/vol/vol3 
> Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac/vol/vol2 
> Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac/vol/vol1 
> Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac/vol 
> Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac 
> Jan 11 15:51:37 codered automount[15976]: expired /net/aflac
> Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac/vol/vol3 
> Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac/vol/vol2 
> Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac/vol/vol1 
> Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac/vol 
> Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac 
> Jan 11 15:51:37 codered automount[15977]: expired /net/aflac
> Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac/vol/vol3 
> Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac/vol/vol2 
> Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac/vol/vol1 
> Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac/vol 
> Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac 
> Jan 11 15:51:38 codered automount[15978]: expired /net/aflac
> Jan 11 15:51:38 codered autofs: automount -USR2 succeeded
> Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac/vol/vol3 
> Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac/vol/vol2 
> Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac/vol/vol1 
> Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac/vol 
> Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac 
> Jan 11 15:51:38 codered automount[15986]: expired /net/aflac
> Jan 11 15:51:39 codered automount[6214]: can't shutdown: filesystem /net still 
> busy
> .... (keeps repeating) ....
> Jan 11 15:51:45 codered automount[6214]: can't shutdown: filesystem /net still 
> busy
> Jan 11 15:51:47 codered autofs: automount shutdown failed
> 
> 
> 
> HOW IT WAS FIXED IN REDHAT 8:
> 
> Dwight had implemented his fix in 3 steps for Redhat 8:
> 1) He updated his autofs to autofs-4.1.3-28 which had the port leak fix
> 2) He patched his kernel with the autofs4-2.4.20-20040508.patch
> (is some equivalent patch needed for Redhat 3 Enterprise 3 which uses 
> kernel 2.4.21-20 ?
> 3) He changed the way he exported filesystems from the Netapp:
> 
> "The last issue was the matter of how /vol/vol0 is exported from a
> Network Appliance filer.  We found that the following exports broke
> autofs4:
> 
> /vol/vol0     -root=node1:node2:node3:node4
> /vol/vol0     -rw,root=node1:node2:node3
> /vol/vol0     -anon=0
> 
> The export syntax that worked was:
> 
> /vol/vol0       -rw=node1:node2,root=node1,node2
> "
> 
> WHAT HAPPENED WHEN I TRIED THE REDHAT 8 WORKAROUND:
> 
> Now when I tried to do something similar, I found that if you weren't
> on node1 or node2, the filesystem was read-only, so I had to do this:
> 
> /vol/vol1	-rw=node1:node2,root=node1,node2
> /vol/vol1/foo1	-root=node1:node2
> /vol/vol1/foo2  -root=node1:node2
> 
> This way if you cd /net/filer/vol/vol1 it was read-only for most machines
> but if you cd'd to /net/filer/vol/vol1/foo1 it was read-write.  
> 
> So using that Netapp export workaround that fixed the Redhat 8 autofs4 problem,
> plus using autofs-4.1.3-67 has not yet solved the problem yet for our
> Redhat Enterprise 3 clients.
> 
> CONCLUSION:
> 
> I hope this is enough info to track down this problem.  It appears
> as though the interaction of using /net with a Netapp is causing
> spurious mounts, and unmounting is not working.  I will assist with
> any patch tests that you require, so let me know, and I will be able
> to verify any fixes.
> 
> Thanks,
> 
> -Dave
> 
> ________________________________________________________________________
> David Meleedy				Analog Devices, Inc.
> David.Meleedy@analog.com		Three Technology Way
> Phone: 781 461 3494			Norwood, MA  02062-9106  USA
> 
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-01-11 21:22 BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems David Meleedy
  2005-01-12  5:38 ` Ian Kent
@ 2005-01-12 14:50 ` raven
  2005-01-12 22:22   ` David Meleedy
  2005-01-12 16:13 ` Dwight Marzolf
  2005-08-25 22:14 ` Rob Sims
  3 siblings, 1 reply; 37+ messages in thread
From: raven @ 2005-01-12 14:50 UTC (permalink / raw)
  To: David Meleedy; +Cc: autofs

On Tue, 11 Jan 2005, David Meleedy wrote:

>
> Hi Ian & Jeff,
> 	I am trying to track down an autofs issue that has been
> plaguing us.  It seems to be caused by the interaction of autofs version
> 4 with a Network Appliance server, and cd'ing to /net directories
> on the Netapp server.
>
> A similar issue was seen in Analog Devices in Redhat 8, and apparently
> the problem was worked around by Dwight Marzolf working with Ian Kent's
> help.  So following what Dwight did I have been trying to recreate the fix
> for Redhat Enterprise 3 update 3, and so far have not met with success.
>
> THE PROBLEM DESCRIPTION:
>
> Autofs hangs and refuses to mount any directories for a period of time
> after cd'ing to /net/<Netapp>/vol/vol[0-3] and waiting a while.
> The only way to clear this is to reboot the client.

OK.

This is interesting to me, as your description below indicates that autofs 
behaves poorly in this hostile environment (i.e. it's not dealing with 
this unusual situation at all well).

Also, I'd like to add that I've been seeing these symptoms in testing my new 
version on FC with a good number of entries in a master map (i.e. >50).

It was clear from a "netstat --inet" that mount was opening several 
connections for each mount attempt. autofs, in this case, doesn't do 
any probing or opening of connections; it just calls mount.
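
For anyone wanting to check the same thing without eyeballing netstat output, here is a rough sketch in Python that lists local ports below 1024 from text in the /proc/net/tcp layout. The function name and the sample table are made up for illustration and do not come from the affected machine.

```python
def reserved_local_ports(proc_net_text, limit=1024):
    """Return sorted (port, state) pairs for local ports below `limit`.

    Expects text in the /proc/net/tcp layout: a header line, then rows whose
    second field is HEXIP:HEXPORT and whose fourth field is the socket state
    in hex (for TCP, 01 = ESTABLISHED, 06 = TIME_WAIT, 0A = LISTEN).
    """
    found = []
    for line in proc_net_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if port < limit:
            found.append((port, fields[3]))
    return sorted(found)


# Hypothetical sample, not taken from the affected machine.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0019 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1234
   1: 0A00000A:031F 0A000014:0801 01 00000000:00000000 00:00000000 00000000     0        0 1235
   2: 0A00000A:031E 0A000014:0801 06 00000000:00000000 00:00000000 00000000     0        0 0
   3: 0A00000A:8F2C 0A000014:0050 01 00000000:00000000 00:00000000 00000000     0        0 1236
"""

for port, state in reserved_local_ports(SAMPLE):
    print(port, state)
```

On a live client the same function could be given the contents of /proc/net/tcp (and /proc/net/udp for UDP mounts). Running it before and after a single mount attempt would show how many reserved ports that attempt consumed.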

This, and Mike's comments regarding RPC transport multiplexing, have caused 
me to dig out a patch that I worked on some time ago. It was originally 
written by the NFS maintainer but never completed or tested.

Unfortunately, I gave up on it when I tried to merge it into a RedHat 
kernel. The patches that had been applied to the RH kernel made it very 
difficult to apply, largely because my understanding of the RPC subsystem 
is just not good enough.

The patch that I worked on is very different to the one Mike proposed but 
achieves the same thing. There are other obstacles to having an RPC 
multiplexing patch accepted as well, but, maybe later.

So there are some options here.

>
> Initially we started using the following software (Redhat Enterprise 3 update
> 3)
> autofs 4.1.3-12
> kernel 2.4.21-20
> nfs-utils 1.0.6-31EL

I don't have access to these kernel sources.
That will be a problem as I don't know what autofs4 patches have been 
applied. Jeff?

You really should add util-linux to the list of packages to consider in 
the investigation. It may contain a patch which probes NFS servers and 
opens a number of connections for each mount.

>
> WHAT HAS BEEN TRIED SO FAR:
>
> Mike Waychison, after seeing the messages from our log file said,
>
> "These messages are due to starvation for reserved ports (< 1024).
> Specifically, the kernel will only use ports < 800.  Currently, the
> kernel uses one port per nfs filesystem.  If you mount filesystems very
> fast, then you can also run out of reserved ports as the local (mountd
> iirc?) will close tcp sessions and each must wait 2 minutes before being
> released.
>
> One solution is to try out the patch I posted last week that allows nfs
> mounts to share tcp/udp connections:
>
> http://marc.theaimsgroup.com/?l=linux-nfs&m=110261671705396&w=2
> "
>
> The problem is we are using a different version of the kernel 2.4,
> and his patch was for the 2.6 kernel.  Also, although his patch
> might make the number of ports available increase, I think it does
> not really solve the problem, it just gives more breathing room.

I'm not sure about that.

The multiplexing of the RPC transport would probably provide a solid 
solution to your problem by the sound of things. The patches I mentioned 
above were done against 2.4.22 and 2.6.0.

The problem here is that getting a working patch will probably take a 
while, so we probably need a workaround in the meantime.

>
> After talking with Jeff Moyer about the issue, I updated autofs to
> autofs-4.1.3-67.  This was supposed to incorporate a patch that fixes
> the port leak problem.

Certainly a bug, but not the heart of your problem I'm afraid.

>
> This did not solve the problem, but it did seem to improve things a bit.
>
> After looking at Dwight Marzolf's document on his workaround I found
> the following information (this is exactly the same sort of thing we
> are seeing too):
>
> "
> we quickly found that if you did a cd via /net to one of our Network
> Appliance filers (all our other netapp filers worked correctly when
> unmounting /net mounts), the port release issue still existed.  In
> fact, the mountpoints actively took more ports.  This meant that if you
> mounted this filer with /net, your workstation could be rendered
> useless in less than 24 hours.  It also became evident that this active
> taking of ports by this filer was not limited to just autofs-4.1.3-28
> but also earlier versions of autofs  ...  Further
> research revealed the ports were being taken at the point of automount
> timeout.  When the automounter had declared these mountpoints to be
> timed out and ready to be unmounted and attempted to umount them, in
> fact, it ended up remounting them, using new ports for the remount ...
> "

Do you have any messages in the log on the server side like:

Jan 10 22:01:36 budgie-wl rpc.mountd: refused unmount request from 
raven-wl.themaw.net for /usr/local/sbin (/usr/local/sbin): illegal port 
36233

This indicates that the client has been patched to use non-privileged 
ports to increase the number of available ports but the NFS server has 
not.
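
A quick way to look for such refusals is sketched below in Python. It assumes the Linux rpc.mountd message format shown above; a Netapp filer will very likely log this differently, so the pattern would need adjusting there. The function and variable names are invented for the example.

```python
import re

# Pattern modelled on the single rpc.mountd line quoted above (an assumption
# about the general format, not a documented grammar).
REFUSAL = re.compile(
    r"rpc\.mountd: refused (\w+) request from (\S+) for (\S+) "
    r"\(.*?\): illegal port (\d+)"
)


def illegal_port_refusals(log_text):
    """Return (request, client, path, port) for each mountd refusal found."""
    hits = []
    for line in log_text.splitlines():
        m = REFUSAL.search(line)
        if m:
            request, client, path, port = m.groups()
            hits.append((request, client, path, int(port)))
    return hits


# The example message from above, joined back onto one line.
LOG = (
    "Jan 10 22:01:36 budgie-wl rpc.mountd: refused unmount request from "
    "raven-wl.themaw.net for /usr/local/sbin (/usr/local/sbin): "
    "illegal port 36233\n"
)

for request, client, path, port in illegal_port_refusals(LOG):
    kind = "non-reserved" if port >= 1024 else "reserved"
    print(f"{client}: {request} of {path} refused, source port {port} ({kind})")
```

Any hit with a source port of 1024 or above would point to the client/server mismatch described here.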

Just wondering?

>
> HOW TO REPRODUCE THE PROBLEM:
>
> Actually in our case we can render a machine useless in just about an
> hour or two, and this happens for all of our Netapp filers.  The procedure
> to do this is reproducible.
>
> 1) You cd to a /net directory on the filer.
> 2) Leave the shell in that /net directory for about 15 minutes-> 1/2 an hour.
> and watch the "BUG" messages in the /var/log/messages file.
>
> 3) Log out. (so the automounter tries to unmount everything that was mounted).
> 4) Log in again, after 30 minutes and by then you won't be about to
> mount anything anymore
>
> You can replace steps 3 and 4 with "init 6".  When the automounter process
> is stopped by init, you will see the port messages scroll up the console
> screen.
>
> EXAMPLE OF REPRODUCING THE PROBLEM:
>
> codered-51: cd /net/aflac/vol/vol2
> ( I can't help but wonder if this BUG message that shows up once a minute
> is indicative of a problem )
>
> codered-52: tail -f /var/log/messages
> Jan 11 15:32:37 codered automount[6214]: attempting to mount entry /net/aflac
> Jan 11 15:33:41 codered automount[7915]: BUG: /net/aflac/vol/vol2 already
> mounted
> Jan 11 15:34:42 codered automount[8049]: BUG: /net/aflac/vol/vol2 already
> mounted
> Jan 11 15:36:42 codered automount[8311]: BUG: /net/aflac/vol/vol2 already
> mounted
> Jan 11 15:37:43 codered automount[8441]: BUG: /net/aflac/vol/vol2 already
> mounted

Seen that lately. Definitely want to get to the bottom of this.

I don't yet understand why autofs is getting requests to mount an already 
mounted file system. Even in a hostile situation autofs needs to deal 
with this properly.

In the past I observed that this might have been somehow related to 
corruption in /etc/mtab.
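
One cheap check along those lines is to look for mount points that appear more than once in /etc/mtab. The sketch below is only an illustration of that idea, not a known diagnosis. The sample entries, including the address and the autofs option string, are hypothetical.

```python
from collections import Counter


def duplicate_mount_points(mtab_text):
    """Return {mount_point: count} for mount points listed more than once.

    Expects text in the usual mtab layout: device, mount point, fs type,
    options, dump, pass.
    """
    counts = Counter()
    for line in mtab_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        counts[fields[1]] += 1
    return {mp: n for mp, n in counts.items() if n > 1}


# Hypothetical mtab contents for illustration only.
MTAB = """\
/dev/hda1 / ext3 rw 0 0
automount(pid6214) /net autofs rw,fd=5,pgrp=6214,minproto=2,maxproto=4 0 0
aflac:/vol/vol2 /net/aflac/vol/vol2 nfs rw,addr=10.0.0.20 0 0
aflac:/vol/vol2 /net/aflac/vol/vol2 nfs rw,addr=10.0.0.20 0 0
"""

print(duplicate_mount_points(MTAB))
```

Comparing the result for /etc/mtab with the result for /proc/mounts may help tell a stale mtab entry apart from a file system that the kernel really has mounted twice.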

> ... (continues once a minute to print out this bug) ...
> codered-53: sudo init 6
> (after reboot log in to see error messages)
>
> THE REALLY WEIRD PART:
> Now the interesting thing here is that the machine is rebooting, so
> there is no program requesting additional mounts, yet here in the log
> files you can see that almost every subdirectory of /vol/vol2, /vol/vol3
> and /vol/vol3 are attempted to be mounted, even though the only
> thing that should be happening is an unmount of the directory aflac:/vol/vol2
>
> jetcar-189: cd /net/aflac/vol/vol3
> jetcar-190: ls
> ad1983/      cad_archive/ emerald/     layout_old/  ta/
> archive/     design/      is_013std/   lx3/
> jetcar-191: cd ../vol2
> jetcar-192: ls
> 9xcores/         danube/          nwd_layout/      ulc3/
> DSPS_Finance/    gpdsp_PLD/       nwd_testmgr/     win2k/
> WWM/             gpdsp_marketing/ pc_backups/
> bitpower/        india_mirror/    sh/
> bluetooth/       nile/            spitfire/
> jetcar-194: cd ../vol1
> etcar-195: ls
> IssueManager/ diablo/       is_013std/    ras/          tigersharc/
> admin/        ed/           jordan/       soft/
> archive/      fsp/          nwd_fsp@      teton_lite/
> cpd/          herc_eval/    pe_workspace/ thor/
>
>
> codered-54: less /var/log/messages
> Jan 11 15:51:14 codered automount[6214]: can't shutdown: filesystem /net still
> busy
> Jan 11 15:51:17 codered autofs: automount -USR2 succeeded
> Jan 11 15:51:19 codered automount[6214]: can't shutdown: filesystem /net still
> busy
> Jan 11 15:51:20 codered autofs: automount -USR2 succeeded
> Jan 11 15:51:23 codered autofs: automount -USR2 succeeded
> Jan 11 15:51:26 codered autofs: automount -USR2 succeeded
> Jan 11 15:51:26 codered automount[6214]: can't shutdown: filesystem /net still
> busy
> Jan 11 15:51:28 codered automount[14708]: >> mount: wrong fs type, bad option,
> bad superblock on aflac:/vol/vol2/spitfire,
> Jan 11 15:51:28 codered automount[14708]: >>        or too many mounted file
> sys
> tems
> Jan 11 15:51:28 codered automount[14708]: mount(nfs): nfs: mount failure
> aflac:/
> vol/vol2/spitfire on /net/aflac/vol/vol2/spitfire
> Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
> Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
> Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
> Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
> Jan 11 15:51:28 codered kernel: nfs_read_super: get root inode failed
> Jan 11 15:51:28 codered kernel: nfs warning: mount version older than kernel
> Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
> Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
> Jan 11 15:51:28 codered kernel: nfs_read_super: get root inode failed

Looks like you've run out of privileged port space here, at least the 
ports that RPC is trying to use.
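
To get a feel for how widespread the failures are, the messages can be tallied with something like the Python sketch below. It assumes the log lines have been joined back onto single lines (the mail client wrapped them above), and it treats the number in parentheses as an errno value, which is an assumption; on Linux, errno 98 is EADDRINUSE. The names used are invented for the example.

```python
import re
from collections import Counter

BIND_FAIL = re.compile(r"kernel: RPC: Can't bind to reserved port \((\d+)\)")
MOUNT_FAIL = re.compile(r"mount\(nfs\): nfs: mount failure (\S+) on (\S+)")


def summarize(log_text):
    """Return ({code: count} for bind failures, [failed NFS sources])."""
    bind_errors = Counter()
    failed_sources = []
    for line in log_text.splitlines():
        m = BIND_FAIL.search(line)
        if m:
            bind_errors[int(m.group(1))] += 1
            continue
        m = MOUNT_FAIL.search(line)
        if m:
            failed_sources.append(m.group(1))
    return dict(bind_errors), failed_sources


# Lines taken from the log above, unwrapped.
LOG = """\
Jan 11 15:51:28 codered automount[14708]: mount(nfs): nfs: mount failure aflac:/vol/vol2/spitfire on /net/aflac/vol/vol2/spitfire
Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
"""

codes, sources = summarize(LOG)
print(codes, sources)
```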

snip ...

>
> HOW IT WAS FIXED IN REDHAT 8:
>
> Dwight had implemented his fix in 3 steps for Redhat 8:
> 1) He updated his autofs to autofs-4.1.3-28 which had the port leak fix
> 2) He patched his kernel with the autofs4-2.4.20-20040508.patch
> (is some equivalent patch needed for Redhat 3 Enterprise 3 which uses
> kernel 2.4.21-20 ?
> 3) He changed the way he exported filesystems from the Netapp:
>
> "The last issue was the matter of how /vol/vol0 is exported from a
> Network Appliance filer.  We found that the following exports broke
> autofs4:
>
> /vol/vol0     -root=node1:node2:node3:node4
> /vol/vol0     -rw,root=node1:node2:node3
> /vol/vol0     -anon=0
>
> The export syntax that worked was:
>
> /vol/vol0       -rw=node1:node2,root=node1,node2
> "

This is a bug in the option parsing. I'll need to fix that.

>
> WHAT HAPPENED WHEN I TRIED THE REDHAT 8 WORKAROUND:
>
> Now when I tried to do something similar, I found that if you weren't
> on node1 or node2, the filesystem was read-only, so I had to do this:
>
> /vol/vol1	-rw=node1:node2,root=node1,node2
> /vol/vol1/foo1	-root=node1:node2
> /vol/vol1/foo2  -root=node1:node2
>
> This way if you cd /net/filer/vol/vol1 it was read-only for most machines
> but if you cd'd to /net/filer/vol/vol1/foo1 it was read-write.
>
> So using that Netapp export workaround that fixed the Redhat 8 autofs4 problem,
> plus using autofs-4.1.3-67 has not yet solved the problem yet for our
> Redhat Enterprise 3 clients.
>
> CONCLUSION:
>
> I hope this is enough info to track down this problem.  It appears
> as though the interaction of using /net with a Netapp is causing
> spurious mounts, and unmounting is not working.  I will assist with
> any patch tests that you require, so let me know, and I will be able
> to verify any fixes.

Might be a bit of a long road here but we'll have to see how we go.

btw, on average, how many exports do you have on a filer?

Regards
Ian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-01-11 21:22 BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems David Meleedy
  2005-01-12  5:38 ` Ian Kent
  2005-01-12 14:50 ` raven
@ 2005-01-12 16:13 ` Dwight Marzolf
  2005-01-12 20:55   ` David Meleedy
  2005-08-25 22:14 ` Rob Sims
  3 siblings, 1 reply; 37+ messages in thread
From: Dwight Marzolf @ 2005-01-12 16:13 UTC (permalink / raw)
  To: David Meleedy; +Cc: autofs

Dave,

>Now when I tried to do something similar, I found that if you weren't
>on node1 or node2, the filesystem was read-only, so I had to do this:
>
>/vol/vol1	-rw=node1:node2,root=node1,node2
>/vol/vol1/foo1	-root=node1:node2
>/vol/vol1/foo2  -root=node1:node2

On this one here, the top line is correct but the other two lines should be:

/vol/vol1/foo1	-rw,root=node1:node2
/vol/vol1/foo2  -rw,root=node1:node2

This way, the vol/vol1 dir does not mount when you cd to 
/net/machine/vol/vol1, but the other two directories do mount and are 
accessible by all workstations that need to read and write to them.  This 
should work under both RedHat 8 and Enterprise 3.  Now, I don't know why 
autofs4 seems to require the exports to be this way on a netapp box when 
Solaris didn't seem to care, but this is what is working for us.

Dwight Marzolf


David Meleedy wrote:

>Hi Ian & Jeff,
>	I am trying to track down an autofs issue that has been
>plaguing us.  It seems to be caused by the interaction of autofs version
>4 with a Network Appliance server, and cd'ing to /net directories
>on the Netapp server.
>
>A similar issue was seen in Analog Devices in Redhat 8, and apparently
>the problem was worked around by Dwight Marzolf working with Ian Kent's
>help.  So following what Dwight did I have been trying to recreate the fix
>for Redhat Enterprise 3 update 3, and so far have not met with success.
>
>THE PROBLEM DESCRIPTION:
>
>Autofs hangs and refuses to mount any directories for a period of time
>after cd'ing to /net/<Netapp>/vol/vol[0-3] and waiting a while.
>The only way to clear this is to reboot the client.
>
>Initially we started using the following software (Redhat Enterprise 3 update 
>3)
>autofs 4.1.3-12
>kernel 2.4.21-20
>nfs-utils 1.0.6-31EL
>
>WHAT HAS BEEN TRIED SO FAR:
>
>Mike Waychison, after seeing the messages from our log file said,
>
>"These messages are due to starvation for reserved ports (< 1024).
>Specifically, the kernel will only use ports < 800.  Currently, the
>kernel uses one port per nfs filesystem.  If you mount filesystems very
>fast, then you can also run out of reserved ports as the local (mountd
>iirc?) will close tcp sessions and each must wait 2 minutes before being
>released.
>
>One solution is to try out the patch I posted last week that allows nfs
>mounts to share tcp/udp connections:
>
>http://marc.theaimsgroup.com/?l=linux-nfs&m=110261671705396&w=2
>"
>
>The problem is we are using a different version of the kernel 2.4,
>and his patch was for the 2.6 kernel.  Also, although his patch
>might make the number of ports available increase, I think it does
>not really solve the problem, it just gives more breathing room.
>
>After talking with Jeff Moyer about the issue, I updated autofs to 
>autofs-4.1.3-67.  This was supposed to incorporate a patch that fixes
>the port leak problem.
>
>This did not solve the problem, but it did seem to improve things a bit.
>
>After looking at Dwight Marzolf's document on his workaround I found
>the following information (this is exactly the same sort of thing we
>are seeing too):
>
>"
>we quickly found that if you did a cd via /net to one of our Network
>Appliance filers (all our other netapp filers worked correctly when
>unmounting /net mounts), the port release issue still existed.  In
>fact, the mountpoints actively took more ports.  This meant that if you
>mounted this filer with /net, your workstation could be rendered
>useless in less than 24 hours.  It also became evident that this active
>taking of ports by this filer was not limited to just autofs-4.1.3-28
>but also earlier versions of autofs  ...  Further
>research revealed the ports were being taken at the point of automount
>timeout.  When the automounter had declared these mountpoints to be
>timed out and ready to be unmounted and attempted to umount them, in
>fact, it ended up remounting them, using new ports for the remount ...
>"
>
>HOW TO REPRODUCE THE PROBLEM:
>
>Actually in our case we can render a machine useless in just about an
>hour or two, and this happens for all of our Netapp filers.  The procedure
>to do this is reproducible.
>
>1) You cd to a /net directory on the filer.
>2) Leave the shell in that /net directory for about 15 minutes-> 1/2 an hour.
>and watch the "BUG" messages in the /var/log/messages file.
>
>3) Log out. (so the automounter tries to unmount everything that was mounted).
>4) Log in again, after 30 minutes and by then you won't be about to 
>mount anything anymore
>
>You can replace steps 3 and 4 with "init 6".  When the automounter process
>is stopped by init, you will see the port messages scroll up the console
>screen.
>
>EXAMPLE OF REPRODUCING THE PROBLEM:
>
>codered-51: cd /net/aflac/vol/vol2
>( I can't help but wonder if this BUG message that shows up once a minute
>is indicative of a problem )
>
>codered-52: tail -f /var/log/messages
>Jan 11 15:32:37 codered automount[6214]: attempting to mount entry /net/aflac
>Jan 11 15:33:41 codered automount[7915]: BUG: /net/aflac/vol/vol2 already 
>mounted
>Jan 11 15:34:42 codered automount[8049]: BUG: /net/aflac/vol/vol2 already 
>mounted
>Jan 11 15:36:42 codered automount[8311]: BUG: /net/aflac/vol/vol2 already 
>mounted
>Jan 11 15:37:43 codered automount[8441]: BUG: /net/aflac/vol/vol2 already 
>mounted
> ... (continues once a minute to print out this bug) ...
>codered-53: sudo init 6
>(after reboot log in to see error messages)
>
>THE REALLY WEIRD PART:
>Now the interesting thing here is that the machine is rebooting, so
>there is no program requesting additional mounts, yet here in the log
>files you can see that almost every subdirectory of /vol/vol2, /vol/vol3
>and /vol/vol3 are attempted to be mounted, even though the only
>thing that should be happening is an unmount of the directory aflac:/vol/vol2
>
>jetcar-189: cd /net/aflac/vol/vol3
>jetcar-190: ls
>ad1983/      cad_archive/ emerald/     layout_old/  ta/          
>archive/     design/      is_013std/   lx3/  
>jetcar-191: cd ../vol2
>jetcar-192: ls
>9xcores/         danube/          nwd_layout/      ulc3/
>DSPS_Finance/    gpdsp_PLD/       nwd_testmgr/     win2k/
>WWM/             gpdsp_marketing/ pc_backups/      
>bitpower/        india_mirror/    sh/              
>bluetooth/       nile/            spitfire/        
>jetcar-194: cd ../vol1
>etcar-195: ls
>IssueManager/ diablo/       is_013std/    ras/          tigersharc/
>admin/        ed/           jordan/       soft/         
>archive/      fsp/          nwd_fsp@      teton_lite/   
>cpd/          herc_eval/    pe_workspace/ thor/         
>
>
>codered-54: less /var/log/messages
>Jan 11 15:51:14 codered automount[6214]: can't shutdown: filesystem /net still 
>busy
>Jan 11 15:51:17 codered autofs: automount -USR2 succeeded
>Jan 11 15:51:19 codered automount[6214]: can't shutdown: filesystem /net still 
>busy
>Jan 11 15:51:20 codered autofs: automount -USR2 succeeded
>Jan 11 15:51:23 codered autofs: automount -USR2 succeeded
>Jan 11 15:51:26 codered autofs: automount -USR2 succeeded
>Jan 11 15:51:26 codered automount[6214]: can't shutdown: filesystem /net still 
>busy
>Jan 11 15:51:28 codered automount[14708]: >> mount: wrong fs type, bad option, 
>bad superblock on aflac:/vol/vol2/spitfire,
>Jan 11 15:51:28 codered automount[14708]: >>        or too many mounted file 
>sys
>tems
>Jan 11 15:51:28 codered automount[14708]: mount(nfs): nfs: mount failure 
>aflac:/
>vol/vol2/spitfire on /net/aflac/vol/vol2/spitfire
>Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
>Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
>Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
>Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
>Jan 11 15:51:28 codered kernel: nfs_read_super: get root inode failed
>Jan 11 15:51:28 codered kernel: nfs warning: mount version older than kernel
>Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
>Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
>Jan 11 15:51:28 codered kernel: nfs_read_super: get root inode failed
>Jan 11 15:51:28 codered automount[14708]: >> mount: wrong fs type, bad option, 
>bad superblock on aflac:/vol/vol2/ulc3,
>Jan 11 15:51:28 codered automount[14708]: >>        or too many mounted file 
>systems
>Jan 11 15:51:28 codered automount[14708]: mount(nfs): nfs: mount failure 
>aflac:/vol/vol2/ulc3 on /net/aflac/vol/vol2/ulc3
>...
>This same pattern of error messages repeats for (in this order)
>aflac:/vol/vol2/win2k
>aflac:/vol/vol3/ad1983
>aflac:/vol/vol3/archive
>aflac:/vol/vol3/cad_archive
>aflac:/vol/vol3/design
>aflac:/vol/vol3/emerald
>aflac:/vol/vol3
>aflac:/vol/vol3/is_013std
>aflac:/vol/vol3/layout_old
>aflac:/vol/vol3/lx3
>aflac:/vol/vol3/ta
>aflac:/vol/vol2/DSPS_Finance
>aflac:/vol/vol2
>aflac:/vol/vol2/gpdsp_marketing
>aflac:/vol/vol2/gpdsp_PLD
>aflac:/vol/vol2/india_mirror
>aflac:/vol/vol2/nile
>aflac:/vol/vol2/nwd_layout
>aflac:/vol/vol2/nwd_testmgr
>aflac:/vol/vol2/pc_backups
>aflac:/vol/vol2/sh
>
>aflac:/vol/vol2/spitfire (repeats the whole thing again)
>eventually gets to vol1:
>...
>aflac:/vol/vol3/ta
>aflac:/vol/vol1/pe_workspace
>aflac:/vol/vol1/ras
>aflac:/vol/vol1/soft
>aflac:/vol/vol1/teton_lite
>aflac:/vol/vol1/thor
>aflac:/vol/vol1/tigersharc
>aflac:/vol/vol2/9xcores
>aflac:/vol/vol2/bitpower
>aflac:/vol/vol2/bluetooth
>aflac:/vol/vol2/danube
>aflac:/vol/vol2/DSPS_Finance
>... (repeats the whole thing again)...
>
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol3/ta 
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol3/lx3 
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol3/layout_old
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol3/is_013std
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol3 
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol2/win2k
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol2/ulc3
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol2/spitfire
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2/sh 
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol2/pc_backups
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol2/nwd_testmgr
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol2/nwd_layout
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol2/nile
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol2/india_mirror
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol2/gpdsp_marketing
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol2/gpdsp_PLD
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol2 
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol1/tigersharc
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol1/thor
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol1/teton_lite
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol1/soft
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/ras 
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol1/pe_workspace
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol1/jordan
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol1/is_013std
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol1/herc_eval
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1/fsp 
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: 
>/net/aflac/vol/vol1/IssueManager
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol1 
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol/vol0 
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac/vol 
>Jan 11 15:51:37 codered automount[15971]: rm_unwanted: /net/aflac 
>Jan 11 15:51:37 codered automount[15971]: expired /net/aflac
>Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac/vol/vol3 
>Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac/vol/vol2 
>Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac/vol/vol1 
>Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac/vol 
>Jan 11 15:51:37 codered automount[15974]: rm_unwanted: /net/aflac 
>Jan 11 15:51:37 codered automount[15974]: expired /net/aflac
>Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac/vol/vol3 
>Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac/vol/vol2 
>Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac/vol/vol1 
>Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac/vol 
>Jan 11 15:51:37 codered automount[15975]: rm_unwanted: /net/aflac 
>Jan 11 15:51:37 codered automount[15975]: expired /net/aflac
>Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac/vol/vol3 
>Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac/vol/vol2 
>Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac/vol/vol1 
>Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac/vol 
>Jan 11 15:51:37 codered automount[15976]: rm_unwanted: /net/aflac 
>Jan 11 15:51:37 codered automount[15976]: expired /net/aflac
>Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac/vol/vol3 
>Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac/vol/vol2 
>Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac/vol/vol1 
>Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac/vol 
>Jan 11 15:51:37 codered automount[15977]: rm_unwanted: /net/aflac 
>Jan 11 15:51:37 codered automount[15977]: expired /net/aflac
>Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac/vol/vol3 
>Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac/vol/vol2 
>Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac/vol/vol1 
>Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac/vol 
>Jan 11 15:51:38 codered automount[15978]: rm_unwanted: /net/aflac 
>Jan 11 15:51:38 codered automount[15978]: expired /net/aflac
>Jan 11 15:51:38 codered autofs: automount -USR2 succeeded
>Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac/vol/vol3 
>Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac/vol/vol2 
>Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac/vol/vol1 
>Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac/vol 
>Jan 11 15:51:38 codered automount[15986]: rm_unwanted: /net/aflac 
>Jan 11 15:51:38 codered automount[15986]: expired /net/aflac
>Jan 11 15:51:39 codered automount[6214]: can't shutdown: filesystem /net still 
>busy
>.... (keeps repeating) ....
>Jan 11 15:51:45 codered automount[6214]: can't shutdown: filesystem /net still 
>busy
>Jan 11 15:51:47 codered autofs: automount shutdown failed
>
>
>
>HOW IT WAS FIXED IN REDHAT 8:
>
>Dwight had implemented his fix in 3 steps for Redhat 8:
>1) He updated his autofs to autofs-4.1.3-28 which had the port leak fix
>2) He patched his kernel with the autofs4-2.4.20-20040508.patch
>(is some equivalent patch needed for Redhat 3 Enterprise 3 which uses 
>kernel 2.4.21-20 ?
>3) He changed the way he exported filesystems from the Netapp:
>
>"The last issue was the matter of how /vol/vol0 is exported from a
>Network Appliance filer.  We found that the following exports broke
>autofs4:
>
>/vol/vol0     -root=node1:node2:node3:node4
>/vol/vol0     -rw,root=node1:node2:node3
>/vol/vol0     -anon=0
>
>The export syntax that worked was:
>
>/vol/vol0       -rw=node1:node2,root=node1,node2
>"
>
>WHAT HAPPENED WHEN I TRIED THE REDHAT 8 WORKAROUND:
>
>Now when I tried to do something similar, I found that if you weren't
>on node1 or node2, the filesystem was read-only, so I had to do this:
>
>/vol/vol1	-rw=node1:node2,root=node1,node2
>/vol/vol1/foo1	-root=node1:node2
>/vol/vol1/foo2  -root=node1:node2
>
>This way if you cd /net/filer/vol/vol1 it was read-only for most machines
>but if you cd'd to /net/filer/vol/vol1/foo1 it was read-write.  
>
>So using that Netapp export workaround that fixed the Redhat 8 autofs4 problem,
>plus using autofs-4.1.3-67 has not yet solved the problem yet for our
>Redhat Enterprise 3 clients.
>
>CONCLUSION:
>
>I hope this is enough info to track down this problem.  It appears
>as though the interaction of using /net with a Netapp is causing
>spurious mounts, and unmounting is not working.  I will assist with
>any patch tests that you require, so let me know, and I will be able
>to verify any fixes.
>
>Thanks,
>
>-Dave
>
>________________________________________________________________________
>David Meleedy				Analog Devices, Inc.
>David.Meleedy@analog.com		Three Technology Way
>Phone: 781 461 3494			Norwood, MA  02062-9106  USA
>
>
>
>
>  
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-01-12  5:38 ` Ian Kent
@ 2005-01-12 16:55   ` Mike Waychison
  2005-01-12 20:43     ` David Meleedy
                       ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Mike Waychison @ 2005-01-12 16:55 UTC (permalink / raw)
  To: David Meleedy; +Cc: autofs, Ian Kent

[-- Attachment #1: Type: text/plain, Size: 4513 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Kent wrote:
> Just a quick note before we get deep into this.
> 
> Can you check something for me.
> Get the source rpm for util-linux.
> Check if there is a patch applied to it to probe for services during 
> mount (it was a patch in FC). If it is rebuild the rpm without it and test 
> again.
> 
> On Tue, 11 Jan 2005, David Meleedy wrote:
> 
> 
>>Hi Ian & Jeff,
>>	I am trying to track down an autofs issue that has been
>>plaguing us.  It seems to be caused by the interaction of autofs version
>>4 with a Network Appliance server, and cd'ing to /net directories
>>on the Netapp server.
>>
>>A similar issue was seen in Analog Devices in Redhat 8, and apparently
>>the problem was worked around by Dwight Marzolf working with Ian Kent's
>>help.  So following what Dwight did I have been trying to recreate the fix
>>for Redhat Enterprise 3 update 3, and so far have not met with success.
>>
>>THE PROBLEM DESCRIPTION:
>>
>>Autofs hangs and refuses to mount any directories for a period of time
>>after cd'ing to /net/<Netapp>/vol/vol[0-3] and waiting a while.
>>The only way to clear this is to reboot the client.
>>
>>Initially we started using the following software (Redhat Enterprise 3 update 
>>3)
>>autofs 4.1.3-12
>>kernel 2.4.21-20
>>nfs-utils 1.0.6-31EL
>>
>>WHAT HAS BEEN TRIED SO FAR:
>>
>>Mike Waychison, after seeing the messages from our log file said,
>>
>>"These messages are due to starvation for reserved ports (< 1024).
>>Specifically, the kernel will only use ports < 800.  Currently, the
>>kernel uses one port per nfs filesystem.  If you mount filesystems very
>>fast, then you can also run out of reserved ports as the local (mountd
>>iirc?) will close tcp sessions and each must wait 2 minutes before being
>>released.
>>
>>One solution is to try out the patch I posted last week that allows nfs
>>mounts to share tcp/udp connections:
>>
>>http://marc.theaimsgroup.com/?l=linux-nfs&m=110261671705396&w=2
>>"
>>
>>The problem is we are using a different version of the kernel 2.4,
>>and his patch was for the 2.6 kernel.  Also, although his patch
>>might make the number of ports available increase, I think it does
>>not really solve the problem, it just gives more breathing room.

Well, it will pretty much guarantee only one port is used for any given
filer for talking to the nfs program.  Other ports are still used
temporarily to talk to mountd and the portmapper.

I've attached a patch that applies cleanly to 2.4.21-20.EL, though I
haven't had the chance to test it other than by compiling it.
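
For a rough idea of how many reserved ports a client is spending on NFS
at a given moment, a sketch like the following counts nfs mounts per
server from /proc/mounts.  It assumes one port per mounted filesystem on
an unpatched kernel, as described above, and roughly one per server once
transports are shared.  The function name is made up for illustration.

```shell
#!/bin/sh
# Count NFS mounts per server from /proc/mounts-style input on stdin.
# Each output line is "<server> <number of nfs mounts>".
nfs_mounts_per_server() {
    awk '$3 == "nfs" {
             # device field has the form server:/export/path
             split($1, dev, ":")
             count[dev[1]]++
         }
         END { for (s in count) print s, count[s] }' | sort
}

# Usage on a live client:
#   nfs_mounts_per_server < /proc/mounts
```

Comparing the per-server count before and after applying the patch
should show whether the number of mounts still tracks the number of
ports in use.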

>>
>>After talking with Jeff Moyer about the issue, I updated autofs to 
>>autofs-4.1.3-67.  This was supposed to incorporate a patch that fixes
>>the port leak problem.
>>
>>This did not solve the problem, but it did seem to improve things a bit.
>>
>>After looking at Dwight Marzolf's document on his workaround I found
>>the following information (this is exactly the same sort of thing we
>>are seeing too):
>>
>>"
>>we quickly found that if you did a cd via /net to one of our Network
>>Appliance filers (all our other netapp filers worked correctly when
>>unmounting /net mounts), the port release issue still existed.  In
>>fact, the mountpoints actively took more ports.  This meant that if you
>>mounted this filer with /net, your workstation could be rendered
>>useless in less than 24 hours.  It also became evident that this active
>>taking of ports by this filer was not limited to just autofs-4.1.3-28
>>but also earlier versions of autofs  ...  Further
>>research revealed the ports were being taken at the point of automount
>>timeout.  When the automounter had declared these mountpoints to be
>>timed out and ready to be unmounted and attempted to umount them, in
>>fact, it ended up remounting them, using new ports for the remount ...
>>"
>>

Out of curiosity, can we see the output of showmount -e against your filer?

- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFB5VaHdQs4kOxk3/MRAvvhAJ4uOaMXMTE4rjZ6ivLrbyeowcZkuACfdshX
yBzl0PSwvsMaQZgKelhmrd4=
=vjuL
-----END PGP SIGNATURE-----

[-- Attachment #2: xprt_sharing-2.4.21-20.EL.patch --]
[-- Type: text/x-patch, Size: 5312 bytes --]

This patch allows for sharing of xprts.  This is done by keeping a list of
current xprts and passing them back to the caller of xprt_create_proto if they
match the specifications required (IP X port X protocol X timeout).

We do this multiplexing at the xprt layer as it handles transport creation and
destruction.

This patch has been tested in a test-only environment but has been able to
handle a couple hundred distinct nfs mounts from the same server over a single
tcp stream.

This effectively gets rid of the 800 nfs mounts max problem, as long as you
aren't mounting from many (800) nfs servers.

Signed-off-by: Mike Waychison <michael.waychison@sun.com>

Index: linux-2.4.20/include/linux/sunrpc/xprt.h
===================================================================
--- linux-2.4.20.orig/include/linux/sunrpc/xprt.h	2004-06-24 22:00:25.000000000 -0700
+++ linux-2.4.20/include/linux/sunrpc/xprt.h	2005-01-12 08:23:33.000000000 -0800
@@ -15,6 +15,8 @@
 #include <linux/sunrpc/sched.h>
 #include <linux/sunrpc/xdr.h>
 
+#include <asm/atomic.h>
+
 /*
  * The transport code maintains an estimate on the maximum number of out-
  * standing RPC requests, using a smoothed version of the congestion
@@ -161,6 +163,9 @@ struct rpc_xprt {
 	void			(*old_write_space)(struct sock *);
 
 	wait_queue_head_t	cong_wait;
+
+	atomic_t		count;		/* shared xprt refcount */
+	struct list_head	shared;		/* link to shared list */
 };
 
 #ifdef __KERNEL__
Index: linux-2.4.20/net/sunrpc/xprt.c
===================================================================
--- linux-2.4.20.orig/net/sunrpc/xprt.c	2004-06-21 16:19:58.000000000 -0700
+++ linux-2.4.20/net/sunrpc/xprt.c	2005-01-12 08:20:29.000000000 -0800
@@ -79,6 +79,12 @@
 #define XPRT_MAX_BACKOFF	(8)
 
 /*
+ * List of shared xprt
+ */
+static DECLARE_MUTEX(shared_xprt_sem);
+static LIST_HEAD(shared_xprt_list);
+
+/*
  * Local functions
  */
 static void	xprt_request_init(struct rpc_task *, struct rpc_xprt *);
@@ -1292,6 +1298,33 @@ xprt_release(struct rpc_task *task)
 }
 
 /*
+ * Compare two rpc_timeout to see if they are the same.
+ */
+static int
+xprt_is_same_timeout(struct rpc_timeout *left, struct rpc_timeout *right)
+{
+	/* to_increment isn't used if to_exponential is true */
+	return left->to_initval     == right->to_initval
+            && left->to_maxval      == right->to_maxval
+            && left->to_retries     == right->to_retries
+            && left->to_exponential == right->to_exponential
+            && (left->to_exponential
+                || (left->to_increment  == right->to_increment));
+}
+
+/*
+ * Check to see if the timeout is the default timeout.
+ */
+static int
+xprt_is_default_timeout(struct rpc_timeout *to, int proto)
+{
+	struct rpc_timeout defaultto;
+
+	xprt_default_timeout(&defaultto, proto);
+	return xprt_is_same_timeout(&defaultto, to);
+}
+
+/*
  * Set default timeout parameters
  */
 void
@@ -1349,6 +1382,8 @@ xprt_setup(struct socket *sock, int prot
 	init_waitqueue_head(&xprt->cong_wait);
 
 	INIT_LIST_HEAD(&xprt->recv);
+	INIT_LIST_HEAD(&xprt->shared);
+	atomic_set(&xprt->count, 1);
 
 	/* Set timeout parameters */
 	if (to) {
@@ -1490,8 +1525,8 @@ failed:
 /*
  * Create an RPC client transport given the protocol and peer address.
  */
-struct rpc_xprt *
-xprt_create_proto(int proto, struct sockaddr_in *sap, struct rpc_timeout *to)
+static struct rpc_xprt *
+__xprt_create_proto(int proto, struct sockaddr_in *sap, struct rpc_timeout *to)
 {
 	struct socket	*sock;
 	struct rpc_xprt	*xprt;
@@ -1508,6 +1543,43 @@ xprt_create_proto(int proto, struct sock
 }
 
 /*
+ * Create an RPC client transport that is shared given the protocol and peer
+ * address.
+ */
+struct rpc_xprt *
+xprt_create_proto(int proto, struct sockaddr_in *sap, struct rpc_timeout *to)
+{
+	struct rpc_xprt *xprt;
+
+	down(&shared_xprt_sem);
+	/* walk the list and find an existing matching xprt */
+	list_for_each_entry(xprt, &shared_xprt_list, shared) {
+		/* Filter out mismatches */
+		if (sap->sin_addr.s_addr != xprt->addr.sin_addr.s_addr)
+			continue;
+		if (sap->sin_port != xprt->addr.sin_port)
+			continue;
+		if (xprt->prot != proto)
+			continue;
+		if (to == NULL && !xprt_is_default_timeout(&xprt->timeout, proto))
+			continue;
+		if (to && !xprt_is_same_timeout(&xprt->timeout, to))
+			continue;
+
+		atomic_inc(&xprt->count);
+		goto out;
+	}
+
+	/* make a new one */
+	xprt = __xprt_create_proto(proto, sap, to);
+	if (!IS_ERR(xprt))
+		list_add(&xprt->shared, &shared_xprt_list);
+out:
+	up(&shared_xprt_sem);
+	return xprt;
+}
+
+/*
  * Prepare for transport shutdown.
  */
 void
@@ -1536,8 +1608,8 @@ xprt_clear_backlog(struct rpc_xprt *xprt
 /*
  * Destroy an RPC transport, killing off all requests.
  */
-int
-xprt_destroy(struct rpc_xprt *xprt)
+static int
+__xprt_destroy(struct rpc_xprt *xprt)
 {
 	dprintk("RPC:      destroying transport %p\n", xprt);
 	xprt_shutdown(xprt);
@@ -1546,3 +1618,20 @@ xprt_destroy(struct rpc_xprt *xprt)
 
 	return 0;
 }
+
+/*
+ * Destroy a shared RPC transport.
+ * (XXX: what about the remaining live requests?)
+ */
+int
+xprt_destroy(struct rpc_xprt *xprt)
+{
+	int ret = 0;
+	down(&shared_xprt_sem);
+	if (atomic_dec_and_test(&xprt->count)) {
+		list_del_init(&xprt->shared);
+		ret = __xprt_destroy(xprt);
+	}
+	up(&shared_xprt_sem);
+	return ret;
+}

[-- Attachment #3: Type: text/plain, Size: 140 bytes --]

_______________________________________________
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-12 16:55   ` Mike Waychison
@ 2005-01-12 20:43     ` David Meleedy
  2005-01-13  0:37     ` David Meleedy
  2005-01-13  8:13     ` Ian Kent
  2 siblings, 0 replies; 37+ messages in thread
From: David Meleedy @ 2005-01-12 20:43 UTC (permalink / raw)
  To: Mike Waychison; +Cc: autofs, Ian Kent


> Out of curiosity, can we see the output of showmount -e against your filer?

codered-51: showmount -e aflac
Export list for aflac:
/vol/vol2/9xcores         (everyone)
/vol/vol2/gpdsp_marketing (everyone)
/vol/vol1/thor            (everyone)
/vol/vol1/admin           (everyone)
/vol/vol3/is_013std       (everyone)
/vol/vol1/pe_workspace    (everyone)
/vol/vol1                 (everyone)
/vol/vol2/spitfire        (everyone)
/vol/vol3/layout_old      (everyone)
/vol/vol1/jordan          (everyone)
/vol/vol2/DSPS_Finance    (everyone)
/vol/vol2/sh              (everyone)
/vol/vol2/gpdsp_PLD       (everyone)
/vol/vol1/tigersharc      (everyone)
/vol/vol1/ed              (everyone)
/vol/vol3/ta              (everyone)
/vol/vol2                 (everyone)
/vol/vol2/pc_backups      (everyone)
/vol/vol2/india_mirror    (everyone)
/vol/vol2/ulc3            (everyone)
/vol/vol1/is_013std       (everyone)
/vol/vol3/lx3             (everyone)
/vol/vol2/danube          (everyone)
/vol/vol3                 (everyone)
/vol/vol1/teton_lite      (everyone)
/vol/vol3/emerald         (everyone)
/vol/vol3/archive         (everyone)
/vol/vol1/diablo          (everyone)
/vol/vol2/nwd_testmgr     (everyone)
/vol/vol1/ras             (everyone)
/vol/vol1/soft            (everyone)
/vol/vol2/nile            (everyone)
/vol/vol2/bluetooth       (everyone)
/vol/vol3/ad1983          (everyone)
/vol/vol1/IssueManager    (everyone)
/vol/vol3/cad_archive     (everyone)
/vol/vol2/win2k           (everyone)
/vol/vol2/nwd_layout      (everyone)
/vol/vol1/cpd             (everyone)
/vol/vol3/design          (everyone)
/vol/vol1/herc_eval       (everyone)
/vol/vol0                 (everyone)
/vol/vol2/bitpower        (everyone)
/vol/vol1/fsp             (everyone)
/vol/vol1/archive         (everyone)


codered-52: /etc/auto.net aflac
-fstype=nfs,hard,intr,nodev,nosuid \
        /vol/vol0 aflac:/vol/vol0 \
        /vol/vol1/admin aflac:/vol/vol1/admin \
        /vol/vol1/archive aflac:/vol/vol1/archive \
        /vol/vol1/cpd aflac:/vol/vol1/cpd \
        /vol/vol1/diablo aflac:/vol/vol1/diablo \
        /vol/vol1/ed aflac:/vol/vol1/ed \
        /vol/vol1 aflac:/vol/vol1 \
        /vol/vol1/fsp aflac:/vol/vol1/fsp \
        /vol/vol1/herc_eval aflac:/vol/vol1/herc_eval \
        /vol/vol1/is_013std aflac:/vol/vol1/is_013std \
        /vol/vol1/IssueManager aflac:/vol/vol1/IssueManager \
        /vol/vol1/jordan aflac:/vol/vol1/jordan \
        /vol/vol1/pe_workspace aflac:/vol/vol1/pe_workspace \
        /vol/vol1/ras aflac:/vol/vol1/ras \
        /vol/vol1/soft aflac:/vol/vol1/soft \
        /vol/vol1/teton_lite aflac:/vol/vol1/teton_lite \
        /vol/vol1/thor aflac:/vol/vol1/thor \
        /vol/vol1/tigersharc aflac:/vol/vol1/tigersharc \
        /vol/vol2/9xcores aflac:/vol/vol2/9xcores \
        /vol/vol2/bitpower aflac:/vol/vol2/bitpower \
        /vol/vol2/bluetooth aflac:/vol/vol2/bluetooth \
        /vol/vol2/danube aflac:/vol/vol2/danube \
        /vol/vol2/DSPS_Finance aflac:/vol/vol2/DSPS_Finance \
        /vol/vol2 aflac:/vol/vol2 \
        /vol/vol2/gpdsp_marketing aflac:/vol/vol2/gpdsp_marketing \
        /vol/vol2/gpdsp_PLD aflac:/vol/vol2/gpdsp_PLD \
        /vol/vol2/india_mirror aflac:/vol/vol2/india_mirror \
        /vol/vol2/nile aflac:/vol/vol2/nile \
        /vol/vol2/nwd_layout aflac:/vol/vol2/nwd_layout \
        /vol/vol2/nwd_testmgr aflac:/vol/vol2/nwd_testmgr \
        /vol/vol2/pc_backups aflac:/vol/vol2/pc_backups \
        /vol/vol2/sh aflac:/vol/vol2/sh \
        /vol/vol2/spitfire aflac:/vol/vol2/spitfire \
        /vol/vol2/ulc3 aflac:/vol/vol2/ulc3 \
        /vol/vol2/win2k aflac:/vol/vol2/win2k \
        /vol/vol3/ad1983 aflac:/vol/vol3/ad1983 \
        /vol/vol3/archive aflac:/vol/vol3/archive \
        /vol/vol3/cad_archive aflac:/vol/vol3/cad_archive \
        /vol/vol3/design aflac:/vol/vol3/design \
        /vol/vol3/emerald aflac:/vol/vol3/emerald \
        /vol/vol3 aflac:/vol/vol3 \
        /vol/vol3/is_013std aflac:/vol/vol3/is_013std \
        /vol/vol3/layout_old aflac:/vol/vol3/layout_old \
        /vol/vol3/lx3 aflac:/vol/vol3/lx3 \
        /vol/vol3/ta aflac:/vol/vol3/ta
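
The relationship between the two outputs above can be sketched as
follows: every export reported by showmount becomes one offset in a
single multi-mount map entry, so the 45 exports here yield 45 potential
mounts.  This is only an illustration, not the /etc/auto.net script
shipped with autofs; the function name and the hard-coded option string
are assumptions, and the sort order of the real script is not reproduced.

```shell
#!/bin/sh
# Turn "showmount -e <host>" output (on stdin) into one multi-mount map entry.
# $1 is the server name; the "Export list for ..." header line is skipped.
# Returns non-zero if the server reported no exports.
exports_to_map_entry() {
    host=$1
    awk -v host="$host" '
        NR == 1 && /^Export list/ { next }
        NF { paths[++n] = $1 }
        END {
            if (n == 0) exit 1
            printf "-fstype=nfs,hard,intr,nodev,nosuid"
            for (i = 1; i <= n; i++)
                printf " \\\n        %s %s:%s", paths[i], host, paths[i]
            printf "\n"
        }'
}

# Usage: showmount -e aflac | exports_to_map_entry aflac
```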


________________________________________________________________________
David Meleedy				Analog Devices, Inc.
David.Meleedy@analog.com		Three Technology Way
Phone: 781 461 3494			Norwood, MA  02062-9106  USA

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-01-12 16:13 ` Dwight Marzolf
@ 2005-01-12 20:55   ` David Meleedy
  0 siblings, 0 replies; 37+ messages in thread
From: David Meleedy @ 2005-01-12 20:55 UTC (permalink / raw)
  To: Dwight Marzolf; +Cc: autofs


> Dave,
> 
> >Now when I tried to do something similar, I found that if you weren't
> >on node1 or node2, the filesystem was read-only, so I had to do this:
> >
> >/vol/vol1	-rw=node1:node2,root=node1,node2
> >/vol/vol1/foo1	-root=node1:node2
> >/vol/vol1/foo2  -root=node1:node2
> 
> On this one here, the top line is correct but the other two lines should be:
> 
> /vol/vol1/foo1	-rw,root=node1:node2
> /vol/vol1/foo2  -rw,root=node1:node2
> 
> This way, the vol/vol1 dir does not mount when you cd to 
> /net/machine/vol/vol1 but the other two directories do mount and are 
> accessible by all workstations that need to read and write to it.  This 
> should work under both RedHat 8 and Enterprise 3.  Now, I don't know why 
> autofs4 seems to require the exports to be this way on a netapp box when 
> Solaris didn't seem to care but this is what is working for us.
> 
> Dwight Marzolf

I tried changing the syntax, as you suggested, and I got the following
error:

aflac> exportfs -a
export: No "=<hosts>" in "rw" option
exportfs: invalid option, /vol/vol1/IssueManager not exported
export: No "=<hosts>" in "rw" option
exportfs: invalid option, /vol/vol1/admin not exported

... etc ...

Here is the actual entry from the exports file on the filer:

/vol/vol1/IssueManager  -rw,root=sloth.spd.analog.com:zeus.spd.analog.com:chimchim.spd.analog.com:chimchim-ge2.spd.analog.com:thrak.spd.analog.com:mr_coffee.spd.analog.com:jetcar.spd.analog.com:topgun.spd.analog.com:cyclone.spd.analog.com

But my understanding anyway is that by default the permission for an
exported filesystem on the filer should be -rw anyway, so I'm not sure why you
would have to specify it.

-Dave

________________________________________________________________________
David Meleedy				Analog Devices, Inc.
David.Meleedy@analog.com		Three Technology Way
Phone: 781 461 3494			Norwood, MA  02062-9106  USA

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-12 14:50 ` raven
@ 2005-01-12 22:22   ` David Meleedy
  2005-01-12 23:01     ` Jeff Moyer
  0 siblings, 1 reply; 37+ messages in thread
From: David Meleedy @ 2005-01-12 22:22 UTC (permalink / raw)
  To: raven; +Cc: autofs


Answers to questions below:

> > Initially we started using the following software (Redhat Enterprise 3 update
> > 3)
> > autofs 4.1.3-12
> > kernel 2.4.21-20
> > nfs-utils 1.0.6-31EL
> 
> I don't have access to these kernel sources.
> That will be a problem as I don't know what autofs4 patches have been 
> applied. Jeff?

Well, I have also tried autofs 4.1.3-67 : Here is a complete list of
the patches installed on that version of autofs:

Patch1: autofs-4.1.0-hesiod-bind.patch
Patch2: autofs-4.1.0-loop.patch
Patch3: autofs-4.1.0-auto-master.patch
Patch4: autofs-4.1.2-init-redhat-only.patch
Patch5: autofs-4.1.3-non-strict-loop-fix.patch
Patch12: autofs-4.1.2-option-parsing.patch
Patch14: autofs-4.1.3-underlinei18n.patch
Patch15: autofs-4.1.3-rpc-ping.patch
Patch16: autofs-4.1.3-bad_chdir.patch
Patch17: autofs-4.1.3-mtab_lock.patch
#Patch18: autofs-4.1.3-ian-map-expiry-1.patch
Patch19: autofs-4.1.3-disable-direct.patch
Patch20: autofs-4.1.3-umount-loopback.patch
Patch21: autofs-4.1.3-localopts-multi.patch
Patch22: autofs-4.1.2-init-duplicate-map.patch
Patch23: autofs-4.1.3-filemap-etc-append.patch
Patch24: autofs-4.1.3-ldap-search-limit.patch
Patch25: autofs-4.1.3-replicated_server_select.patch
Patch26: autofs-4.1.3-browse.patch
Patch27: autofs-4.1.3-sock-leak-fix.patch
Patch28: autofs-4.1.3-no-reserved-ports.patch
Patch29: autofs-4.1.3-ldap-multiple-map.patch
Patch30: autofs-4.1.3-large-program-map.patch


> You really should add util-linux to the list of packages to consider in 
> the investigation. It may contain a patch which probes NFS servers and 
> opens a number of connections for each mount.

So far, haven't found that, but maybe after the list of patches I sent
is examined, we'll know for sure.

> >
> > WHAT HAS BEEN TRIED SO FAR:
> >
> > Mike Waychison, after seeing the messages from our log file said,
> >
> > "These messages are due to starvation for reserved ports (< 1024).
> > Specifically, the kernel will only use ports < 800.  Currently, the
> > kernel uses one port per nfs filesystem.  If you mount filesystems very
> > fast, then you can also run out of reserved ports as the local (mountd
> > iirc?) will close tcp sessions and each must wait 2 minutes before being
> > released.
> >
> > One solution is to try out the patch I posted last week that allows nfs
> > mounts to share tcp/udp connections:
> >
> > http://marc.theaimsgroup.com/?l=linux-nfs&m=110261671705396&w=2
> > "
> >
> > The problem is we are using a different version of the kernel 2.4,
> > and his patch was for the 2.6 kernel.  Also, although his patch
> > might make the number of ports available increase, I think it does
> > not really solve the problem, it just gives more breathing room.
> 
> I'm not sure about that.
> 
> The multiplexing of the RPC transport would probably provide a solid 
> solution to your problem by the sound of things. The patches I mentioned 
> above were done against 2.4.22 and 2.6.0.
> 
> Problem here is that to get a working patch will probably take a while, so 
> we probably need a workaround in the mean time.

Mike sent me a patch for 2.4.21-20.EL that I will test in the near
future.  However, I know that Redhat also has an "up2date" patch
available for the kernel already, so ultimately we need to get the
patch applied, if it works.

> > After talking with Jeff Moyer about the issue, I updated autofs to
> > autofs-4.1.3-67.  This was supposed to incorporate a patch that fixes
> > the port leak problem.
> 
> Certainly a bug, but not the heart of your problem I'm afraid.

agreed.

> >
> > This did not solve the problem, but it did seem to improve things a bit.
> >
> > After looking at Dwight Marzolf's document on his workaround I found
> > the following information (this is exactly the same sort of thing we
> > are seeing too):
> >
> > "
> > we quickly found that if you did a cd via /net to one of our Network
> > Appliance filers (all our other netapp filers worked correctly when
> > unmounting /net mounts), the port release issue still existed.  In
> > fact, the mountpoints actively took more ports.  This meant that if you
> > mounted this filer with /net, your workstation could be rendered
> > useless in less than 24 hours.  It also became evident that this active
> > taking of ports by this filer was not limited to just autofs-4.1.3-28
> > but also earlier versions of autofs  ...  Further
> > research revealed the ports were being taken at the point of automount
> > timeout.  When the automounter had declared these mountpoints to be
> > timed out and ready to be unmounted and attempted to umount them, in
> > fact, it ended up remounting them, using new ports for the remount ...
> > "
> 
> Do you have any messages in the log on the server side like:
> 
> Jan 10 22:01:36 budgie-wl rpc.mountd: refused unmount request from 
> raven-wl.themaw.net for /usr/local/sbin (/usr/local/sbin): illegal port 
> 36233
> 
> This indicates that the client has been patched to use non-privileged 
> ports to increase the number of available ports but the NFS server has 
> not.
> 
> Just wondering?


Unfortunately not.  Our Netapp fileserver is not a unix system,
so it does not run rpc.mountd.  

> >
> > HOW TO REPRODUCE THE PROBLEM:
> >
> > Actually in our case we can render a machine useless in just about an
> > hour or two, and this happens for all of our Netapp filers.  The procedure
> > to do this is reproducible.
> >
> > 1) You cd to a /net directory on the filer.
> > 2) Leave the shell in that /net directory for about 15 minutes-> 1/2 an hour.
> > and watch the "BUG" messages in the /var/log/messages file.
> >
> > 3) Log out. (so the automounter tries to unmount everything that was mounted).
> > 4) Log in again, after 30 minutes and by then you won't be able to
> > mount anything anymore
> >
> > You can replace steps 3 and 4 with "init 6".  When the automounter process
> > is stopped by init, you will see the port messages scroll up the console
> > screen.
> >
> > EXAMPLE OF REPRODUCING THE PROBLEM:
> >
> > codered-51: cd /net/aflac/vol/vol2
> > ( I can't help but wonder if this BUG message that shows up once a minute
> > is indicative of a problem )
> >
> > codered-52: tail -f /var/log/messages
> > Jan 11 15:32:37 codered automount[6214]: attempting to mount entry /net/aflac
> > Jan 11 15:33:41 codered automount[7915]: BUG: /net/aflac/vol/vol2 already
> > mounted
> > Jan 11 15:34:42 codered automount[8049]: BUG: /net/aflac/vol/vol2 already
> > mounted
> > Jan 11 15:36:42 codered automount[8311]: BUG: /net/aflac/vol/vol2 already
> > mounted
> > Jan 11 15:37:43 codered automount[8441]: BUG: /net/aflac/vol/vol2 already
> > mounted
> 
> Seen that lately. Definitely want to get to the bottom of this.
> 
> I don't yet understand why autofs is getting requests to mount an already 
> mounted file system. Even in a hostile situation autofs needs to deal 
> with this properly.

Well, one thing I am noticing is that it can never unmount or expire
/net/aflac once it is mounted.  So maybe during each 1 minute
timeout, it tries to expire the mount, and then when it fails, it tries
to assert again that it is already mounted by trying to remount it?

i.e. what really happens in the code if a mount expiration is thought
to have succeeded but actually failed, or is thought to have failed?

> In the past I observed that this might have been somehow related to 
> corruption in /etc/mtab.

Then this would have to be a consistent corruption across many machines.
This happens on over 8 Redhat Enterprise 3 clients (I have a large
testbed here).
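
One way to test the /etc/mtab theory on each client is to compare it
with the kernel's own view in /proc/mounts.  The sketch below is only
an illustration (the function name is made up); some differences, such
as the root filesystem entry, are normal and do not indicate corruption.

```shell
#!/bin/sh
# Print mount points that appear in only one of two mount tables.
# $1 and $2 are files in /etc/mtab format (second field = mount point).
# Lines prefixed "<" are only in $1, lines prefixed ">" only in $2.
mount_table_diff() {
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    awk '{ print $2 }' "$1" | sort -u > "$a"
    awk '{ print $2 }' "$2" | sort -u > "$b"
    comm -23 "$a" "$b" | sed 's/^/< /'
    comm -13 "$a" "$b" | sed 's/^/> /'
    rm -f "$a" "$b"
}

# Usage: mount_table_diff /etc/mtab /proc/mounts
```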

> > ... (continues once a minute to print out this bug) ...
> > codered-53: sudo init 6
> > (after reboot log in to see error messages)
> >
> > THE REALLY WEIRD PART:
> > Now the interesting thing here is that the machine is rebooting, so
> > there is no program requesting additional mounts, yet here in the log
> > files you can see that mounts are attempted for almost every subdirectory
> > of /vol/vol1, /vol/vol2 and /vol/vol3, even though the only
> > thing that should be happening is an unmount of the directory aflac:/vol/vol2
> >
> > jetcar-189: cd /net/aflac/vol/vol3
> > jetcar-190: ls
> > ad1983/      cad_archive/ emerald/     layout_old/  ta/
> > archive/     design/      is_013std/   lx3/
> > jetcar-191: cd ../vol2
> > jetcar-192: ls
> > 9xcores/         danube/          nwd_layout/      ulc3/
> > DSPS_Finance/    gpdsp_PLD/       nwd_testmgr/     win2k/
> > WWM/             gpdsp_marketing/ pc_backups/
> > bitpower/        india_mirror/    sh/
> > bluetooth/       nile/            spitfire/
> > jetcar-194: cd ../vol1
> > jetcar-195: ls
> > IssueManager/ diablo/       is_013std/    ras/          tigersharc/
> > admin/        ed/           jordan/       soft/
> > archive/      fsp/          nwd_fsp@      teton_lite/
> > cpd/          herc_eval/    pe_workspace/ thor/
> >
> >
> > codered-54: less /var/log/messages
> > Jan 11 15:51:14 codered automount[6214]: can't shutdown: filesystem /net still
> > busy
> > Jan 11 15:51:17 codered autofs: automount -USR2 succeeded
> > Jan 11 15:51:19 codered automount[6214]: can't shutdown: filesystem /net still
> > busy
> > Jan 11 15:51:20 codered autofs: automount -USR2 succeeded
> > Jan 11 15:51:23 codered autofs: automount -USR2 succeeded
> > Jan 11 15:51:26 codered autofs: automount -USR2 succeeded
> > Jan 11 15:51:26 codered automount[6214]: can't shutdown: filesystem /net still
> > busy
> > Jan 11 15:51:28 codered automount[14708]: >> mount: wrong fs type, bad option,
> > bad superblock on aflac:/vol/vol2/spitfire,
> > Jan 11 15:51:28 codered automount[14708]: >>        or too many mounted file
> > systems
> > Jan 11 15:51:28 codered automount[14708]: mount(nfs): nfs: mount failure
> > aflac:/vol/vol2/spitfire on /net/aflac/vol/vol2/spitfire
> > Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
> > Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
> > Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
> > Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
> > Jan 11 15:51:28 codered kernel: nfs_read_super: get root inode failed
> > Jan 11 15:51:28 codered kernel: nfs warning: mount version older than kernel
> > Jan 11 15:51:28 codered kernel: RPC: Can't bind to reserved port (98).
> > Jan 11 15:51:28 codered kernel: nfs_get_root: getattr error = 5
> > Jan 11 15:51:28 codered kernel: nfs_read_super: get root inode failed
> 
> Looks like you've run out of privileged port space here, at least the 
> ones that RPC is trying to use.
> 
> snip ...


yup.
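
To watch the reserved ports drain while reproducing this, a sketch like
the one below can summarize netstat output.  It counts sockets whose
local port is under a limit, grouped by protocol and TCP state.  The
function name is made up, and it assumes the kernel's RPC sockets show
up in netstat output on the client.  Listening daemons on low ports
(sshd, portmap) are counted too, so watch the trend rather than the
absolute number.

```shell
#!/bin/sh
# Count sockets whose local port is below a limit, grouped by protocol/state.
# Reads "netstat -an"-style lines on stdin; $1 is the limit (default 1024).
# UDP sockets are reported with "-" in place of a state.
reserved_port_usage() {
    awk -v limit="${1:-1024}" '
        $1 ~ /^(tcp|udp)/ {
            # local address is field 4; the port follows the last colon
            n = split($4, addr, ":")
            if (addr[n] ~ /^[0-9]+$/ && addr[n] + 0 < limit) {
                state = ($1 ~ /^tcp/) ? $6 : "-"
                count[$1 " " state]++
            }
        }
        END { for (k in count) print k, count[k] }' | sort
}

# Usage: netstat -an | reserved_port_usage 1024
```

Passing 800 instead of 1024 narrows the count to the range the kernel
is said to use for RPC, and a pile of TIME_WAIT entries would match the
two-minute wait Mike described.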

> >
> > HOW IT WAS FIXED IN REDHAT 8:
> >
> > Dwight had implemented his fix in 3 steps for Redhat 8:
> > 1) He updated his autofs to autofs-4.1.3-28 which had the port leak fix
> > 2) He patched his kernel with the autofs4-2.4.20-20040508.patch
> > (is some equivalent patch needed for Redhat Enterprise 3, which uses
> > kernel 2.4.21-20?)
> > 3) He changed the way he exported filesystems from the Netapp:
> >
> > "The last issue was the matter of how /vol/vol0 is exported from a
> > Network Appliance filer.  We found that the following exports broke
> > autofs4:
> >
> > /vol/vol0     -root=node1:node2:node3:node4
> > /vol/vol0     -rw,root=node1:node2:node3
> > /vol/vol0     -anon=0
> >
> > The export syntax that worked was:
> >
> > /vol/vol0       -rw=node1:node2,root=node1,node2
> > "
> 
> This is a bug in the option parsing. I'll need to fix that.

Well, keep in mind that these are options in our exports file
on our Netapp filer, not a linux machine, so perhaps not an issue for you.


> >
> > WHAT HAPPENED WHEN I TRIED THE REDHAT 8 WORKAROUND:
> >
> > Now when I tried to do something similar, I found that if you weren't
> > on node1 or node2, the filesystem was read-only, so I had to do this:
> >
> > /vol/vol1	-rw=node1:node2,root=node1,node2
> > /vol/vol1/foo1	-root=node1:node2
> > /vol/vol1/foo2  -root=node1:node2
> >
> > This way if you cd /net/filer/vol/vol1 it was read-only for most machines
> > but if you cd'd to /net/filer/vol/vol1/foo1 it was read-write.
> >
> > So the Netapp export workaround that fixed the Redhat 8 autofs4 problem,
> > plus autofs-4.1.3-67, has not yet solved the problem for our
> > Redhat Enterprise 3 clients.
> >
> > CONCLUSION:
> >
> > I hope this is enough info to track down this problem.  It appears
> > as though the interaction of using /net with a Netapp is causing
> > spurious mounts, and unmounting is not working.  I will assist with
> > any patch tests that you require, so let me know, and I will be able
> > to verify any fixes.
> 
> Might be a bit of a long road here but we'll have to see how we go.
> 
> btw, on average, how many exports do you have on a filer?
> 
> Regards
> Ian
> 


________________________________________________________________________
David Meleedy				Analog Devices, Inc.
David.Meleedy@analog.com		Three Technology Way
Phone: 781 461 3494			Norwood, MA  02062-9106  USA

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-12 22:22   ` David Meleedy
@ 2005-01-12 23:01     ` Jeff Moyer
  0 siblings, 0 replies; 37+ messages in thread
From: Jeff Moyer @ 2005-01-12 23:01 UTC (permalink / raw)
  To: David Meleedy; +Cc: autofs, raven

==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems; David Meleedy <david.meleedy@analog.com> adds:

david.meleedy> Answers to questions below:

>> > Initially we started using the following software (Redhat Enterprise 3
>> > update 3)
>> > autofs 4.1.3-12
>> > kernel 2.4.21-20
>> > nfs-utils 1.0.6-31EL
>> 
>> I don't have access to these kernel sources.  That will be a problem as
>> I don't know what autofs4 patches have been applied. Jeff?

These sources should be up to date with all of your patches except the
autofs4_lookup fixes that you made for the map update functionality.


david.meleedy> Well, I have also tried autofs 4.1.3-67 : Here is a complete
david.meleedy> list of the patches installed on that version of autofs:

david.meleedy> Patch30: autofs-4.1.3-large-program-map.patch

I am concerned about the autofs-4.1.3-large-program-map.patch.  There was a
bug in that, and I want to ensure you have the most recent version,
especially since you're using /net.  (the -67 rpm you have has the buggy
version).

I've attached the correct version of the patch.

-Jeff

--- autofs-4.1.3/modules/lookup_program.c.orig	2004-12-20 10:05:12.272654616 -0500
+++ autofs-4.1.3/modules/lookup_program.c	2004-12-20 10:04:21.336398104 -0500
@@ -84,7 +84,7 @@ int lookup_ghost(const char *root, int g
 int lookup_mount(const char *root, const char *name, int name_len, void *context)
 {
 	struct lookup_context *ctxt = (struct lookup_context *) context;
-	char mapent[MAPENT_MAX_LEN + 1], *mapp;
+	char *mapent, *mapp, *tmp;
 	char errbuf[1024], *errp;
 	char ch;
 	int pipefd[2], epipefd[2];
@@ -94,11 +94,19 @@ int lookup_mount(const char *root, const
 	fd_set readfds, ourfds;
 	enum state { st_space, st_map, st_done } state;
 	int quoted = 0;
-	int ret;
+	int ret = 1;
 	int max_fd;
+	int distance;
+	int alloci = 1;
 
 	debug(MODPREFIX "looking up %s", name);
 
+	mapent = (char *)malloc(MAPENT_MAX_LEN + 1);
+	if (!mapent) {
+		error(MODPREFIX "malloc: %s\n", strerror(errno));
+		return 1;
+	}
+
 	/*
 	 * We don't use popen because we don't want to run /bin/sh plus we
 	 * want to send stderr to the syslog, and we don't use spawnl()
@@ -107,12 +115,12 @@ int lookup_mount(const char *root, const
 
 	if (pipe(pipefd)) {
 		error(MODPREFIX "pipe: %m");
-		return 1;
+		goto out_free;
 	}
 	if (pipe(epipefd)) {
 		close(pipefd[0]);
 		close(pipefd[1]);
-		return 1;
+		goto out_free;
 	}
 
 	f = fork();
@@ -122,7 +130,7 @@ int lookup_mount(const char *root, const
 		close(epipefd[0]);
 		close(epipefd[1]);
 		error(MODPREFIX "fork: %m");
-		return 1;
+		goto out_free;
 	} else if (f == 0) {
 		reset_signals();
 		close(pipefd[0]);
@@ -177,21 +185,44 @@ int lookup_mount(const char *root, const
 				if (!quoted && ch == '\n') {
 					*mapp = '\0';
 					state = st_done;
-				} else if (mapp - mapent < MAPENT_MAX_LEN - 1) {
-					/* 
-					 * Eat \ quoting \n, otherwise pass it
-					 * through for the parser
+					break;
+				}
+
+				/* We overwrite up to 3 characters, so we
+				 * need to make sure we have enough room
+				 * in the buffer for this. */
+				/* else */
+				if (mapp - mapent > 
+				    ((MAPENT_MAX_LEN+1) * alloci) - 3) {
+					/*
+					 * Alloc another page for map entries.
 					 */
-					if (quoted) {
-						if (ch == '\n')
-							*mapp++ = ' ';
-						else {
-							*mapp++ = '\\';
-							*mapp++ = ch;
-						}
-					} else
-						*mapp++ = ch;
+					distance = mapp - mapent;
+					tmp = realloc(mapent,
+						      ((MAPENT_MAX_LEN + 1) * 
+						       ++alloci));
+					if (!tmp) {
+						alloci--;
+						error(MODPREFIX "realloc: %s\n",
+						      strerror(errno));
+						break;
+					}
+					mapent = tmp;
+					mapp = tmp + distance;
 				}
+				/* 
+				 * Eat \ quoting \n, otherwise pass it
+				 * through for the parser
+				 */
+				if (quoted) {
+					if (ch == '\n')
+						*mapp++ = ' ';
+					else {
+						*mapp++ = '\\';
+						*mapp++ = ch;
+					}
+				} else
+					*mapp++ = ch;
 				break;
 			case st_done:
 				/* Eat characters till there's no more output */
@@ -233,18 +264,20 @@ int lookup_mount(const char *root, const
 
 	if (waitpid(f, &status, 0) != f) {
 		error(MODPREFIX "waitpid: %m");
-		return 1;
+		goto out_free;
 	}
 
 	if (mapp == mapent || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
 		error(MODPREFIX "lookup for %s failed", name);
-		return 1;
+		goto out_free;
 	}
 
 	debug(MODPREFIX "%s -> %s", name, mapent);
 
 	ret = ctxt->parse->parse_mount(root, name, name_len,
 				       mapent, ctxt->parse->context);
+out_free:
+	free(mapent);
 	return ret;
 }
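
The core of the patch above is replacing the fixed-size mapent array with a
heap buffer that grows while the program map's output is read.  A minimal
sketch of that growth pattern follows.  It is not the autofs code: struct
growbuf and the growbuf_* functions are hypothetical names, and the sketch
tracks an offset instead of a pointer, so nothing has to be recomputed after
realloc() (the patch instead saves "distance" and rebuilds mapp from it).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; names are illustrative, not from autofs. */
struct growbuf {
	char *data;	/* start of the allocation */
	size_t len;	/* bytes used so far */
	size_t cap;	/* bytes allocated */
};

/* Allocate the first chunk. Returns 0 on success, -1 on failure. */
int growbuf_init(struct growbuf *b, size_t chunk)
{
	assert(b != NULL && chunk > 0);

	b->data = malloc(chunk);
	if (!b->data)
		return -1;
	b->len = 0;
	b->cap = chunk;
	return 0;
}

/*
 * Append one character. When fewer than `reserve` bytes remain free,
 * grow the buffer by `chunk` bytes first. On allocation failure the
 * buffer is left unchanged and -1 is returned.
 */
int growbuf_putc(struct growbuf *b, char ch, size_t chunk, size_t reserve)
{
	assert(b != NULL && chunk > 0 && reserve > 0);

	if (b->cap - b->len < reserve) {
		char *tmp = realloc(b->data, b->cap + chunk);

		if (!tmp)
			return -1;
		b->data = tmp;
		b->cap += chunk;
	}
	b->data[b->len++] = ch;
	return 0;
}

/* Release the buffer and reset the bookkeeping. */
void growbuf_free(struct growbuf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = b->cap = 0;
}
```

Assigning the realloc() result to a temporary first, as the patch also does,
keeps the original allocation reachable if the call fails.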


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-12 16:55   ` Mike Waychison
  2005-01-12 20:43     ` David Meleedy
@ 2005-01-13  0:37     ` David Meleedy
  2005-01-13  1:05       ` Mike Waychison
                         ` (2 more replies)
  2005-01-13  8:13     ` Ian Kent
  2 siblings, 3 replies; 37+ messages in thread
From: David Meleedy @ 2005-01-13  0:37 UTC (permalink / raw)
  To: Mike Waychison; +Cc: autofs, Ian Kent


Mike,
	I just recompiled my kernel with your xprt-sharing.patch.
Although this did fix the port error problems, it did not fix
the automounter problem.  So I think your patch should be incorporated
into Redhat Enterprise 3 (2.4 kernel) because it appears to work.

I think the problem is that the automounter just cannot unmount
the /net/aflac directory, and ends up trying to remount it instead.
Here are the log entries after applying the port patch that Mike gave me
(this is during the reboot after cd'ing to /net/aflac/vol/vol2):

Jan 12 19:23:36 codered automount[5396]: can't shutdown: filesystem /net still busy
Jan 12 19:23:38 codered autofs: automount -USR2 succeeded
Jan 12 19:23:41 codered automount[5396]: can't shutdown: filesystem /net still busy
... those lines keep repeating until ....

Jan 12 19:24:08 codered automount[16092]: >> mount table full
Jan 12 19:24:08 codered automount[16092]: mount(nfs): nfs: mount failure aflac:/vol/vol3/cad_archive on /net/aflac/vol/vol3/cad_archive
Jan 12 19:24:08 codered automount[16092]: >> mount table full
Jan 12 19:24:08 codered automount[16092]: mount(nfs): nfs: mount failure aflac:/vol/vol3/design on /net/aflac/vol/vol3/design
... those lines repeat for each subdirectory of the volumes ...

Jan 12 19:24:09 codered autofs: automount shutdown failed

As you can see, it keeps trying to unmount /net, and eventually
fills up the mount table because it instead remounts it.  Before,
when the port issue was a problem, it wouldn't get far enough to
fill the mount table, but now it can (thanks Mike!)

-Dave

________________________________________________________________________
David Meleedy				Analog Devices, Inc.
David.Meleedy@analog.com		Three Technology Way
Phone: 781 461 3494			Norwood, MA  02062-9106  USA


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-01-13  0:37     ` David Meleedy
@ 2005-01-13  1:05       ` Mike Waychison
  2005-01-13  1:07       ` Ian Kent
  2005-01-14 14:35       ` raven
  2 siblings, 0 replies; 37+ messages in thread
From: Mike Waychison @ 2005-01-13  1:05 UTC (permalink / raw)
  To: David Meleedy; +Cc: autofs, Ian Kent

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Meleedy wrote:
> Mike,
> 	I just recompiled my kernel with your xprt-sharing.patch.
> Although this did fix the port error problems, it did not fix
> the automounter problem.  So I think your patch should be incorporated
> into Redhat Enterprise 3 (2.4 kernel) because it appears to work.


There is a bit of hold-back from the NFS guys for getting this included
mainstream (what Red Hat does is their own business of course).

The reason does make sense: there is a static number of request slots
available per transport session in the current rpc code.  Translated:
only 16 I/Os can be in flight on a given tcp stream with
this patch.

There is concern that this may impact performance numbers for NFS,
though I haven't yet seen hard numbers.  The good news is that there has
been extensive work done as well to allow for dynamic scaling of the
available request slots as part of the rpc transport switch code
happening at UMich.  I haven't had the chance to play with it though and
don't know if it is ready for prime time.  The likelihood of it ever
making its way into RHEL3 is pretty low, but I can't speak for others.

> 
> I think the problem is that the automounter just cannot unmount
> the /net/aflac directory, and ends up trying to remount it instead,
> here are the log files after the port patch that Mike gave me:
> (this is during the reboot after cd'ing to /net/aflac/vol/vol2)
> 
> Jan 12 19:23:36 codered automount[5396]: can't shutdown: filesystem /net still 
> busy
> Jan 12 19:23:38 codered autofs: automount -USR2 succeeded
> Jan 12 19:23:41 codered automount[5396]: can't shutdown: filesystem /net still 
> busy
> ... those lines keep repeating until ....
> 
> Jan 12 19:24:08 codered automount[16092]: >> mount table full
> Jan 12 19:24:08 codered automount[16092]: mount(nfs): nfs: mount failure 
> aflac:/vol/vol3/cad_archive on /net/aflac/vol/vol3/cad_archive
> Jan 12 19:24:08 codered automount[16092]: >> mount table full
> Jan 12 19:24:08 codered automount[16092]: mount(nfs): nfs: mount failure 
> aflac:/vol/vol3/design on /net/aflac/vol/vol3/design
> ... those lines repeat for each subdirectory of the volumes ...
> 
> Jan 12 19:24:09 codered autofs: automount shutdown failed
> 
> As you can see, it keeps trying to unmount /net, and eventually
> fills up the mount table because it instead remounts it.  Before,
> when the port issue was a problem, it wouldn't get far enough to
> fill the mount table, but now it can (thanks Mike!)
> 
> -Dave
> 
> ________________________________________________________________________
> David Meleedy				Analog Devices, Inc.
> David.Meleedy@analog.com		Three Technology Way
> Phone: 781 461 3494			Norwood, MA  02062-9106  USA
> 
> 


- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFB5clmdQs4kOxk3/MRAo8LAJ9ud4FiaaXKEM9wdQQqyBxegiz8AQCeNvFF
+ZRiL2/tnbqd5BT78pFrAf4=
=aNy9
-----END PGP SIGNATURE-----


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-13  0:37     ` David Meleedy
  2005-01-13  1:05       ` Mike Waychison
@ 2005-01-13  1:07       ` Ian Kent
  2005-01-14 14:35       ` raven
  2 siblings, 0 replies; 37+ messages in thread
From: Ian Kent @ 2005-01-13  1:07 UTC (permalink / raw)
  To: David Meleedy; +Cc: autofs, Mike Waychison

On Wed, 12 Jan 2005, David Meleedy wrote:

> 
> Mike,
> 	I just recompiled my kernel with your xprt-sharing.patch.
> Although this did fix the port error problems, it did not fix
> the automounter problem.  So I think your patch should be incorporated
> into Redhat Enterprise 3 (2.4 kernel) because it appears to work.
> 
> I think the problem is that the automounter just cannot unmount
> the /net/aflac directory, and ends up trying to remount it instead,
> here are the log files after the port patch that Mike gave me:
> (this is during the reboot after cd'ing to /net/aflac/vol/vol2)
> 
> Jan 12 19:23:36 codered automount[5396]: can't shutdown: filesystem /net still 
> busy
> Jan 12 19:23:38 codered autofs: automount -USR2 succeeded
> Jan 12 19:23:41 codered automount[5396]: can't shutdown: filesystem /net still 
> busy
> ... those lines keep repeating until ....
> 
> Jan 12 19:24:08 codered automount[16092]: >> mount table full
> Jan 12 19:24:08 codered automount[16092]: mount(nfs): nfs: mount failure 
> aflac:/vol/vol3/cad_archive on /net/aflac/vol/vol3/cad_archive
> Jan 12 19:24:08 codered automount[16092]: >> mount table full
> Jan 12 19:24:08 codered automount[16092]: mount(nfs): nfs: mount failure 
> aflac:/vol/vol3/design on /net/aflac/vol/vol3/design
> ... those lines repeat for each subdirectory of the volumes ...

Could I have the full log of this please.
From automount start to destruction.

> 
> Jan 12 19:24:09 codered autofs: automount shutdown failed

Does RHEL3 have the more unnamed patch applied?

> 
> As you can see, it keeps trying to unmount /net, and eventually
> fills up the mount table because it instead remounts it.  Before,
> when the port issue was a problem, it wouldn't get far enough to
> fill the mount table, but now it can (thanks Mike!)

Yes, I know that bit of code.

I've always been suspicious of trying to cover up a failure by trying to 
go back the way you came but I left it like that as I didn't have a better 
idea for how to handle the situation.

Ian


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-01-12 16:55   ` Mike Waychison
  2005-01-12 20:43     ` David Meleedy
  2005-01-13  0:37     ` David Meleedy
@ 2005-01-13  8:13     ` Ian Kent
  2 siblings, 0 replies; 37+ messages in thread
From: Ian Kent @ 2005-01-13  8:13 UTC (permalink / raw)
  To: Mike Waychison; +Cc: autofs, David Meleedy

On Wed, 12 Jan 2005, Mike Waychison wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Ian Kent wrote:
> > Just a quick note before we get deep into this.
> > 
> > Can you check something for me.
> > Get the source rpm for util-linux.
> > Check if there is a patch applied to it to probe for services during 
> > mount (it was a patch in FC). If it is rebuild the rpm without it and test 
> > again.
> > 
> > On Tue, 11 Jan 2005, David Meleedy wrote:
> > 
> > 
> >>Hi Ian & Jeff,
> >>	I am trying to track down an autofs issue that has been
> >>plaguing us.  It seems to be caused by the interaction of autofs version
> >>4 with a Network Appliance server, and cd'ing to /net directories
> >>on the Netapp server.
> >>
> >>A similar issue was seen in Analog Devices in Redhat 8, and apparently
> >>the problem was worked around by Dwight Marzolf working with Ian Kent's
> >>help.  So following what Dwight did I have been trying to recreate the fix
> >>for Redhat Enterprise 3 update 3, and so far have not met with success.
> >>
> >>THE PROBLEM DESCRIPTION:
> >>
> >>Autofs hangs and refuses to mount any directories for a period of time
> >>after cd'ing to /net/<Netapp>/vol/vol[0-3] and waiting a while.
> >>The only way to clear this is to reboot the client.
> >>
> >>Initially we started using the following software (Redhat Enterprise 3 update 
> >>3)
> >>autofs 4.1.3-12
> >>kernel 2.4.21-20
> >>nfs-utils 1.0.6-31EL
> >>
> >>WHAT HAS BEEN TRIED SO FAR:
> >>
> >>Mike Waychison, after seeing the messages from our log file said,
> >>
> >>"These messages are due to starvation for reserved ports (< 1024).
> >>Specifically, the kernel will only use ports < 800.  Currently, the
> >>kernel uses one port per nfs filesystem.  If you mount filesystems very
> >>fast, then you can also run out of reserved ports as the local (mountd
> >>iirc?) will close tcp sessions and each must wait 2 minutes before being
> >>released.
> >>
> >>One solution is to try out the patch I posted last week that allows nfs
> >>mounts to share tcp/udp connections:
> >>
> >>http://marc.theaimsgroup.com/?l=linux-nfs&m=110261671705396&w=2
> >>"
> >>
> >>The problem is we are using a different version of the kernel 2.4,
> >>and his patch was for the 2.6 kernel.  Also, although his patch
> >>might make the number of ports available increase, I think it does
> >>not really solve the problem, it just gives more breathing room.
> 
> Well, it will pretty much guarantee only one port is used for any given
> filer for talking to the nfs program.  Other ports are still used
> temporarily to talk to mountd and the portmapper.

Both of these are a significant problem as well from what I've seen in 
the netstat output.

> 
> I've attached patch that applies cleanly to 2.4.21-20.EL, though I
> haven't had the chance to test it other than by compiling it.
> 
> >>
> >>After talking with Jeff Moyer about the issue, I updated autofs to 
> >>autofs-4.1.3-67.  This was supposed to incorporate a patch that fixes
> >>the port leak problem.
> >>
> >>This did not solve the problem, but it did seem to improve things a bit.
> >>
> >>After looking at Dwight Marzolf's document on his workaround I found
> >>the following information (this is exactly the same sort of thing we
> >>are seeing too):
> >>
> >>"
> >>we quickly found that if you did a cd via /net to one of our Network
> >>Appliance filers (all our other netapp filers worked correctly when
> >>unmounting /net mounts), the port release issue still existed.  In
> >>fact, the mountpoints actively took more ports.  This meant that if you
> >>mounted this filer with /net, your workstation could be rendered
> >>useless in less than 24 hours.  It also became evident that this active
> >>taking of ports by this filer was not limited to just autofs-4.1.3-28
> >>but also earlier versions of autofs  ...  Further
> >>research revealed the ports were being taken at the point of automount
> >>timeout.  When the automounter had declared these mountpoints to be
> >>timed out and ready to be unmounted and attempted to umount them, in
> >>fact, it ended up remounting them, using new ports for the remount ...
> >>"
> >>
> 
> Out of curiosity, can we see the output of showmount -e against your filer?
> 
> - --
> Mike Waychison
> Sun Microsystems, Inc.
> 1 (650) 352-5299 voice
> 1 (416) 202-8336 voice
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> NOTICE:  The opinions expressed in this email are held by me,
> and may not represent the views of Sun Microsystems, Inc.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.5 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
> 
> iD8DBQFB5VaHdQs4kOxk3/MRAvvhAJ4uOaMXMTE4rjZ6ivLrbyeowcZkuACfdshX
> yBzl0PSwvsMaQZgKelhmrd4=
> =vjuL
> -----END PGP SIGNATURE-----
> 


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-13  0:37     ` David Meleedy
  2005-01-13  1:05       ` Mike Waychison
  2005-01-13  1:07       ` Ian Kent
@ 2005-01-14 14:35       ` raven
  2005-01-14 22:38         ` David Meleedy
  2005-01-17 14:01         ` raven
  2 siblings, 2 replies; 37+ messages in thread
From: raven @ 2005-01-14 14:35 UTC (permalink / raw)
  To: David Meleedy; +Cc: autofs mailing list, Mike Waychison

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2540 bytes --]


Hi Mike,

I have attached a patch I'd like you to try please.

I think I've been able to duplicate the problem here between my Gentoo 
2.4 kernel and my Debian 2.4 kernel systems. I've had trouble with NFS 
export permissions between my FC2 2.6 and the Debian 2.4 systems so I 
can't be sure about it.

I've used the patch, as an additional patch after all the others, to build 
rpms with Jeff's autofs-4.1.3-16 and autofs-4.1.3-55.1 source rpms so I 
hope it will apply OK to the rpm you are using.

On Wed, 12 Jan 2005, David Meleedy wrote:

>
> Mike,
> 	I just recompiled my kernel with your xprt-sharing.patch.
> Although this did fix the port error problems, it did not fix
> the automounter problem.  So I think your patch should be incorporated
> into Redhat Enterprise 3 (2.4 kernel) because it appears to work.
>
> I think the problem is that the automounter just cannot unmount
> the /net/aflac directory, and ends up trying to remount it instead,
> here are the log files after the port patch that Mike gave me:
> (this is during the reboot after cd'ing to /net/aflac/vol/vol2)
>
> Jan 12 19:23:36 codered automount[5396]: can't shutdown: filesystem /net still
> busy
> Jan 12 19:23:38 codered autofs: automount -USR2 succeeded
> Jan 12 19:23:41 codered automount[5396]: can't shutdown: filesystem /net still
> busy
> ... those lines keep repeating until ....
>
> Jan 12 19:24:08 codered automount[16092]: >> mount table full
> Jan 12 19:24:08 codered automount[16092]: mount(nfs): nfs: mount failure
> aflac:/vol/vol3/cad_archive on /net/aflac/vol/vol3/cad_archive
> Jan 12 19:24:08 codered automount[16092]: >> mount table full
> Jan 12 19:24:08 codered automount[16092]: mount(nfs): nfs: mount failure
> aflac:/vol/vol3/design on /net/aflac/vol/vol3/design
> ... those lines repeat for each subdirectory of the volumes ...
>
> Jan 12 19:24:09 codered autofs: automount shutdown failed
>
> As you can see, it keeps trying to unmount /net, and eventually
> fills up the mount table because it instead remounts it.  Before,
> when the port issue was a problem, it wouldn't get far enough to
> fill the mount table, but now it can (thanks Mike!)
>
> -Dave
>
> ________________________________________________________________________
> David Meleedy				Analog Devices, Inc.
> David.Meleedy@analog.com		Three Technology Way
> Phone: 781 461 3494			Norwood, MA  02062-9106  USA
>
>
> _______________________________________________
> autofs mailing list
> autofs@linux.kernel.org
> http://linux.kernel.org/mailman/listinfo/autofs
>

[-- Attachment #2: Type: TEXT/PLAIN, Size: 3834 bytes --]

--- autofs-4.1.3/modules/parse_sun.c.multi-over	2005-01-14 19:44:39.000000000 +0800
+++ autofs-4.1.3/modules/parse_sun.c	2005-01-14 20:36:17.000000000 +0800
@@ -55,6 +55,13 @@
 	int slashify_colons;	/* Change colons to slashes? */
 };
 
+struct multi_mnt {
+	char *path;
+	char *options;
+	char *location;
+	struct multi_mnt *next;
+};
+
 struct utsname un;
 char processor[65];		/* Not defined on Linux, so we make our own */
 
@@ -609,6 +616,69 @@
 }
 
 /*
+ * Build list of mounts in shortest -> longest order.
+ * Pass in list head and return list head.
+ */
+struct multi_mnt *multi_add_list(struct multi_mnt *list,
+				 char *path, char *options, char *location)
+{
+	struct multi_mnt *mmptr, *new, *old = NULL;
+	int plen;
+
+	if (!path || !options || !location)
+		return NULL;
+
+	new = malloc(sizeof(struct multi_mnt));
+	if (!new)
+		return NULL;
+
+	new->path = path;
+	new->options = options;
+	new->location = location;
+
+	plen = strlen(path);
+	mmptr = list;
+	while (mmptr) {
+		if (plen <= strlen(mmptr->path))
+			break;
+		old = mmptr;
+		mmptr = mmptr->next;
+	}
+
+	if (old)
+		old->next = new;
+	new->next = mmptr;
+
+	return old ? list : new;
+}
+
+void multi_free_list(struct multi_mnt *list)
+{
+	struct multi_mnt *next;
+
+	if (!list)
+		return;
+
+	next = list;
+	while (next) {
+		struct multi_mnt *this = next;
+
+		next = this->next;
+
+		if (this->path)
+			free(this->path);
+
+		if (this->options)
+			free(this->options);
+
+		if (this->location)
+			free(this->location);
+
+		free(this);
+	}
+}
+
+/*
  * syntax is:
  *	[-options] location [location] ...
  *	[-options] [mountpoint [-options] location [location] ... ]...
@@ -661,6 +731,7 @@
 	debug(MODPREFIX "gathered options: %s", options);
 
 	if (*p == '/') {
+		struct multi_mnt *list, *head = NULL, *next;
 		int l;
 		char *multi_root;
 
@@ -684,11 +755,11 @@
 			if (myoptions == NULL) {
 				error(MODPREFIX "multi strdup: %m");
 				free(options);
+				multi_free_list(head);
 				return 1;
 			}
 
 			path = dequote(p, l = chunklen(p, 0));
-			pathlen = strlen(path);
 
 			p += l;
 			p = skipspace(p);
@@ -706,6 +777,7 @@
 						    "multi concat_options: %m");
 						free(options);
 						free(path);
+						multi_free_list(head);
 						return 1;
 					}
 					p = skipspace(p);
@@ -721,28 +793,42 @@
 			l = q - p;
 
 			loc = dequote(p, l);
-			loclen = strlen(loc);
-
 			if (loc == NULL || path == NULL) {
 				error(MODPREFIX "out of memory");
 				free(loc);
 				free(path);
 				free(options);
+				free(myoptions);
+				multi_free_list(head);
 				return 1;
 			}
 
 			p += l;
 			p = skipspace(p);
 
+			list = head;
+			head = multi_add_list(list, path, myoptions, loc);
+			if (!head) {
+				free(loc);
+				free(path);
+				free(options);
+				free(myoptions);
+				multi_free_list(head);
+				return 1;
+			}
+		} while (*p == '/');
+
+		next = head;
+		while (next) {
 			debug(MODPREFIX
 			      "multimount: %.*s on %.*s with options %s",
-			      loclen, loc, pathlen, path, myoptions);
+			      strlen(next->location), next->location,
+			      strlen(next->path), next->path, next->options);
 
-			rv = sun_mount(multi_root, path, pathlen, loc, loclen,
-				       myoptions);
-			free(path);
-			free(loc);
-			free(myoptions);
+			rv = sun_mount(multi_root,
+				       next->path, strlen(next->path),
+				       next->location, strlen(next->location),
+				       next->options);
 
 			/* Convert non-strict failure into success */
 			if (rv < 0) {
@@ -751,7 +837,10 @@
 			} else if (rv > 0)
 				break;
 
-		} while (*p == '/');
+			next = next->next;
+		}
+
+		multi_free_list(head);
 
 		free(options);
 		return rv;
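
The patch above stops mounting each entry as it is parsed.  It first collects
the entries into a list kept in shortest-to-longest path order and then mounts
them in that order.  A minimal sketch of the ordered insertion follows.  It is
an illustration under assumptions, not the autofs code: struct mnt, mnt_insert
and mnt_free are hypothetical names, and the node carries only the path,
whereas struct multi_mnt also holds options and location.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type; only the path is kept in this sketch. */
struct mnt {
	const char *path;
	struct mnt *next;
};

/*
 * Insert path so the list stays ordered shortest -> longest.
 * Returns 0 on success, -1 if allocation fails. *head is left
 * untouched on failure, so the caller can still free the list.
 */
int mnt_insert(struct mnt **head, const char *path)
{
	struct mnt *node;
	size_t plen;

	assert(head != NULL && path != NULL);

	node = malloc(sizeof(*node));
	if (!node)
		return -1;
	node->path = path;

	plen = strlen(path);
	/* Walk to the link that should point at the new node. */
	while (*head && strlen((*head)->path) < plen)
		head = &(*head)->next;

	node->next = *head;
	*head = node;
	return 0;
}

/* Free the nodes; the path strings are not owned by the list here. */
void mnt_free(struct mnt *head)
{
	while (head) {
		struct mnt *next = head->next;

		free(head);
		head = next;
	}
}
```

Ordering by string length is enough for this purpose because a parent
directory's path is always shorter than the path of anything below it.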



* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-14 14:35       ` raven
@ 2005-01-14 22:38         ` David Meleedy
  2005-01-15  2:50           ` raven
  2005-01-17 14:01         ` raven
  1 sibling, 1 reply; 37+ messages in thread
From: David Meleedy @ 2005-01-14 22:38 UTC (permalink / raw)
  To: raven; +Cc: autofs mailing list, Mike Waychison


Ian,
	I have installed the multi-over patch into our version
of the automounter 4.1.3-67 (with updated large-program-map patch)
and so far everything looks great!  I am going to test our machines
for a longer period of time and make sure everything looks stable,
but so far, so good!

It seems to have eliminated the "BUG" message in the messages file,
and it seems as though the automounter can unmount /net/aflac
which it was not able to do in the past during a reboot.  I suspect
that this means it will use a lot less ports, and I might not
even need the kernel patch (given the small amount of mounts we actually
use) -- I am testing this as well.

Thanks!

-Dave

________________________________________________________________________
David Meleedy				Analog Devices, Inc.
David.Meleedy@analog.com		Three Technology Way
Phone: 781 461 3494			Norwood, MA  02062-9106  USA


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-14 22:38         ` David Meleedy
@ 2005-01-15  2:50           ` raven
  2005-01-17 14:52             ` Jeff Moyer
  0 siblings, 1 reply; 37+ messages in thread
From: raven @ 2005-01-15  2:50 UTC (permalink / raw)
  To: David Meleedy; +Cc: autofs mailing list, Mike Waychison

On Fri, 14 Jan 2005, David Meleedy wrote:

>
> Ian,
> 	I have installed the multi-over patch into our version
> of the automounter 4.1.3-67 (with updated large-program-map patch)
> and so far everything looks great!  I am going to test our machines
> for a longer period of time and make sure everything looks stable,
> but so far, so good!

That sounds very encouraging. Great!

>
> It seems to have eliminated the "BUG" message in the messages file,
> and it seems as though the automounter can unmount /net/aflac
> which it was not able to do in the past during a reboot.  I suspect
> that this means it will use a lot less ports, and I might not
> even need the kernel patch (given the small amount of mounts we actually
> use) -- I am testing this as well.

The BUG messages were placed there to identify this happening as this 
problem has come up in various forms several times.

In this case it appears to be caused by the order in which the mounts are 
done (i.e. the order received from auto.net). Since the current autofs 
implementation of multi-mounts must handle them as a single unit, nested 
filesystem mounts made in the wrong order cause overmounting, which 
in turn causes the umount problem.

Perhaps.

Depends on whether the mount program has the patch which probes the 
NFS server. The port usage problem still remains and I expect it will 
continue to cause problems for us one way or another. Hopefully it will be 
addressed in the near future.

Regards
Ian
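
The ordering problem described above can be checked mechanically: a mount list
is in a bad order if some entry is a parent directory of an entry that comes
before it, since mounting the parent afterwards covers the earlier mount.  A
small sketch of such a check follows.  The function names and sample paths are
hypothetical, and the code only inspects path strings; it does not mount
anything or call into autofs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if child lies strictly below parent in the path tree. */
static int is_under(const char *child, const char *parent)
{
	size_t plen = strlen(parent);

	if (plen == 0 || strncmp(child, parent, plen) != 0)
		return 0;
	/* A parent ending in '/' (such as "/") needs no separator. */
	if (parent[plen - 1] == '/')
		return child[plen] != '\0';
	/* Otherwise the match must end on a component boundary. */
	return child[plen] == '/';
}

/*
 * Given mount points in the order they would be mounted, return the
 * index of the first entry that would cover an earlier entry, or -1
 * if the order is safe.
 */
int first_overmount(const char *const *paths, size_t n)
{
	size_t i, j;

	assert(paths != NULL || n == 0);

	for (j = 1; j < n; j++)
		for (i = 0; i < j; i++)
			if (is_under(paths[i], paths[j]))
				return (int)j;
	return -1;
}
```

For the order /vol/vol0/home, /vol/vol0, /vol/vol1 the check reports index 1,
because /vol/vol0 would be mounted on top of the already mounted
/vol/vol0/home.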


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-14 14:35       ` raven
  2005-01-14 22:38         ` David Meleedy
@ 2005-01-17 14:01         ` raven
  2005-01-17 16:19           ` David Meleedy
  1 sibling, 1 reply; 37+ messages in thread
From: raven @ 2005-01-17 14:01 UTC (permalink / raw)
  To: David Meleedy; +Cc: autofs mailing list, Mike Waychison

On Fri, 14 Jan 2005 raven@themaw.net wrote:

>
> I have attached a patch I'd like you to try please.
>

Hi David,

I've found a small problem with the patch.
Please update your rpms with the extra patch below or get the version I 
have uploaded to the autofs v4 directory on kernel.org.

If you look at the code around the second hunk it should be fairly 
clear how important this could be.

Jeff, I've called the patch autofs-4.1.3-multi-over.patch, if you would like 
to have a look and perhaps add it to your rpms.

Regards
Ian

--- autofs-4.1.3/modules/parse_sun.c.multi-over-2	2005-01-17 21:50:18.000000000 +0800
+++ autofs-4.1.3/modules/parse_sun.c	2005-01-17 21:51:06.000000000 +0800
@@ -750,7 +750,6 @@
  		do {
  			char *myoptions = strdup(options);
  			char *path, *loc;
-			int pathlen, loclen;

  			if (myoptions == NULL) {
  				error(MODPREFIX "multi strdup: %m");
@@ -813,7 +812,7 @@
  				free(path);
  				free(options);
  				free(myoptions);
-				multi_free_list(head);
+				multi_free_list(list);
  				return 1;
  			}
  		} while (*p == '/');
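
The second hunk matters because multi_add_list() returns NULL on failure and
that result is assigned to head, so at that point head no longer refers to the
entries collected so far; list still does.  The sketch below reproduces the
same calling pattern with hypothetical names (struct item, push, free_all,
build_then_fail) and a `fail` flag that stands in for a malloc() failure.  It
is an illustration of the pattern, not autofs code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct item {
	int value;
	struct item *next;
};

/*
 * Push onto the front of the list. Returns the new head, or NULL on
 * failure, the same convention as multi_add_list(). `fail` simulates
 * malloc() returning NULL so the error path can be exercised.
 */
struct item *push(struct item *head, int value, int fail)
{
	struct item *node = fail ? NULL : malloc(sizeof(*node));

	if (!node)
		return NULL;
	node->value = value;
	node->next = head;
	return node;
}

/* Free the list and report how many nodes were released. */
size_t free_all(struct item *head)
{
	size_t n = 0;

	while (head) {
		struct item *next = head->next;

		free(head);
		head = next;
		n++;
	}
	return n;
}

/* Build three nodes, then hit a failed insert; returns nodes freed. */
size_t build_then_fail(void)
{
	struct item *list, *head = NULL;
	int i;

	for (i = 0; i < 3; i++) {
		list = head;
		head = push(list, i, 0);
		assert(head != NULL);
	}

	list = head;			/* remember the good list */
	head = push(list, 99, 1);	/* fails: head is now NULL */

	/* free_all(head) would free nothing and leak three nodes. */
	return free_all(list);
}
```

Freeing through the saved pointer releases all three nodes; freeing through
head on the same path would release none.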


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-15  2:50           ` raven
@ 2005-01-17 14:52             ` Jeff Moyer
  2005-01-18  1:31               ` Ian Kent
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff Moyer @ 2005-01-17 14:52 UTC (permalink / raw)
  To: raven; +Cc: autofs mailing list, Mike Waychison, David Meleedy

==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems; raven@themaw.net adds:

raven> On Fri, 14 Jan 2005, David Meleedy wrote:
>> Ian, I have installed the multi-over patch into our version of the
>> automounter 4.1.3-67 (with updated large-program-map patch) and so far
>> everything looks great!  I am going to test our machines for a longer
>> period of time and make sure everything looks stable, but so far, so
>> good!

raven> That sounds very encouraging. Great!

Very encouraging indeed.  Good catch, Ian!

>> It seems to have eliminated the "BUG" message in the messages file, and
>> it seems as though the automounter can unmount /net/aflac which it was
>> not able to do in the past during a reboot.  I suspect that this means
>> it will use a lot less ports, and I might not even need the kernel patch
>> (given the small amount of mounts we actually use) -- I am testing this
>> as well.

raven> The BUG messages were placed there to identify this happening as
raven> this problem has come up in various forms several times.

raven> In this case it appears to be caused by the order in which the
raven> mounts are done (ie. received from auto.net). Given that current
raven> autofs implementation of multi-mounts must handle them as a single
raven> unit, nested filesystem mounts, made in the wrong order, cause
raven> overmounting which caused the umount problem.

raven> Perhaps.

raven> Depends on whether the mount program has the patch which probes the
raven> NFS server. The port usage problem still remains and I expect it
raven> will continue to cause problems for us one way or another. Hopefully
raven> it will be addressed in the near future.

Hmm, I wonder what probing it actually does.  I'll have a look and see if
we can change the probe code to use non-reserved ports.

-Jeff
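
For reference, "non-reserved" here means a local port of 1024 or above.  A
socket gets a kernel-chosen port when it is bound with port 0, whereas code
that wants a reserved port has to ask for one explicitly (for example with
glibc's bindresvport()).  The sketch below shows only the kernel-chosen case,
using standard socket calls.  It is not the mount probe code, which is not
shown in this thread, and kernel_chosen_port is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind a socket of the given type to port 0 and return the local
 * port the kernel picked, or -1 on error.
 */
int kernel_chosen_port(int socktype)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int port = -1;
	int fd;

	assert(socktype == SOCK_DGRAM || socktype == SOCK_STREAM);

	fd = socket(AF_INET, socktype, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(0);	/* 0 asks the kernel to choose */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
		port = ntohs(addr.sin_port);

	close(fd);
	return port;
}
```

On Linux the kernel picks from the range in
/proc/sys/net/ipv4/ip_local_port_range, which by default lies well above 1024,
so sockets bound this way do not draw on the reserved ports discussed in this
thread.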


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-17 14:01         ` raven
@ 2005-01-17 16:19           ` David Meleedy
  2005-01-18  1:33             ` Ian Kent
  0 siblings, 1 reply; 37+ messages in thread
From: David Meleedy @ 2005-01-17 16:19 UTC (permalink / raw)
  To: raven; +Cc: autofs mailing list, Mike Waychison


> Hi David,
> 
> I've found a small problem with the patch.
> Please update your rpms with the extra patch below or get the version I 
> have uploaded to the autofs v4 directory on kernel.org.

The small patch to the patch you sent did not work; however,
the updated patch at:

ftp://ftp.kernel.org/pub/linux/daemons/autofs/v4/autofs-4.1.3-multi-over.patch

worked just fine.  I will get some testing done today and let you
know how it worked out.

Thanks again Ian,

-Dave

________________________________________________________________________
David Meleedy				Analog Devices, Inc.
David.Meleedy@analog.com		Three Technology Way
Phone: 781 461 3494			Norwood, MA  02062-9106  USA


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-17 14:52             ` Jeff Moyer
@ 2005-01-18  1:31               ` Ian Kent
  2005-01-18 14:18                 ` Jeff Moyer
  2005-01-18 14:20                 ` Jeff Moyer
  0 siblings, 2 replies; 37+ messages in thread
From: Ian Kent @ 2005-01-18  1:31 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: autofs mailing list, Mike Waychison, David Meleedy

On Mon, 17 Jan 2005, Jeff Moyer wrote:

> Hmm, I wonder what probing it actually does.  I'll have a look and see if
> we can change the probe code to use non-reserved ports.

I looked at the code in an FC2 mount and found that it did quite a bit 
of probing.

In itself this is probably a good thing as it's more comprehensive than 
what I do for replicated server mount entries and it may be a precursor to 
providing that functionality in mount. This just means that we need to get 
a handle on the objections to RPC transport multiplexing and get it done.

Using non-privileged ports has other dependencies. For example, on Debian 
with 2.4.27, mountd rejects connections from non-privileged ports. I 
didn't spend much time finding out whether I could work around it, but 
nevertheless it will likely generate a bit of noise.

Ian


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-17 16:19           ` David Meleedy
@ 2005-01-18  1:33             ` Ian Kent
  0 siblings, 0 replies; 37+ messages in thread
From: Ian Kent @ 2005-01-18  1:33 UTC (permalink / raw)
  To: David Meleedy; +Cc: autofs mailing list, Mike Waychison

On Mon, 17 Jan 2005, David Meleedy wrote:

> 
> > Hi David,
> > 
> > I've found a small problem with the patch.
> > Please update your rpms with the extra patch below or get the version I 
> > have uploaded to the autofs v4 directory on kernel.org.
> 
> The small patch to the patch you sent did not work, however
> the updated patch in:
> 
> ftp://ftp.kernel.org/pub/linux/daemons/autofs/v4/autofs-4.1.3-multi-over.patch
> 
> worked just fine.  I will get some testing done today and let you
> know how it worked out.

Oops.

All's well that ends well!

Ian


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-18  1:31               ` Ian Kent
@ 2005-01-18 14:18                 ` Jeff Moyer
  2005-01-18 17:00                   ` Ian Kent
  2005-01-18 14:20                 ` Jeff Moyer
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff Moyer @ 2005-01-18 14:18 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs mailing list, Mike Waychison, David Meleedy

==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems; Ian Kent <raven@themaw.net> adds:

raven> On Mon, 17 Jan 2005, Jeff Moyer wrote:
>> Hmm, I wonder what probing it actually does.  I'll have a look and see
>> if we can change the probe code to use non-reserved ports.

raven> I looked at the code in an FC2 mount and found that it did quite a
raven> bit of probing.

raven> In itself this is probably a good thing as it's more comprehensive
raven> than what I do for replicated server mount entries and it may be a
raven> precursor to providing that functionality in mount. This just means
raven> that we need to get a handle on the objections to RPC transport
raven> multiplexing and get it done.

Umm, you want mount to support replicated servers?  Interesting idea, but
I'm not sure I like it.

raven> Using non-priveledged ports has other dependencies. For example, on
raven> Debian with 2.4.27 mountd rejects connections from non-priveledged
raven> ports. I didn't spend much time to find out if I could work around
raven> it but never the less it likely will generate a bit of noise.

Right.  But you can still try to connect using non-privileged ports, and
fall back to the current code path if that fails.
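
A rough sketch of that try-then-fall-back idea, under stated assumptions: the
probe callable, its "reserved" parameter, and the stand-in server are all
hypothetical and are not taken from mount or autofs. It also assumes a
rejected source port shows up as a connection error, which may not hold for
every mountd.

```python
def probe_with_fallback(probe, host):
    """Probe from an unprivileged source port, then retry from a reserved one.

    `probe` is a hypothetical callable, probe(host, reserved) -> result, that
    raises ConnectionError when the server refuses the source port.
    """
    try:
        return probe(host, reserved=False), "unprivileged"
    except ConnectionError:
        # The server insists on a port below 1024: take the existing path.
        return probe(host, reserved=True), "reserved"


def strict_server(host, reserved):
    """Stand-in for a mountd that rejects non-reserved source ports."""
    if not reserved:
        raise ConnectionRefusedError(host)
    return "ok"


if __name__ == "__main__":
    print(probe_with_fallback(strict_server, "filer"))
```

A reserved port is only consumed when the server actually demands one, so
probes against lenient servers would cost nothing from the reserved range.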

-Jeff


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-18  1:31               ` Ian Kent
  2005-01-18 14:18                 ` Jeff Moyer
@ 2005-01-18 14:20                 ` Jeff Moyer
  2005-01-18 17:04                   ` Ian Kent
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff Moyer @ 2005-01-18 14:20 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs mailing list, Mike Waychison, David Meleedy

==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems; Ian Kent <raven@themaw.net> adds:

raven> On Mon, 17 Jan 2005, Jeff Moyer wrote:
>> Hmm, I wonder what probing it actually does.  I'll have a look and see
>> if we can change the probe code to use non-reserved ports.

raven> I looked at the code in an FC2 mount and found that it did quite a
raven> bit of probing.

[snip]

raven> This just means that we need to get a handle on the objections to
raven> RPC transport multiplexing and get it done.

I forwarded Mike's patch off to Steve Dickson.  However, with the caveats
he listed, I'm not optimistic.

-Jeff


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-18 14:18                 ` Jeff Moyer
@ 2005-01-18 17:00                   ` Ian Kent
  2005-01-18 17:05                     ` Jeff Moyer
  0 siblings, 1 reply; 37+ messages in thread
From: Ian Kent @ 2005-01-18 17:00 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: autofs mailing list, Mike Waychison, David Meleedy

On Tue, 18 Jan 2005, Jeff Moyer wrote:

> ==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems; Ian Kent <raven@themaw.net> adds:
> 
> raven> On Mon, 17 Jan 2005, Jeff Moyer wrote:
> >> Hmm, I wonder what probing it actually does.  I'll have a look and see
> >> if we can change the probe code to use non-reserved ports.
> 
> raven> I looked at the code in an FC2 mount and found that it did quite a
> raven> bit of probing.
> 
> raven> In itself this is probably a good thing as it's more comprehensive
> raven> than what I do for replicated server mount entries and it may be a
> raven> precursor to providing that functionality in mount. This just means
> raven> that we need to get a handle on the objections to RPC transport
> raven> multiplexing and get it done.
> 
> Umm, you want mount to support replicated servers?  Interesting idea, but
> I'm not sure I like it.
> 
> raven> Using non-priveledged ports has other dependencies. For example, on
> raven> Debian with 2.4.27 mountd rejects connections from non-priveledged
> raven> ports. I didn't spend much time to find out if I could work around
> raven> it but never the less it likely will generate a bit of noise.
> 
> Right.  But you can still try to connect using non-priveledged ports, and
> fallback to the current code path if that fails.

But mount doesn't work (in this case) when the kernel on the server end 
doesn't support non-privileged ports but the client does.

Ian


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-18 14:20                 ` Jeff Moyer
@ 2005-01-18 17:04                   ` Ian Kent
  2005-01-18 17:07                     ` Jeff Moyer
  2005-01-18 17:32                     ` Mike Waychison
  0 siblings, 2 replies; 37+ messages in thread
From: Ian Kent @ 2005-01-18 17:04 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: autofs mailing list, Mike Waychison, David Meleedy

On Tue, 18 Jan 2005, Jeff Moyer wrote:

> ==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems; Ian Kent <raven@themaw.net> adds:
> 
> raven> On Mon, 17 Jan 2005, Jeff Moyer wrote:
> >> Hmm, I wonder what probing it actually does.  I'll have a look and see
> >> if we can change the probe code to use non-reserved ports.
> 
> raven> I looked at the code in an FC2 mount and found that it did quite a
> raven> bit of probing.
> 
> [snip]
> 
> raven> This just means that we need to get a handle on the objections to
> raven> RPC transport multiplexing and get it done.
> 
> I forwarded Mike's patch off to Steve Dickson.  However, with the caveats
> he listed, I'm not optimistic.

Neither am I. The patch that Trond originally did is probably a better 
starting point.

There's quite a bit to do in the meantime, such as general testing, 
scalability testing, dynamically allocating a new transport if a request 
slot can't be allocated, and so on.
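
As a toy model of the "allocate a new transport when no request slot is free"
item, and nothing more: the class names, the slot count, and the first-fit
policy below are all assumptions for illustration and do not reflect the
actual sunrpc xprt code or either patch.

```python
class Transport:
    """One shared connection with a fixed number of request slots."""

    def __init__(self, slots):
        self.free = slots


class TransportPool:
    """Share transports between mounts, growing only when slots run out."""

    def __init__(self, slots_per_transport=16):
        self.slots_per_transport = slots_per_transport
        self.transports = []

    def acquire(self):
        # First fit: reuse any existing transport that still has a free slot.
        for xprt in self.transports:
            if xprt.free > 0:
                xprt.free -= 1
                return xprt
        # Every slot is busy, so allocate a new transport dynamically.
        xprt = Transport(self.slots_per_transport)
        xprt.free -= 1
        self.transports.append(xprt)
        return xprt

    def release(self, xprt):
        xprt.free += 1
```

In this model the number of connections (and so the number of ports) grows
with concurrent requests rather than with the number of mounted filesystems.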

There will be quite a bit more discussion on this I expect.

What is Steve's responsibility here?

Ian


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-18 17:00                   ` Ian Kent
@ 2005-01-18 17:05                     ` Jeff Moyer
  2005-01-19  1:25                       ` Ian Kent
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff Moyer @ 2005-01-18 17:05 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs mailing list, Mike Waychison, David Meleedy

==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems; Ian Kent <raven@themaw.net> adds:

raven> On Tue, 18 Jan 2005, Jeff Moyer wrote:
>> ==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3]
>> = port usage problems; Ian Kent <raven@themaw.net> adds:
>> 
raven> On Mon, 17 Jan 2005, Jeff Moyer wrote:
>> >> Hmm, I wonder what probing it actually does.  I'll have a look and
>> >> see if we can change the probe code to use non-reserved ports.
>> 
raven> I looked at the code in an FC2 mount and found that it did quite a
raven> bit of probing.
>>
raven> In itself this is probably a good thing as it's more comprehensive
raven> than what I do for replicated server mount entries and it may be a
raven> precursor to providing that functionality in mount. This just means
raven> that we need to get a handle on the objections to RPC transport
raven> multiplexing and get it done.
>> Umm, you want mount to support replicated servers?  Interesting idea,
>> but I'm not sure I like it.
>> 
raven> Using non-priveledged ports has other dependencies. For example, on
raven> Debian with 2.4.27 mountd rejects connections from non-priveledged
raven> ports. I didn't spend much time to find out if I could work around
raven> it but never the less it likely will generate a bit of noise.
>> Right.  But you can still try to connect using non-priveledged ports,
>> and fallback to the current code path if that fails.

raven> But mount doesn't work (in this case) when the kernel on the server
raven> end doesn't support non-priveledged ports but the client does.

Seems we have a disconnect.  I thought you were only talking about the
probe code.  I don't know how to work around this, either, but I think
mount is supposed to try reserved ports first?

-Jeff


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-18 17:04                   ` Ian Kent
@ 2005-01-18 17:07                     ` Jeff Moyer
  2005-01-18 17:32                     ` Mike Waychison
  1 sibling, 0 replies; 37+ messages in thread
From: Jeff Moyer @ 2005-01-18 17:07 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs mailing list, Mike Waychison, David Meleedy

==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems; Ian Kent <raven@themaw.net> adds:

raven> On Tue, 18 Jan 2005, Jeff Moyer wrote:
>> ==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3]
>> = port usage problems; Ian Kent <raven@themaw.net> adds:
>> 
raven> On Mon, 17 Jan 2005, Jeff Moyer wrote:
>> >> Hmm, I wonder what probing it actually does.  I'll have a look and
>> >> see if we can change the probe code to use non-reserved ports.
>> 
raven> I looked at the code in an FC2 mount and found that it did quite a
raven> bit of probing.
>> [snip]
>> 
raven> This just means that we need to get a handle on the objections to
raven> RPC transport multiplexing and get it done.
>> I forwarded Mike's patch off to Steve Dickson.  However, with the
>> caveats he listed, I'm not optimistic.

raven> Neither am I. The patch that Trond originally did is probably a
raven> better starting point.

raven> There's quite a bit to do meantime such as, general testing,
raven> scalability testing, dynamically allocating a new transport if a
raven> request slot can't be allocated and so on.

raven> There will be quite a bit more discussion on this I expect.

raven> What is Steves responsibility here?

Steve is our NFS maintainer.

-Jeff


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-01-18 17:04                   ` Ian Kent
  2005-01-18 17:07                     ` Jeff Moyer
@ 2005-01-18 17:32                     ` Mike Waychison
  2005-01-19  4:21                       ` Ian Kent
  1 sibling, 1 reply; 37+ messages in thread
From: Mike Waychison @ 2005-01-18 17:32 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs mailing list, David Meleedy

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Kent wrote:
> On Tue, 18 Jan 2005, Jeff Moyer wrote:
> 
> 
>>==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems; Ian Kent <raven@themaw.net> adds:
>>
>>raven> On Mon, 17 Jan 2005, Jeff Moyer wrote:
>>
>>>>Hmm, I wonder what probing it actually does.  I'll have a look and see
>>>>if we can change the probe code to use non-reserved ports.
>>
>>raven> I looked at the code in an FC2 mount and found that it did quite a
>>raven> bit of probing.
>>
>>[snip]
>>
>>raven> This just means that we need to get a handle on the objections to
>>raven> RPC transport multiplexing and get it done.
>>
>>I forwarded Mike's patch off to Steve Dickson.  However, with the caveats
>>he listed, I'm not optimistic.
> 
> 
> Neither am I. The patch that Trond originally did is probably a better 
> starting point.

Which patches are we referring to, by chance?  I'm guessing the xprt
stuff?  I don't know of any patches from Trond that do the same.

> 
> There's quite a bit to do meantime such as, general testing, scalability 
> testing, dynamically allocating a new transport if a request slot can't be 
> allocated and so on.
> 
> There will be quite a bit more discussion on this I expect.
> 
> What is Steves responsibility here?
> 
> Ian


- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFB7UgtdQs4kOxk3/MRAjaqAJwIYgp0GjlAH2X9Jl/wCs885AIRVgCfYBqI
0nJ1cZt77/sebi1MjVlV7pk=
=n93V
-----END PGP SIGNATURE-----


* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems
  2005-01-18 17:05                     ` Jeff Moyer
@ 2005-01-19  1:25                       ` Ian Kent
  0 siblings, 0 replies; 37+ messages in thread
From: Ian Kent @ 2005-01-19  1:25 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: autofs mailing list, Mike Waychison, David Meleedy

On Tue, 18 Jan 2005, Jeff Moyer wrote:

> ==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems; Ian Kent <raven@themaw.net> adds:
> 
> raven> On Tue, 18 Jan 2005, Jeff Moyer wrote:
> >> ==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3]
> >> = port usage problems; Ian Kent <raven@themaw.net> adds:
> >> 
> raven> On Mon, 17 Jan 2005, Jeff Moyer wrote:
> >> >> ==> Regarding Re: [autofs] BUG: autofs4 + cd
> >> /net/<Netapp>/vol/vol[0-3] >> = port usage problems; raven@themaw.net
> >> adds:
> >> >> 
> raven> On Fri, 14 Jan 2005, David Meleedy wrote:
> >> >> >> Ian, I have installed the multi-over patch into our version of the
> >> >> >> automounter 4.1.3-67 (with updated large-program-map patch) and so
> >> far >> >> everything looks great!  I am going to test our machines for a
> >> longer >> >> period of time and make sure everything looks stable, but
> >> so far, so >> >> good!
> >> >> 
> raven> That sounds very encouraging. Great!
> >> >> Very encouraging indeed.  Good catch, Ian!
> >> >> 
> >> >> >> It seems to have eliminated the "BUG" message in the messages
> >> >> >> file, and it seems as though the automounter can unmount
> >> >> >> /net/aflac which it was not able to do in the past during a
> >> >> >> reboot.  I suspect that this means it will use a lot less ports,
> >> >> >> and I might not even need the kernel patch (given the small amount
> >> >> >> of mounts we actually use) -- I am testing this as well.
> >> >> 
> raven> The BUG messages were placed there to identify this happening as
> raven> this problem has come up in various forms several times.
> >> >>
> raven> In this case it appears to be caused by the order in which the
> raven> mounts are done (ie. received from auto.net). Given that current
> raven> autofs implementation of multi-mounts must handle them as a single
> raven> unit, nested filesystem mounts, made in the wrong order, cause
> raven> overmounting which caused the umount problem.
> >> >>
> raven> Perhaps.
> >> >>
> raven> Depends on whether the mount program has the patch which probes the
> raven> NFS server. The port usage problem still remains and I expect it
> raven> will continue to cause problems for us one way or another. Hopefully
> raven> it will be addressed in the near future.
> >> >> Hmm, I wonder what probing it actually does.  I'll have a look and
> >> >> see if we can change the probe code to use non-reserved ports.
> >> 
> raven> I looked at the code in an FC2 mount and found that it did quite a
> raven> bit of probing.
> >>
> raven> In itself this is probably a good thing as it's more comprehensive
> raven> than what I do for replicated server mount entries and it may be a
> raven> precursor to providing that functionality in mount. This just means
> raven> that we need to get a handle on the objections to RPC transport
> raven> multiplexing and get it done.
> >> Umm, you want mount to support replicated servers?  Interesting idea,
> >> but I'm not sure I like it.
> >> 
> raven> Using non-privileged ports has other dependencies. For example, on
> raven> Debian with 2.4.27 mountd rejects connections from non-privileged
> raven> ports. I didn't spend much time to find out if I could work around
> raven> it but nevertheless it likely will generate a bit of noise.
> >> Right.  But you can still try to connect using non-privileged ports,
> >> and fall back to the current code path if that fails.
> 
> raven> But mount doesn't work (in this case) when the kernel on the server
> raven> end doesn't support non-privileged ports but the client does.
> 
> Seems we have a disconnect.  I thought you were only talking about the
> probe code.  I don't know how to work around this, either, but I think
> mount is supposed to try reserved ports first?

Yeah, but I'm talking about when there are lots of mounts.
It may be purely a mount issue; I haven't dug deep enough.

Ian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-01-18 17:32                     ` Mike Waychison
@ 2005-01-19  4:21                       ` Ian Kent
  2005-01-19  5:00                         ` Re: [autofs] " Trond Myklebust
  0 siblings, 1 reply; 37+ messages in thread
From: Ian Kent @ 2005-01-19  4:21 UTC (permalink / raw)
  To: Mike Waychison; +Cc: autofs mailing list, nfs, David Meleedy

On Tue, 18 Jan 2005, Mike Waychison wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Ian Kent wrote:
> > On Tue, 18 Jan 2005, Jeff Moyer wrote:
> > 
> > 
> >>==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port  usage problems; Ian Kent <raven@themaw.net> adds:
> >>
> >>raven> On Mon, 17 Jan 2005, Jeff Moyer wrote:
> >>
> >>>>==> Regarding Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3]
> >>>>= port usage problems; raven@themaw.net adds:
> >>>>
> >>
> >>raven> On Fri, 14 Jan 2005, David Meleedy wrote:
> >>
> >>>>>>Ian, I have installed the multi-over patch into our version of the
> >>>>>>automounter 4.1.3-67 (with updated large-program-map patch) and so far
> >>>>>>everything looks great!  I am going to test our machines for a longer
> >>>>>>period of time and make sure everything looks stable, but so far, so
> >>>>>>good!
> >>>>
> >>raven> That sounds very encouraging. Great!
> >>
> >>>>Very encouraging indeed.  Good catch, Ian!
> >>>>
> >>>>
> >>>>>>It seems to have eliminated the "BUG" message in the messages file,
> >>>>>>and it seems as though the automounter can unmount /net/aflac which
> >>>>>>it was not able to do in the past during a reboot.  I suspect that
> >>>>>>this means it will use a lot less ports, and I might not even need
> >>>>>>the kernel patch (given the small amount of mounts we actually use)
> >>>>>>-- I am testing this as well.
> >>>>
> >>
> >>raven> The BUG messages were placed there to identify this happening as
> >>raven> this problem has come up in various forms several times.
> >>
> >>raven> In this case it appears to be caused by the order in which the
> >>raven> mounts are done (ie. received from auto.net). Given that current
> >>raven> autofs implementation of multi-mounts must handle them as a single
> >>raven> unit, nested filesystem mounts, made in the wrong order, cause
> >>raven> overmounting which caused the umount problem.
> >>
> >>raven> Perhaps.
> >>
> >>raven> Depends on whether the mount program has the patch which probes the
> >>raven> NFS server. The port usage problem still remains and I expect it
> >>raven> will continue to cause problems for us one way or another. Hopefully
> >>raven> it will be addressed in the near future.
> >>
> >>>>Hmm, I wonder what probing it actually does.  I'll have a look and see
> >>>>if we can change the probe code to use non-reserved ports.
> >>
> >>raven> I looked at the code in an FC2 mount and found that it did quite a
> >>raven> bit of probing.
> >>
> >>[snip]
> >>
> >>raven> This just means that we need to get a handle on the objections to
> >>raven> RPC transport multiplexing and get it done.
> >>
> >>I forwarded Mike's patch off to Steve Dickson.  However, with the caveats
> >>he listed, I'm not optimistic.
> > 
> > 
> > Neither am I. The patch that Trond originally did is probably a better 
> > starting point.
> 
> Which patches are we referring to by chance?  I'm guessing the xprt
> stuff?   I don't know of Trond's patches that do the same.

Trond never finished them. He probably got too much flak about
scalability and request slot exhaustion and gave up on it.

He referred to them in a previous discussion on the same topic and said
that if anyone wanted to port them to a current kernel and finish them
they were welcome.

So I did that at the time for vanilla kernels (2.4.22 and 2.6.0) but had no
confidence in my work as my understanding of the RPC subsystem is fairly
poor and was worse at the time. I asked Trond to check them but he clearly
didn't have time.

If you wish to look for yourself they can be found at
http://www.kernel.org/pub/kernel/people/raven/nfs

The patch takes account of other kernel RPC users, lockd and portmap.

Basically it moves the transport allocation into the transmit/receive FSM 
using a get/put mechanism, requiring only that kernel RPC users allocate 
the client struct. It looks to me like this approach would lend itself to 
dynamic allocation of additional transports upon request slot exhaustion.
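
To make the get/put idea a little more concrete, here is a rough user-space
model in Python. It is only a sketch of the sharing scheme described above,
not code from the patch or from the kernel RPC layer; the names Transport,
TransportPool, max_slots and port_count are all invented for illustration,
and the slot limit of 16 is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Transport:
    """Hypothetical stand-in for one RPC transport (one socket, one port)."""
    key: tuple
    max_slots: int
    in_use: int = 0


class TransportPool:
    """Toy model of get/put transport sharing; not the kernel's structures."""

    def __init__(self, max_slots=16):
        self.max_slots = max_slots
        self.transports = {}  # (server, proto) -> list of Transport

    def get(self, server, proto="tcp"):
        """Reserve a request slot, reusing an existing transport if possible."""
        key = (server, proto)
        pool = self.transports.setdefault(key, [])
        for xprt in pool:
            if xprt.in_use < xprt.max_slots:
                xprt.in_use += 1
                return xprt
        # Every request slot is busy: allocate an additional transport.
        xprt = Transport(key=key, max_slots=self.max_slots, in_use=1)
        pool.append(xprt)
        return xprt

    def put(self, xprt):
        """Release a slot; drop the transport once nothing is using it."""
        xprt.in_use -= 1
        if xprt.in_use == 0:
            self.transports[xprt.key].remove(xprt)

    def port_count(self):
        """Number of transports (and so local ports) currently held."""
        return sum(len(pool) for pool in self.transports.values())
```

In this model many mounts of the same server share one transport, and a
second one only appears when the first runs out of request slots, which is
the behaviour I would hope for from the real thing.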

I spent some time checking this last weekend and it looks like porting 
them to current kernels is not too hard but will be tedious.

Mind you I've transcribed these from Trond's original patch and there 
could be errors in that work so they will need to be sanity checked 
anyway.

I'm keen to spend some more time on this, so perhaps between this and what 
you have already done we can get something that will make it into the RPC 
subsystem. Stranger things have happened.

> 
> > 
> > There's quite a bit to do meantime such as, general testing, scalability 
> > testing, dynamically allocating a new transport if a request slot can't be 
> > allocated and so on.
> > 
> > There will be quite a bit more discussion on this I expect.
> > 
> > What is Steves responsibility here?
> > 
> > Ian
> > 
> > _______________________________________________
> > autofs mailing list
> > autofs@linux.kernel.org
> > http://linux.kernel.org/mailman/listinfo/autofs
> 
> 
> - --
> Mike Waychison
> Sun Microsystems, Inc.
> 1 (650) 352-5299 voice
> 1 (416) 202-8336 voice
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> NOTICE:  The opinions expressed in this email are held by me,
> and may not represent the views of Sun Microsystems, Inc.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.5 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
> 
> iD8DBQFB7UgtdQs4kOxk3/MRAjaqAJwIYgp0GjlAH2X9Jl/wCs885AIRVgCfYBqI
> 0nJ1cZt77/sebi1MjVlV7pk=
> =n93V
> -----END PGP SIGNATURE-----
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Re: [autofs] BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-01-19  4:21                       ` Ian Kent
@ 2005-01-19  5:00                         ` Trond Myklebust
  0 siblings, 0 replies; 37+ messages in thread
From: Trond Myklebust @ 2005-01-19  5:00 UTC (permalink / raw)
  To: Ian Kent
  Cc: Mike Waychison, Jeff Moyer, autofs mailing list, David Meleedy,
	nfs

On Wed, 19.01.2005 at 12:21 (+0800), Ian Kent wrote:

> Trond never finished them. He probably got too much flak about 
> scalability and request slot exhaustion and gave up on it.
> 
> He referred to them in a previous discussion on the same topic and said 
> that if anyone wanted to port them to a current kernel and finish them 
> they were welcome.
> 
> So I did that at the time for vanilla kernels (2.4.22 and 2.6.0) but had no 
> confidence in my work as my understanding of the RPC subsystem is fairly 
> poor and was worse at the time. I asked Trond to check them but he clearly 
> didn't have time.

No, I haven't given up on those changes, but I've postponed merging them
until after Chuck finishes testing his "transport switch". The latter is
the abstraction layer that is expected to take us beyond the single
udp/tcp socket paradigm so that we can add IPv6, infiniband, and
multipathing.
IOW it touches fairly heavily on the code in xprt.c.

btw, we're aiming to have that code ready for testing at Connectathon,
so if any of you are going to be attending, we can perhaps discuss this
subject there?

Cheers,
  Trond
-- 
Trond Myklebust <trond.myklebust@fys.uio.no>




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-01-11 21:22 BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems David Meleedy
                   ` (2 preceding siblings ...)
  2005-01-12 16:13 ` Dwight Marzolf
@ 2005-08-25 22:14 ` Rob Sims
  2005-08-26  3:44   ` Ian Kent
  3 siblings, 1 reply; 37+ messages in thread
From: Rob Sims @ 2005-08-25 22:14 UTC (permalink / raw)
  To: autofs

> THE PROBLEM DESCRIPTION:

> Autofs hangs and refuses to mount any directories for a period of time
> after cd'ing to /net/<Netapp>/vol/vol[0-3] and waiting a while.
> The only way to clear this is to reboot the client.

I didn't see a resolution to this in the archive - was it resolved?
How?

Thanks,
Rob

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-08-25 22:14 ` Rob Sims
@ 2005-08-26  3:44   ` Ian Kent
  2005-08-26 16:14     ` Rob Sims
  0 siblings, 1 reply; 37+ messages in thread
From: Ian Kent @ 2005-08-26  3:44 UTC (permalink / raw)
  To: Rob Sims; +Cc: autofs

On Thu, 25 Aug 2005, Rob Sims wrote:

> > THE PROBLEM DESCRIPTION:
> 
> > Autofs hangs and refuses to mount any directories for a period of time
> > after cd'ing to /net/<Netapp>/vol/vol[0-3] and waiting a while.
> > The only way to clear this is to reboot the client.
> 
> I didn't see a resolution to this in the archive - was it resolved?
> How?

This has come up several times and as far as I know we can resolve or 
work around these problems.

If you have a problem then we need to establish versions and symptoms to 
know what needs to be done to resolve it.

Ian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-08-26  3:44   ` Ian Kent
@ 2005-08-26 16:14     ` Rob Sims
  2005-08-27  3:34       ` Ian Kent
  0 siblings, 1 reply; 37+ messages in thread
From: Rob Sims @ 2005-08-26 16:14 UTC (permalink / raw)
  To: autofs

On Fri, Aug 26, 2005 at 11:44:21AM +0800, Ian Kent wrote:
> On Thu, 25 Aug 2005, Rob Sims wrote:
 
> > > THE PROBLEM DESCRIPTION:

> > > Autofs hangs and refuses to mount any directories for a period of time
> > > after cd'ing to /net/<Netapp>/vol/vol[0-3] and waiting a while.
> > > The only way to clear this is to reboot the client.

> > I didn't see a resolution to this in the archive - was it resolved?
> > How?
 
> This has come up several times and as far as I know we can resolve or 
> work around these problems.
 
> If you have a problem then we need to establish versions and symptoms to 
> know what needs to be done to resolve it.
 
What we're seeing is very similar to the original poster's description.
Netapp, 2.4 kernel, hierarchical mounts like:
vol0
vol0/a
vol0/b
vol1
vol1/c
etc.

The problem seems to have disappeared when we dropped the number of
exports to under 32.  The log indicates that vol2 couldn't be unmounted
because it was busy.  My guess is that 33+ unmount requests were issued
for the children, at least one failed due to lack of resources, and then
the parent unmount failed because of the failed child unmount.  The
parent directory is then mounted multiple times (one extra per
expiration?)
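
One way to check the "mounted multiple times" guess is to count repeated
mount points in /proc/mounts. A minimal sketch in Python, assuming the usual
whitespace-separated layout with the mount point in the second field;
stacked_mounts is just a name made up for this example:

```python
from collections import Counter


def stacked_mounts(mounts_text):
    """Return {mountpoint: count} for mount points listed more than once."""
    counts = Counter()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1  # second field is the mount point
    return {mp: n for mp, n in counts.items() if n > 1}


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mountpoint, n in sorted(stacked_mounts(f.read()).items()):
            print(f"{mountpoint} is mounted {n} times")
```

Running it after each expiration should show whether the parent really
picks up one extra mount per cycle.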


Relevant log messages:
Aug 25 06:27:35 mvweb automount[18208]: running expiration on path /net/goodserver
Aug 25 06:27:36 mvweb automount[18208]: expired /net/goodserver
Aug 25 06:27:36 mvweb automount[18218]: running expiration on path /net/netapp
Aug 25 06:27:36 mvweb automount[18218]: >> umount: /net/netapp/vol/vol2: device is busy
Aug 25 06:27:52 mvweb last message repeated 10919 times
Aug 25 06:27:52 mvweb automount[18218]: lookup(program): looking up netapp
Aug 25 06:27:52 mvweb automount[18218]: lookup(program): netapp -> -fstype=nfs,hard,intr,nodev,nosuid,nonstrict,rsize=8192,wsize=8192,async  ^I/vol/vol0 netapp:/vol/vol0  ^I/vol/vol0/mounta netapp:/vol/vol0/mounta  ^I/vol/vol0/restricted netapp:/vol/vol0/restricted  ^I/vol/vol0/mountb netapp:/vol/vol0/mountb  ^I/vol/vol0/mountc netapp:/vol/vol0/mountc  ^I/vol/vol0/mountd netapp:/vol/vol0/mountd  ^I/vol/vol1 netapp:/vol/vol1  ^I/vol/vol1/backb netapp:/vol/vol1/backb  ^I/vol/vol1/mounte netapp:/vol/vol1/mounte  ^I/vol/vol1/mountf netapp:/vol/vol1/mountf  ^I/vol/vol1/mountg netapp:/vol/vol1/mountg  ^I/vol/vol1/mounth netapp:/vol/vol1/mounth  ^I/vol/vol1/mounti netapp:/vol/vol1/mounti  ^I/vol/vol1/mountj netapp:/vol/vol1/mountj  ^I/vol/vol1/mountk netapp:/vol/vol1/mountk  ^I/vol/vol1/mountl netapp:/vol/vol1/mountl  ^I/vol/vol1/mountm netapp:/vol/vol1/mountm  ^I/vol/vol1/mountn netapp:/vol/vo
 l1/mountn  ^I/vol/vol2 netapp:/vol/vol2  ^I/vol/vol2/mounto$ netapp:/vol/vol2/mounto$  ^I!
 /vol/
Aug 25 06:27:52 mvweb automount[18218]: parse(sun): expanded entry: -fstype=nfs,hard,intr,nodev,nosuid,nonstrict,rsize=8192,wsize=8192,async  ^I/vol/vol0 netapp:/vol/vol0  ^I/vol/vol0/mounta netapp:/vol/vol0/mounta  ^I/vol/vol0/restricted netapp:/vol/vol0/restricted  ^I/vol/vol0/mountb netapp:/vol/vol0/mountb  ^I/vol/vol0/mountc netapp:/vol/vol0/mountc  ^I/vol/vol0/mountd netapp:/vol/vol0/mountd  ^I/vol/vol1 netapp:/vol/vol1  ^I/vol/vol1/backb netapp:/vol/vol1/backb  ^I/vol/vol1/mounte netapp:/vol/vol1/mounte  ^I/vol/vol1/mountf netapp:/vol/vol1/mountf  ^I/vol/vol1/mountg netapp:/vol/vol1/mountg  ^I/vol/vol1/mounth netapp:/vol/vol1/mounth  ^I/vol/vol1/mounti netapp:/vol/vol1/mounti  ^I/vol/vol1/mountj netapp:/vol/vol1/mountj  ^I/vol/vol1/mountk netapp:/vol/vol1/mountk  ^I/vol/vol1/mountl netapp:/vol/vol1/mountl  ^I/vol/vol1/mountm netapp:/vol/vol1/mountm  ^I/vol/vol1/mountn netapp:/vol/v
 ol1/mountn  ^I/vol/vol2 netapp:/vol/vol2  ^I/vol/vol2/mounto netapp:/vol/vol2/mounto  ^I/!
 vol/vo
Aug 25 06:27:52 mvweb automount[18218]: parse(sun): dequote("fstype=nfs,hard,intr,nodev,nosuid,nonstrict,rsize=8192,wsize=8192,async") -> fstype=nfs,hard,intr,nodev,nosuid,nonstrict,rsize=8192,wsize=8192,async
Aug 25 06:27:52 mvweb automount[18218]: parse(sun): gathered options: fstype=nfs,hard,intr,nodev,nosuid,nonstrict,rsize=8192,wsize=8192,async
Aug 25 06:27:52 mvweb automount[18218]: parse(sun): dequote("/vol/vol0") -> /vol/vol0
Aug 25 06:27:52 mvweb automount[18218]: parse(sun): dequote("netapp:/vol/vol0") -> netapp:/vol/vol0
Aug 25 06:27:52 mvweb automount[18218]: parse(sun): multimount: netapp:/vol/vol0 on /vol/vol0 with options fstype=nfs,hard,intr,nodev,nosuid,nonstrict,rsize=8192,wsize=8192,async
Aug 25 06:27:52 mvweb automount[18218]: parse(sun): mounting root /net, mountpoint netapp/vol/vol0, what netapp:/vol/vol0, fstype nfs, options hard,intr,nodev,nosuid,rsize=8192,wsize=8192,async
Aug 25 06:27:52 mvweb automount[18218]: mount(nfs):  root=/net name=netapp/vol/vol0 what=netapp:/vol/vol0, fstype=nfs, options=hard,intr,nodev,nosuid,rsize=8192,wsize=8192,async
Aug 25 06:27:52 mvweb automount[18218]: mount(nfs): nfs options="hard,intr,nodev,nosuid,rsize=8192,wsize=8192,async", nosymlink=0
Aug 25 06:27:52 mvweb automount[18218]: mount(nfs): calling mkdir_path /net/netapp/vol/vol0
Aug 25 06:27:52 mvweb automount[18218]: mount(nfs): calling mount -t nfs -s -o hard,intr,nodev,nosuid,rsize=8192,wsize=8192,async netapp:/vol/vol0 /net/netapp/vol/vol0
Aug 25 06:27:52 mvweb automount[18218]: mount(nfs): mounted netapp:/vol/vol0 on /net/netapp/vol/vol0
Aug 25 06:27:52 mvweb automount[18218]: parse(sun): dequote("/vol/vol0/mounta") -> /vol/vol0/mounta
Aug 25 06:27:52 mvweb automount[18218]: parse(sun): dequote("netapp:/vol/vol0/mounta") -> netapp:/vol/vol0/mounta
Aug 25 06:27:52 mvweb automount[18218]: parse(sun): multimount: netapp:/vol/vol0/mounta on /vol/vol0/mounta with options fstype=nfs,hard,intr,nodev,nosuid,nonstrict,rsize=8192,wsize=8192,async
Aug 25 06:27:52 mvweb automount[18218]: parse(sun): mounting root /net, mountpoint netapp/vol/vol0/mounta, what netapp:/vol/vol0/mounta, fstype nfs, options hard,intr,nodev,nosuid,rsize=8192,wsize=8192,async
Aug 25 06:27:52 mvweb automount[18218]: mount(nfs):  root=/net name=netapp/vol/vol0/mounta what=netapp:/vol/vol0/mounta, fstype=nfs, options=hard,intr,nodev,nosuid,rsize=8192,wsize=8192,async
Aug 25 06:27:52 mvweb automount[18218]: mount(nfs): nfs options="hard,intr,nodev,nosuid,rsize=8192,wsize=8192,async", nosymlink=0
Aug 25 06:27:52 mvweb automount[18218]: mount(nfs): calling mkdir_path /net/netapp/vol/vol0/mounta
Aug 25 06:27:52 mvweb automount[18218]: mount(nfs): calling mount -t nfs -s -o hard,intr,nodev,nosuid,rsize=8192,wsize=8192,async netapp:/vol/vol0/mounta /net/netapp/vol/vol0/backa
Aug 25 06:27:52 mvweb automount[18218]: mount(nfs): mounted netapp:/vol/vol0/mounta on /net/netapp/vol/vol0/mounta
...
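
For anyone reading the lookup(program) line above: the entry has the shape
"one option string, then offset/location pairs". A small Python sketch that
splits such an entry, assuming only the simple form seen in this log (a
single leading "-options" token, no per-offset options, and the ^I shown by
syslog being a tab). parse_multimount is a hypothetical helper and is not
how parse_sun.c does it:

```python
def parse_multimount(entry):
    """Split a multi-mount map entry into (options, [(offset, location)])."""
    tokens = entry.split()
    options = ""
    if tokens and tokens[0].startswith("-"):
        options = tokens.pop(0)[1:]  # strip the leading '-'
    if len(tokens) % 2:
        raise ValueError("expected offset/location pairs")
    pairs = list(zip(tokens[0::2], tokens[1::2]))
    return options, pairs


if __name__ == "__main__":
    entry = ("-fstype=nfs,hard,intr \t/vol/vol0 netapp:/vol/vol0 "
             "\t/vol/vol0/mounta netapp:/vol/vol0/mounta")
    opts, mounts = parse_multimount(entry)
    print(opts)
    for offset, location in mounts:
        print(offset, "<-", location)
```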

-- 
Rob

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-08-26 16:14     ` Rob Sims
@ 2005-08-27  3:34       ` Ian Kent
  2005-08-29 15:20         ` Rob Sims
  0 siblings, 1 reply; 37+ messages in thread
From: Ian Kent @ 2005-08-27  3:34 UTC (permalink / raw)
  To: Rob Sims; +Cc: autofs

On Fri, 26 Aug 2005, Rob Sims wrote:

> On Fri, Aug 26, 2005 at 11:44:21AM +0800, Ian Kent wrote:
> > On Thu, 25 Aug 2005, Rob Sims wrote:
>  
> > > > THE PROBLEM DESCRIPTION:
> 
> > > > Autofs hangs and refuses to mount any directories for a period of time
> > > > after cd'ing to /net/<Netapp>/vol/vol[0-3] and waiting a while.
> > > > The only way to clear this is to reboot the client.
> 
> > > I didn't see a resolution to this in the archive - was it resolved?
> > > How?
>  
> > This has come up several times and as far as I know we can resolve or 
> > work around these problems.
>  
> > If you have a problem then we need to establish versions and symptoms to 
> > know what needs to be done to resolve it.
>  
> What we're seeing is very similar to the original poster's description.
> Netapp, 2.4 kernel, hierarchical mounts like:
> vol0
> vol0/a
> vol0/b
> vol1
> vol1/c
> etc.
> 
> The problem seems to have disappeared when we dropped the number of
> exports to under 32.  The log indicates that vol2 couldn't be unmounted
> because it was busy.  My guess is that 33+ unmount requests were issued
> for the children, at least one failed due to lack of resources, and then
> the parent unmount failed because of the failed child unmount.  The
> parent directory is then mounted multiple times (one extra per
> expiration?)

Or perhaps the order the export list is returned is no longer a problem.

The attempt to remount on umount failure has always been contentious in my
opinion; however, I've yet to see a situation that hasn't been caused by
something else that needs to be fixed.

I thought I asked for versions?
Can we have 'em?

If necessary, looking at the code will tell if you have what's needed to
avoid this. Send me a copy of parse_sun.c from the source you are using
and I'll check.

Ian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-08-27  3:34       ` Ian Kent
@ 2005-08-29 15:20         ` Rob Sims
  2005-08-30  1:16           ` Ian Kent
  0 siblings, 1 reply; 37+ messages in thread
From: Rob Sims @ 2005-08-29 15:20 UTC (permalink / raw)
  To: autofs

On Sat, Aug 27, 2005 at 11:34:09AM +0800, Ian Kent wrote:
> The attempt to remount on umount failure has always been contentious in my 
> opinion; however, I've yet to see a situation that hasn't been caused by 
> something else that needs to be fixed.

I agree - remounting something that is already mounted should be a no-op
in the nfs system.

> I thought I asked for versions?
> Can we have 'em?

Sorry - locally-compiled kernel 2.4.23, using autofs4 as a module.
Debian packaging of autofs, 3.9.99-4.0.0pre10-1.  Have unconfirmed
sightings on kernel 2.6.8, autofs 4.1.3+4.1.4beta2-10.  I don't think
these sightings are credible.  No reports since dropping the number of
exports.

> If necessary, looking at the code will tell if you have what's needed to 
> avoid this. Send me a copy of parse_sun.c from the source you are using 
> and I'll check.

I'm using the Debian woody package.  The following is from the source
package after unpacking and building (which applies any patches).
http://www.robsims.com/parse_sun.c

I'll poke around the source some more.  What I want to know is:
1) Are all the child mounts unmounted before unmounting the parent?
2) If not, were the system calls successful?
3) If the system calls failed, what were the error codes?
4) If all child mounts were in fact unmounted, why is the parent busy?
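
For 1) and 4), the check I have in mind is roughly the following. This is a
Python sketch of the ordering rule only (parents mounted first, children
unmounted first); the helper names are hypothetical and nothing here is
taken from the autofs source:

```python
def depth(path):
    """Number of non-empty components in a mount point path."""
    return len([part for part in path.split("/") if part])


def mount_order(mountpoints):
    """Parents first: shallower paths sort before paths nested below them."""
    return sorted(mountpoints, key=lambda p: (depth(p), p))


def umount_order(mountpoints):
    """Children first: the reverse of the mount order."""
    return list(reversed(mount_order(mountpoints)))


def blocking_children(parent, mounted):
    """Mount points still mounted below parent, which would keep it busy."""
    prefix = parent.rstrip("/") + "/"
    return [m for m in mounted if m.startswith(prefix)]


if __name__ == "__main__":
    mounted = ["/net/netapp/vol/vol0/a",
               "/net/netapp/vol/vol0",
               "/net/netapp/vol/vol1"]
    print(umount_order(mounted))
    print(blocking_children("/net/netapp/vol/vol0", mounted))
```

If blocking_children comes back empty for vol2 and the umount still reports
busy, then something other than a leftover child mount is holding it.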

I'll write back what I find, but may not have time for serious digging
for the next couple of weeks.
-- 
Rob

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Re: BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems
  2005-08-29 15:20         ` Rob Sims
@ 2005-08-30  1:16           ` Ian Kent
  0 siblings, 0 replies; 37+ messages in thread
From: Ian Kent @ 2005-08-30  1:16 UTC (permalink / raw)
  To: Rob Sims; +Cc: autofs mailing list

On Mon, 29 Aug 2005, Rob Sims wrote:

> On Sat, Aug 27, 2005 at 11:34:09AM +0800, Ian Kent wrote:
> > The attempt to remount on umount failure has always been contentious in my 
> > opinion; however, I've yet to see a situation that hasn't been caused by 
> > something else that needs to be fixed.
> 
> I agree - remounting something that is already mounted should be a no-op
> in the nfs system.
> 
> > I thought I asked for versions?
> > Can we have 'em?
> 
> Sorry - locally-compiled kernel 2.4.23, using autofs4 as a module.
> Debian packaging of autofs, 3.9.99-4.0.0pre10-1.  Have unconfirmed
> sightings on kernel 2.6.8, autofs 4.1.3+4.1.4beta2-10.  I don't think
> these sightings are credible.  No reports since dropping the number of
> exports.

Have you patched your 2.4 kernel with the autofs kernel patch?
As the version of user space autofs increases, an unpatched kernel is
less likely to work properly with it.

There are kernel patches for 2.4 and 2.6. There are a couple of unreleased 
bug fixes in addition to these.

Try http://www.kernel.org/pub/linux/daemons/autofs/v4.

This version of user space autofs is way old.

The fix for the problem you describe was introduced late in 4.1.3 before 
going to 4.1.4 and I believe it is included in the Sarge version.

4.1.3+4.1.4beta2-10 was included in Debian 3.1 (Sarge). Steinar Gunderson has put in 
quite a bit of effort on this. It should, almost certainly, be better 
than the version you are running now.

> 
> > If necessary, looking at the code will tell if you have what's needed to 
> > avoid this. Send me a copy of parse_sun.c from the source you are using 
> > and I'll check.
> 
> I'm using the Debian woody package.  The following is from the source
> package after unpacking and building (which applies any patches).
> http://www.robsims.com/parse_sun.c
> 

Thanks but I don't think I need to look at it now.

> I'll poke around the source some more.  What I want to know is:
> 1) Are all the child mounts unmounted before unmounting the parent?
> 2) If not, were the system calls successful?
> 3) If the system calls failed, what were the error codes?
> 4) If all child mounts were in fact unmounted, why is the parent busy?
> 

Don't waste your time. Your version is too old.

The only thing I would say about the questions above is that autofs takes 
advantage of mount's ability to mount stuff. So to some extent mounting is 
a pass or fail activity. OTOH autofs seems to get by well enough using 
this and the pain of keeping up with new filesystems and mount options is 
.

Ian

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2005-08-30  1:16 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-01-11 21:22 BUG: autofs4 + cd /net/<Netapp>/vol/vol[0-3] = port usage problems David Meleedy
2005-01-12  5:38 ` Ian Kent
2005-01-12 16:55   ` Mike Waychison
2005-01-12 20:43     ` David Meleedy
2005-01-13  0:37     ` David Meleedy
2005-01-13  1:05       ` Mike Waychison
2005-01-13  1:07       ` Ian Kent
2005-01-14 14:35       ` raven
2005-01-14 22:38         ` David Meleedy
2005-01-15  2:50           ` raven
2005-01-17 14:52             ` Jeff Moyer
2005-01-18  1:31               ` Ian Kent
2005-01-18 14:18                 ` Jeff Moyer
2005-01-18 17:00                   ` Ian Kent
2005-01-18 17:05                     ` Jeff Moyer
2005-01-19  1:25                       ` Ian Kent
2005-01-18 14:20                 ` Jeff Moyer
2005-01-18 17:04                   ` Ian Kent
2005-01-18 17:07                     ` Jeff Moyer
2005-01-18 17:32                     ` Mike Waychison
2005-01-19  4:21                       ` Ian Kent
2005-01-19  5:00                         ` Re: [autofs] " Trond Myklebust
2005-01-17 14:01         ` raven
2005-01-17 16:19           ` David Meleedy
2005-01-18  1:33             ` Ian Kent
2005-01-13  8:13     ` Ian Kent
2005-01-12 14:50 ` raven
2005-01-12 22:22   ` David Meleedy
2005-01-12 23:01     ` Jeff Moyer
2005-01-12 16:13 ` Dwight Marzolf
2005-01-12 20:55   ` David Meleedy
2005-08-25 22:14 ` Rob Sims
2005-08-26  3:44   ` Ian Kent
2005-08-26 16:14     ` Rob Sims
2005-08-27  3:34       ` Ian Kent
2005-08-29 15:20         ` Rob Sims
2005-08-30  1:16           ` Ian Kent
