* something changed from 4 to 5
@ 2008-08-12 16:53 Joe Pruett
2008-08-13 2:58 ` Ian Kent
0 siblings, 1 reply; 13+ messages in thread
From: Joe Pruett @ 2008-08-12 16:53 UTC (permalink / raw)
To: autofs
i have been using mounts like this:
auto.master:
/disks auto.disks
/home auto.home
auto.disks:
server.1 server:/disk/1
server.2 server:/disk/2
other.1 other:/disk/1
other.2 other:/disk/2
auto.home:
user1 :/disks/server.1/user1
user2 :/disks/server.2/user2
user3 :/disks/other.1/user3
user4 :/disks/other.2/user4
this is to avoid ending up with 100s or 1000s of nfs mounts and instead
just have a few nfs mounts and then lots of bind mounts. it has been
working pretty well until the last few months. i am using a mix of centos
4 and centos 5 systems, and they all are using the supplied autofs 5
binaries (autofs5-5.0.1-0.rc2.55.el4_6.2 and autofs-5.0.1-0.rc2.88).
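For illustration, entries in the auto.home style shown above could be generated from a user-to-disk table instead of being maintained by hand. This is only a sketch: `home_map_lines` and the `user_to_disk` table are hypothetical names, not part of autofs, and it assumes every home directory sits directly under its /disks key, as in the example maps.

```python
def home_map_lines(user_to_disk, disks_root="/disks"):
    """Build auto.home-style entries that point at paths under disks_root.

    user_to_disk maps a user name to an auto.disks key, e.g. "server.1".
    Each entry has an empty host part (":/path"), i.e. it names a local
    path under the /disks automount rather than a remote export.
    """
    lines = []
    for user, disk_key in sorted(user_to_disk.items()):
        lines.append(f"{user} :{disks_root}/{disk_key}/{user}")
    return lines


if __name__ == "__main__":
    # Hypothetical table matching the example maps above.
    table = {"user1": "server.1", "user2": "server.2",
             "user3": "other.1", "user4": "other.2"}
    print("\n".join(home_map_lines(table)))
```

Keeping the table as the single source means a user can be moved to another disk by changing one value and regenerating the map.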
what has started happening is that occasionally an automount child will
get stuck and all access to /disks/something will stop. other /disks/
entries are fine. when i strace the confused child, it shows exit_group
and then exits and then things are fine. i think the parameter to
exit_group is 4, but i haven't kept a log (i will now).
so here are a few questions:
1. is there a better way to do this? in the old sun days, it was like:
user1 server:/disk/1:user1
user2 server:/disk/1:user2
and server:/disk/1 would only get mounted once with symlinks for user1 and
user2.
2. i see some comments about submounts in the 5.0.3 changelog. is this a
submount? or is that something different?
3. is this a known issue? aside from installing a 5.0.3 version (which
i'd like to avoid), might there be a suggested workaround?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: something changed from 4 to 5
2008-08-12 16:53 something changed from 4 to 5 Joe Pruett
@ 2008-08-13 2:58 ` Ian Kent
2008-08-13 4:11 ` Joe Pruett
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Ian Kent @ 2008-08-13 2:58 UTC (permalink / raw)
To: Joe Pruett; +Cc: autofs
On Tue, 2008-08-12 at 09:53 -0700, Joe Pruett wrote:
> i have been using mounts like this:
>
> auto.master:
> /disks auto.disks
> /home auto.home
>
> auto.disks:
> server.1 server:/disk/1
> server.2 server:/disk/2
> other.1 other:/disk/1
> other.2 other:/disk/2
>
> auto.home:
> user1 :/disks/server.1/user1
> user2 :/disks/server.2/user2
> user3 :/disks/other.1/user3
> user4 :/disks/other.2/user4
>
> this is to avoid ending up with 100s or 1000s of nfs mounts and instead
> just have a few nfs mounts and then lots of bind mounts. it has been
> working pretty well until the last few months. i am using a mix of centos
> 4 and centos 5 systems, and they all are using the supplied autofs 5
> binaries (autofs5-5.0.1-0.rc2.55.el4_6.2 and autofs-5.0.1-0.rc2.88).
>
> what has started happening is that occasionally an automount child will
> get stuck and all access to /disks/something will stop. other /disks/
> entries are fine. when i strace the confused child, it shows exit_group
> and then exits and then things are fine. i think the parameter to
> exit_group is 4, but i haven't kept a log (i will now).
Install the autofs-debuginfo package and when you see this use gdb to
get a backtrace of the running threads.
gdb -p <automount pid> /usr/sbin/automount
gdb> thr a a bt
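The same backtrace can be captured without an interactive session, which makes it easier to log. `thr a a bt` is gdb's abbreviation of `thread apply all bt`. A minimal sketch, assuming a gdb build that accepts `-batch` and `-ex` (older builds may not) and that it runs with enough privilege to attach; `gdb_backtrace_cmd` and `capture_backtrace` are hypothetical helper names.

```python
import subprocess


def gdb_backtrace_cmd(pid, binary="/usr/sbin/automount"):
    """Return the argv for a one-shot 'thread apply all bt' against pid."""
    return [
        "gdb", "-batch",
        "-ex", "set pagination off",   # avoid pager stops in long output
        "-ex", "thread apply all bt",  # long form of "thr a a bt"
        "-p", str(pid), binary,
    ]


def capture_backtrace(pid, binary="/usr/sbin/automount"):
    """Run gdb and return its combined output as text (needs gdb installed)."""
    result = subprocess.run(
        gdb_backtrace_cmd(pid, binary),
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True, check=False,
    )
    return result.stdout
```

Calling `capture_backtrace(<automount pid>)` and writing the result to a file gives one trace per hang that can be attached to a report.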
>
> so here are a few questions:
>
> 1. is there a better way to do this? in the old sun days, it was like:
>
> user1 server:/disk/1:user1
> user2 server:/disk/1:user2
When was this syntax used? I don't remember it.
And exactly how is it supposed to work?
>
> and server:/disk/1 would only get mounted once with symlinks for user1 and
> user2.
>
> 2. i see some comments about submounts in the 5.0.3 changelog. is this a
> submount? or is that something different?
It's not a submount.
>
> 3. is this a known issue? aside from installing a 5.0.3 version (which
> i'd like to avoid), might there be a suggested workaround?
>
> _______________________________________________
> autofs mailing list
> autofs@linux.kernel.org
> http://linux.kernel.org/mailman/listinfo/autofs
* Re: something changed from 4 to 5
2008-08-13 2:58 ` Ian Kent
@ 2008-08-13 4:11 ` Joe Pruett
2008-08-23 14:45 ` Joe Pruett
2008-09-02 22:07 ` Joe Pruett
2 siblings, 0 replies; 13+ messages in thread
From: Joe Pruett @ 2008-08-13 4:11 UTC (permalink / raw)
To: Ian Kent; +Cc: autofs
> Install the autofs-debuginfo package and when you see this use gdb to
> get a backtrace of the running threads.
>
> gdb -p <automount pid> /usr/sbin/automount
> gdb> thr a a bt
will do.
>> 1. is there a better way to do this? in the old sun days, it was like:
>>
>> user1 server:/disk/1:user1
>> user2 server:/disk/1:user2
>
> When was this syntax used? I don't remember it.
> And exactly how is it supposed to work?
that was the original automount in sunos. i'm pretty sure that sun
deprecated it as well and just treated the extra : as a slash. but it
worked almost exactly like i want. a single nfs mount is made, no matter
how many subdirectories were referenced. i can't recall exactly how the
mounts were laid out to make it all happen. but it was extremely useful
in the days when a system could only handle 32 or some really small number
of nfs mounted file systems. that isn't such an issue today, but i still
worry about it for a mail or web server that may touch 1000s of home
directories at once so that is why i use the recursive automounts. it
also makes df output much more readable. like i say, it has been working
pretty well until recently (i think when i started using autofs v5).
* Re: something changed from 4 to 5
2008-08-13 2:58 ` Ian Kent
2008-08-13 4:11 ` Joe Pruett
@ 2008-08-23 14:45 ` Joe Pruett
2008-08-24 4:32 ` Ian Kent
2008-08-24 4:54 ` Ian Kent
2008-09-02 22:07 ` Joe Pruett
2 siblings, 2 replies; 13+ messages in thread
From: Joe Pruett @ 2008-08-23 14:45 UTC (permalink / raw)
To: autofs
i had one of my servers get into the mode where automount is hung up doing
something. i started attaching to each one and doing the gdb stack trace
you asked for. here are the results. after looking at the third one,
things cleared up. hopefully we can figure something out on this.
Script started on Fri 22 Aug 2008 01:11:35 PM PDT
[root@titan ~]# ps axf | grep auto
1741 ? Ssl 116:45 automount
16772 ? S 0:00 \_ automount
16777 ? S 0:00 \_ automount
20963 ? S 0:00 \_ automount
21865 ? S 0:00 \_ automount
23136 ? S 0:00 \_ automount
25322 pts/0 S+ 0:00 \_ grep auto
[root@titan ~]# gdb -p 1741 /usr/sbin/automount
GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
Attaching to program: /usr/sbin/automount, process 1741
Loaded symbols for /usr/sbin/automount
Reading symbols from /lib/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread -1208218944 (LWP 1741)]
[New Thread -1228944496 (LWP 23135)]
[New Thread -1212564592 (LWP 21864)]
[New Thread -1222640752 (LWP 20962)]
[New Thread -1218868336 (LWP 18488)]
[New Thread -1216767088 (LWP 16774)]
[New Thread -1214665840 (LWP 16773)]
[New Thread -1224742000 (LWP 16771)]
[New Thread -1210463344 (LWP 1749)]
[New Thread -1208362096 (LWP 1746)]
[New Thread -1208292464 (LWP 1743)]
[New Thread -1208222832 (LWP 1742)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libxml2.so...done.
Loaded symbols for /usr/lib/libxml2.so
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /usr/lib/autofs/lookup_nis.so...Reading symbols from /usr/lib/debug/usr/lib/autofs/lookup_yp.so.debug...done.
done.
Loaded symbols for /usr/lib/autofs/lookup_nis.so
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /usr/lib/autofs/parse_sun.so...Reading symbols from /usr/lib/debug/usr/lib/autofs/parse_sun.so.debug...done.
done.
Loaded symbols for /usr/lib/autofs/parse_sun.so
Reading symbols from /usr/lib/autofs/mount_nfs.so...Reading symbols from /usr/lib/debug/usr/lib/autofs/mount_nfs.so.debug...done.
done.
Loaded symbols for /usr/lib/autofs/mount_nfs.so
Reading symbols from /usr/lib/autofs/mount_bind.so...Reading symbols from /usr/lib/debug/usr/lib/autofs/mount_bind.so.debug...done.
done.
Loaded symbols for /usr/lib/autofs/mount_bind.so
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
0x002ba402 in __kernel_vsyscall ()
(gdb) thr a a bt
Thread 12 (Thread -1208222832 (LWP 1742)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x0070f4ec in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
#2 0x0025a9af in alarm_handler (arg=0x0) at alarm.c:227
#3 0x0070b46b in start_thread () from /lib/libpthread.so.0
#4 0x00424dbe in clone () from /lib/libc.so.6
Thread 11 (Thread -1208292464 (LWP 1743)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x0070f4ec in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
#2 0x0025477e in st_queue_handler (arg=0x0) at state.c:965
#3 0x0070b46b in start_thread () from /lib/libpthread.so.0
#4 0x00424dbe in clone () from /lib/libc.so.6
Thread 10 (Thread -1208362096 (LWP 1746)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x0041b1c3 in poll () from /lib/libc.so.6
#2 0x00245bc0 in handle_packet (ap=0x8c64bc0) at automount.c:883
#3 0x002473e7 in handle_mounts (arg=0x8c64bc0) at automount.c:1519
#4 0x0070b46b in start_thread () from /lib/libpthread.so.0
#5 0x00424dbe in clone () from /lib/libc.so.6
Thread 9 (Thread -1210463344 (LWP 1749)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x0041b1c3 in poll () from /lib/libc.so.6
#2 0x00245bc0 in handle_packet (ap=0x8c651b8) at automount.c:883
#3 0x002473e7 in handle_mounts (arg=0x8c651b8) at automount.c:1519
#4 0x0070b46b in start_thread () from /lib/libpthread.so.0
#5 0x00424dbe in clone () from /lib/libc.so.6
Thread 8 (Thread -1224742000 (LWP 16771)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x00711e2b in read () from /lib/libpthread.so.0
#2 0x0024ef62 in do_spawn (logopt=0, options=2, prog=0xb6ffab26 "/bin/mount",
argv=0xb6ffaac0) at /usr/include/bits/unistd.h:35
#3 0x0024f6c9 in spawn_bind_mount (logopt=0) at spawn.c:400
#4 0x0051c6e4 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb6ffae40 "octc", name_len=4,
what=0x8cd05c0 "/disks/hyperion.0/home/octc", fstype=0x691d2d "bind",
options=0x51f106 "defaults", context=0x18) at mount_bind.c:140
#5 0x0068bcd0 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb6ffae40 "octc", name_len=4,
what=0xb6ffae10 ":/disks/hyperion.0/home/octc", fstype=0x1379e4 "nfs",
options=0x0, context=0x215280) at mount_nfs.c:225
#6 0x00128b85 in sun_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb6ffd108 "octc", namelen=4,
loc=0x8cd0130 ":/disks/hyperion.0/home/octc", loclen=28,
options=0x8ccfe80 "", ctxt=0x8c5e668) at parse_sun.c:638
#7 0x00129ebc in parse_mount (ap=0x8c64bc0, name=0xb6ffd108 "octc",
name_len=4, mapent=0xb6ffd0a0 ":/disks/hyperion.0/home/octc",
context=0x8c5e668) at parse_sun.c:1452
#8 0x00ce8c6c in lookup_mount (ap=0x8c64bc0, name=0xb7200680 "octc",
name_len=4, context=0x8c5eb20) at lookup_yp.c:646
#9 0x00250d99 in do_lookup_mount (ap=0x8c64bc0, map=0x8c5e6b0,
name=0xb7200680 "octc", name_len=4) at lookup.c:669
#10 0x00251f13 in lookup_nss_mount (ap=0x8c64bc0, source=0x0,
name=0xb7200680 "octc", name_len=4) at lookup.c:731
#11 0x00249e9a in do_mount_indirect (arg=0xb7200620) at indirect.c:835
#12 0x0070b46b in start_thread () from /lib/libpthread.so.0
#13 0x00424dbe in clone () from /lib/libc.so.6
Thread 7 (Thread -1214665840 (LWP 16773)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x00711e2b in read () from /lib/libpthread.so.0
#2 0x0024ef62 in do_spawn (logopt=0, options=0, prog=0xb7996bdd "/bin/mount",
argv=0xb7996b60) at /usr/include/bits/unistd.h:35
#3 0x0024f8f5 in spawn_mount (logopt=0) at spawn.c:301
#4 0x0068bb4d in mount_mount (ap=0x8c651b8, root=0x8c65298 "/disks",
name=0xb7996dd0 "hyperion.0", name_len=10,
what=0xb7996da0 "hyperion.spiretech.com:/disk/0", fstype=0x1379e4 "nfs",
options=0xb7996df0 "udp,rsize=32768,wsize=32768", context=0x215280)
at mount_nfs.c:259
#5 0x00128b85 in sun_mount (ap=0x8c651b8, root=0x8c65298 "/disks",
name=0xb7999108 "hyperion.0", namelen=10,
loc=0x8cd0420 "hyperion.spiretech.com:/disk/0", loclen=30,
options=0x8cd00e0 "udp,rsize=32768,wsize=32768", ctxt=0x8c5db38)
at parse_sun.c:638
#6 0x00129ebc in parse_mount (ap=0x8c651b8, name=0xb7999108 "hyperion.0",
name_len=10,
mapent=0xb7999080 "-udp,rsize=32768,wsize=32768 hyperion.spiretech.com:/disk/0", context=0x8c5db38) at parse_sun.c:1452
#7 0x00ce8c6c in lookup_mount (ap=0x8c651b8, name=0x8ccfb50 "hyperion.0",
name_len=10, context=0x8c5db08) at lookup_yp.c:646
#8 0x00250d99 in do_lookup_mount (ap=0x8c651b8, map=0x8c5dac0,
name=0x8ccfb50 "hyperion.0", name_len=10) at lookup.c:669
#9 0x00251f13 in lookup_nss_mount (ap=0x8c651b8, source=0x0,
name=0x8ccfb50 "hyperion.0", name_len=10) at lookup.c:731
#10 0x00249e9a in do_mount_indirect (arg=0x8ccfaf0) at indirect.c:835
#11 0x0070b46b in start_thread () from /lib/libpthread.so.0
#12 0x00424dbe in clone () from /lib/libc.so.6
Thread 6 (Thread -1216767088 (LWP 16774)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x00711e2b in read () from /lib/libpthread.so.0
#2 0x0024ef62 in do_spawn (logopt=0, options=2, prog=0xb7795b26 "/bin/mount",
argv=0xb7795ac0) at /usr/include/bits/unistd.h:35
#3 0x0024f6c9 in spawn_bind_mount (logopt=0) at spawn.c:400
#4 0x0051c6e4 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb7795e40 "shipper", name_len=7,
what=0x8cd03a0 "/disks/hyperion.0/home/shipper", fstype=0x691d2d "bind",
options=0x51f106 "defaults", context=0x18) at mount_bind.c:140
#5 0x0068bcd0 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb7795e40 "shipper", name_len=7,
what=0xb7795e10 ":/disks/hyperion.0/home/shipper", fstype=0x1379e4 "nfs",
options=0x0, context=0x215280) at mount_nfs.c:225
#6 0x00128b85 in sun_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb7798108 "shipper", namelen=7,
loc=0x8cb3d30 ":/disks/hyperion.0/home/shipper", loclen=31,
options=0x8c6a508 "", ctxt=0x8c5e668) at parse_sun.c:638
#7 0x00129ebc in parse_mount (ap=0x8c64bc0, name=0xb7798108 "shipper",
name_len=7, mapent=0xb77980a0 ":/disks/hyperion.0/home/shipper",
context=0x8c5e668) at parse_sun.c:1452
#8 0x00ce8c6c in lookup_mount (ap=0x8c64bc0, name=0xb7200b40 "shipper",
name_len=7, context=0x8c5eb20) at lookup_yp.c:646
#9 0x00250d99 in do_lookup_mount (ap=0x8c64bc0, map=0x8c5e6b0,
name=0xb7200b40 "shipper", name_len=7) at lookup.c:669
#10 0x00251f13 in lookup_nss_mount (ap=0x8c64bc0, source=0x0,
name=0xb7200b40 "shipper", name_len=7) at lookup.c:731
#11 0x00249e9a in do_mount_indirect (arg=0xb7200ae0) at indirect.c:835
#12 0x0070b46b in start_thread () from /lib/libpthread.so.0
#13 0x00424dbe in clone () from /lib/libc.so.6
Thread 5 (Thread -1218868336 (LWP 18488)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x0041d219 in ioctl () from /lib/libc.so.6
#2 0x00248743 in expire_indirect (ap=0x8c651b8, ioctlfd=15,
path=<value optimized out>, when=0) at indirect.c:341
#3 0x0024947b in expire_proc_indirect (arg=0x8cd08a0) at indirect.c:459
#4 0x0070b46b in start_thread () from /lib/libpthread.so.0
#5 0x00424dbe in clone () from /lib/libc.so.6
Thread 4 (Thread -1222640752 (LWP 20962)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x00711e2b in read () from /lib/libpthread.so.0
#2 0x0024ef62 in do_spawn (logopt=0, options=2, prog=0xb71fbb26 "/bin/mount",
argv=0xb71fbac0) at /usr/include/bits/unistd.h:35
#3 0x0024f6c9 in spawn_bind_mount (logopt=0) at spawn.c:400
#4 0x0051c6e4 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb71fbe40 "hemphill", name_len=8,
what=0x8cd0750 "/disks/hyperion.0/home/hemphill", fstype=0x691d2d "bind",
options=0x51f106 "defaults", context=0x18) at mount_bind.c:140
#5 0x0068bcd0 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb71fbe40 "hemphill", name_len=8,
what=0xb71fbe10 ":/disks/hyperion.0/home/hemphill", fstype=0x1379e4 "nfs",
options=0x0, context=0x215280) at mount_nfs.c:225
#6 0x00128b85 in sun_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb71fe108 "hemphill", namelen=8,
loc=0x8cd0700 ":/disks/hyperion.0/home/hemphill", loclen=32,
options=0x8ccda20 "", ctxt=0x8c5e668) at parse_sun.c:638
#7 0x00129ebc in parse_mount (ap=0x8c64bc0, name=0xb71fe108 "hemphill",
name_len=8, mapent=0xb71fe0a0 ":/disks/hyperion.0/home/hemphill",
context=0x8c5e668) at parse_sun.c:1452
#8 0x00ce8c6c in lookup_mount (ap=0x8c64bc0, name=0xb7200cc0 "hemphill",
name_len=8, context=0x8c5eb20) at lookup_yp.c:646
#9 0x00250d99 in do_lookup_mount (ap=0x8c64bc0, map=0x8c5e6b0,
name=0xb7200cc0 "hemphill", name_len=8) at lookup.c:669
#10 0x00251f13 in lookup_nss_mount (ap=0x8c64bc0, source=0x0,
name=0xb7200cc0 "hemphill", name_len=8) at lookup.c:731
#11 0x00249e9a in do_mount_indirect (arg=0xb7200c60) at indirect.c:835
#12 0x0070b46b in start_thread () from /lib/libpthread.so.0
#13 0x00424dbe in clone () from /lib/libc.so.6
Thread 3 (Thread -1212564592 (LWP 21864)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x00711e2b in read () from /lib/libpthread.so.0
#2 0x0024ef62 in do_spawn (logopt=0, options=2, prog=0xb7b97b26 "/bin/mount",
argv=0xb7b97ac0) at /usr/include/bits/unistd.h:35
#3 0x0024f6c9 in spawn_bind_mount (logopt=0) at spawn.c:400
#4 0x0051c6e4 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb7b97e40 "havelkaj", name_len=8,
what=0x8cd0b50 "/disks/hyperion.0/home/havelkaj", fstype=0x691d2d "bind",
options=0x51f106 "defaults", context=0x18) at mount_bind.c:140
#5 0x0068bcd0 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb7b97e40 "havelkaj", name_len=8,
what=0xb7b97e10 ":/disks/hyperion.0/home/havelkaj", fstype=0x1379e4 "nfs",
options=0x0, context=0x215280) at mount_nfs.c:225
#6 0x00128b85 in sun_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb7b9a108 "havelkaj", namelen=8,
loc=0x8cd0728 ":/disks/hyperion.0/home/havelkaj", loclen=32,
options=0x8c6a0a0 "", ctxt=0x8c5e668) at parse_sun.c:638
#7 0x00129ebc in parse_mount (ap=0x8c64bc0, name=0xb7b9a108 "havelkaj",
name_len=8, mapent=0xb7b9a0a0 ":/disks/hyperion.0/home/havelkaj",
context=0x8c5e668) at parse_sun.c:1452
#8 0x00ce8c6c in lookup_mount (ap=0x8c64bc0, name=0xb7200e40 "havelkaj",
name_len=8, context=0x8c5eb20) at lookup_yp.c:646
#9 0x00250d99 in do_lookup_mount (ap=0x8c64bc0, map=0x8c5e6b0,
name=0xb7200e40 "havelkaj", name_len=8) at lookup.c:669
#10 0x00251f13 in lookup_nss_mount (ap=0x8c64bc0, source=0x0,
name=0xb7200e40 "havelkaj", name_len=8) at lookup.c:731
#11 0x00249e9a in do_mount_indirect (arg=0xb7200de0) at indirect.c:835
#12 0x0070b46b in start_thread () from /lib/libpthread.so.0
#13 0x00424dbe in clone () from /lib/libc.so.6
Thread 2 (Thread -1228944496 (LWP 23135)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x00711e2b in read () from /lib/libpthread.so.0
#2 0x0024ef62 in do_spawn (logopt=0, options=2, prog=0xb6bf8af6 "/bin/mount",
argv=0xb6bf8a90) at /usr/include/bits/unistd.h:35
#3 0x0024f6c9 in spawn_bind_mount (logopt=0) at spawn.c:400
#4 0x0051c6e4 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb6bf8e20 "jason_not", name_len=9,
what=0x8cd0bc8 "/disks/hyperion.0/home/jason_not", fstype=0x691d2d "bind",
options=0x51f106 "defaults", context=0x18) at mount_bind.c:140
#5 0x0068bcd0 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb6bf8e20 "jason_not", name_len=9,
what=0xb6bf8de0 ":/disks/hyperion.0/home/jason_not",
fstype=0x1379e4 "nfs", options=0x0, context=0x215280) at mount_nfs.c:225
#6 0x00128b85 in sun_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb6bfb108 "jason_not", namelen=9,
loc=0x8cd0b28 ":/disks/hyperion.0/home/jason_not", loclen=33,
options=0x8c6b680 "", ctxt=0x8c5e668) at parse_sun.c:638
#7 0x00129ebc in parse_mount (ap=0x8c64bc0, name=0xb6bfb108 "jason_not",
name_len=9, mapent=0xb6bfb090 ":/disks/hyperion.0/home/jason_not",
context=0x8c5e668) at parse_sun.c:1452
#8 0x00ce8c6c in lookup_mount (ap=0x8c64bc0, name=0xb7200fc0 "jason_not",
name_len=9, context=0x8c5eb20) at lookup_yp.c:646
#9 0x00250d99 in do_lookup_mount (ap=0x8c64bc0, map=0x8c5e6b0,
name=0xb7200fc0 "jason_not", name_len=9) at lookup.c:669
#10 0x00251f13 in lookup_nss_mount (ap=0x8c64bc0, source=0x0,
name=0xb7200fc0 "jason_not", name_len=9) at lookup.c:731
#11 0x00249e9a in do_mount_indirect (arg=0xb7200f60) at indirect.c:835
#12 0x0070b46b in start_thread () from /lib/libpthread.so.0
#13 0x00424dbe in clone () from /lib/libc.so.6
Thread 1 (Thread -1208218944 (LWP 1741)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x00712f7e in do_sigwait () from /lib/libpthread.so.0
#2 0x0071301f in sigwait () from /lib/libpthread.so.0
#3 0x00243e38 in statemachine (arg=<value optimized out>) at automount.c:1343
#4 0x002452e0 in main (argc=0, argv=0xbf9a5208) at automount.c:2064
#0 0x002ba402 in __kernel_vsyscall ()
(gdb) q
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/sbin/automount, process 1741
[root@titan ~]# ps axf | grep auto
1741 ? Ssl 116:45 automount
16772 ? S 0:00 \_ automount
16777 ? S 0:00 \_ automount
20963 ? S 0:00 \_ automount
21865 ? S 0:00 \_ automount
23136 ? S 0:00 \_ automount
25333 pts/0 S+ 0:00 \_ grep auto
[root@titan ~]# gdb -p 16772 /usr/sbin/automount
GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
Attaching to program: /usr/sbin/automount, process 16772
Loaded symbols for /usr/sbin/automount
Reading symbols from /lib/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread -1224742000 (LWP 16772)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libxml2.so...done.
Loaded symbols for /usr/lib/libxml2.so
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /usr/lib/autofs/lookup_nis.so...Reading symbols from /usr/lib/debug/usr/lib/autofs/lookup_yp.so.debug...done.
done.
Loaded symbols for /usr/lib/autofs/lookup_nis.so
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /usr/lib/autofs/parse_sun.so...Reading symbols from /usr/lib/debug/usr/lib/autofs/parse_sun.so.debug...done.
done.
Loaded symbols for /usr/lib/autofs/parse_sun.so
Reading symbols from /usr/lib/autofs/mount_nfs.so...Reading symbols from /usr/lib/debug/usr/lib/autofs/mount_nfs.so.debug...done.
done.
Loaded symbols for /usr/lib/autofs/mount_nfs.so
Reading symbols from /usr/lib/autofs/mount_bind.so...Reading symbols from /usr/lib/debug/usr/lib/autofs/mount_bind.so.debug...done.
done.
Loaded symbols for /usr/lib/autofs/mount_bind.so
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
0x002ba402 in __kernel_vsyscall ()
(gdb) thr a a bt
Thread 1 (Thread -1224742000 (LWP 16772)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x00415b76 in access () from /lib/libc.so.6
#2 0x0024ee66 in do_spawn (logopt=0, options=<value optimized out>,
prog=0xb6ffab26 "/bin/mount", argv=0xb6ffaac0) at spawn.c:156
#3 0x0024f6c9 in spawn_bind_mount (logopt=0) at spawn.c:400
#4 0x0051c6e4 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb6ffae40 "octc", name_len=4,
what=0x8cd05c0 "/disks/hyperion.0/home/octc", fstype=0x691d2d "bind",
options=0x51f106 "defaults", context=0x18) at mount_bind.c:140
#5 0x0068bcd0 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb6ffae40 "octc", name_len=4,
what=0xb6ffae10 ":/disks/hyperion.0/home/octc", fstype=0x1379e4 "nfs",
options=0x0, context=0x215280) at mount_nfs.c:225
#6 0x00128b85 in sun_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb6ffd108 "octc", namelen=4,
loc=0x8cd0130 ":/disks/hyperion.0/home/octc", loclen=28,
options=0x8ccfe80 "", ctxt=0x8c5e668) at parse_sun.c:638
#7 0x00129ebc in parse_mount (ap=0x8c64bc0, name=0xb6ffd108 "octc",
name_len=4, mapent=0xb6ffd0a0 ":/disks/hyperion.0/home/octc",
context=0x8c5e668) at parse_sun.c:1452
#8 0x00ce8c6c in lookup_mount (ap=0x8c64bc0, name=0xb7200680 "octc",
name_len=4, context=0x8c5eb20) at lookup_yp.c:646
#9 0x00250d99 in do_lookup_mount (ap=0x8c64bc0, map=0x8c5e6b0,
name=0xb7200680 "octc", name_len=4) at lookup.c:669
#10 0x00251f13 in lookup_nss_mount (ap=0x8c64bc0, source=0x0,
name=0xb7200680 "octc", name_len=4) at lookup.c:731
#11 0x00249e9a in do_mount_indirect (arg=0xb7200620) at indirect.c:835
#12 0x0070b46b in start_thread () from /lib/libpthread.so.0
#13 0x00424dbe in clone () from /lib/libc.so.6
#0 0x002ba402 in __kernel_vsyscall ()
(gdb) q
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/sbin/automount, process 16772
[root@titan ~]# ps axf | grep auto
1741 ? Ssl 116:46 automount
16777 ? S 0:00 \_ automount
20963 ? S 0:00 \_ automount
21865 ? S 0:00 \_ automount
23136 ? S 0:00 \_ automount
25346 pts/0 S+ 0:00 \_ grep auto
[root@titan ~]# gdb -p 16777 /usr/sbin/automount
GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
Attaching to program: /usr/sbin/automount, process 16777
Loaded symbols for /usr/sbin/automount
Reading symbols from /lib/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread -1216767088 (LWP 16777)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libxml2.so...done.
Loaded symbols for /usr/lib/libxml2.so
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /usr/lib/autofs/lookup_nis.so...Reading symbols from /usr/lib/debug/usr/lib/autofs/lookup_yp.so.debug...done.
done.
Loaded symbols for /usr/lib/autofs/lookup_nis.so
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /usr/lib/autofs/parse_sun.so...Reading symbols from /usr/lib/debug/usr/lib/autofs/parse_sun.so.debug...done.
done.
Loaded symbols for /usr/lib/autofs/parse_sun.so
Reading symbols from /usr/lib/autofs/mount_nfs.so...Reading symbols from /usr/lib/debug/usr/lib/autofs/mount_nfs.so.debug...done.
done.
Loaded symbols for /usr/lib/autofs/mount_nfs.so
Reading symbols from /usr/lib/autofs/mount_bind.so...Reading symbols from /usr/lib/debug/usr/lib/autofs/mount_bind.so.debug...done.
done.
Loaded symbols for /usr/lib/autofs/mount_bind.so
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
0x002ba402 in __kernel_vsyscall ()
(gdb) thr a a bt
Thread 1 (Thread -1216767088 (LWP 16777)):
#0 0x002ba402 in __kernel_vsyscall ()
#1 0x00415b76 in access () from /lib/libc.so.6
#2 0x0024ee66 in do_spawn (logopt=0, options=<value optimized out>,
prog=0xb7795b26 "/bin/mount", argv=0xb7795ac0) at spawn.c:156
#3 0x0024f6c9 in spawn_bind_mount (logopt=0) at spawn.c:400
#4 0x0051c6e4 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb7795e40 "shipper", name_len=7,
what=0x8cd03a0 "/disks/hyperion.0/home/shipper", fstype=0x691d2d "bind",
options=0x51f106 "defaults", context=0x18) at mount_bind.c:140
#5 0x0068bcd0 in mount_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb7795e40 "shipper", name_len=7,
what=0xb7795e10 ":/disks/hyperion.0/home/shipper", fstype=0x1379e4 "nfs",
options=0x0, context=0x215280) at mount_nfs.c:225
#6 0x00128b85 in sun_mount (ap=0x8c64bc0, root=0x8c5eb80 "/home",
name=0xb7798108 "shipper", namelen=7,
loc=0x8cb3d30 ":/disks/hyperion.0/home/shipper", loclen=31,
options=0x8c6a508 "", ctxt=0x8c5e668) at parse_sun.c:638
#7 0x00129ebc in parse_mount (ap=0x8c64bc0, name=0xb7798108 "shipper",
name_len=7, mapent=0xb77980a0 ":/disks/hyperion.0/home/shipper",
context=0x8c5e668) at parse_sun.c:1452
#8 0x00ce8c6c in lookup_mount (ap=0x8c64bc0, name=0xb7200b40 "shipper",
name_len=7, context=0x8c5eb20) at lookup_yp.c:646
#9 0x00250d99 in do_lookup_mount (ap=0x8c64bc0, map=0x8c5e6b0,
name=0xb7200b40 "shipper", name_len=7) at lookup.c:669
#10 0x00251f13 in lookup_nss_mount (ap=0x8c64bc0, source=0x0,
name=0xb7200b40 "shipper", name_len=7) at lookup.c:731
#11 0x00249e9a in do_mount_indirect (arg=0xb7200ae0) at indirect.c:835
#12 0x0070b46b in start_thread () from /lib/libpthread.so.0
#13 0x00424dbe in clone () from /lib/libc.so.6
#0 0x002ba402 in __kernel_vsyscall ()
(gdb) q
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/sbin/automount, process 16777
[root@titan ~]# ps axf | grep auto
1741 ? Ssl 116:46 automount
25359 pts/0 S+ 0:00 \_ grep auto
[root@titan ~]#
Script done on Fri 22 Aug 2008 01:12:55 PM PDT
* Re: something changed from 4 to 5
2008-08-23 14:45 ` Joe Pruett
@ 2008-08-24 4:32 ` Ian Kent
2008-08-24 4:58 ` Joe Pruett
2008-08-24 4:54 ` Ian Kent
1 sibling, 1 reply; 13+ messages in thread
From: Ian Kent @ 2008-08-24 4:32 UTC (permalink / raw)
To: Joe Pruett; +Cc: autofs
Joe Pruett wrote:
> i had one of my servers get into the mode where automount is hung up doing
> something. i started attaching to each one and doing the gdb stack trace
> you asked for. here are the results. after looking at the third one,
> things cleared up. hopefully we can figure something out on this.
>
> Script started on Fri 22 Aug 2008 01:11:35 PM PDT
> [root@titan ~]# ps axf | grep auto
> 1741 ? Ssl 116:45 automount
> 16772 ? S 0:00 \_ automount
> 16777 ? S 0:00 \_ automount
> 20963 ? S 0:00 \_ automount
> 21865 ? S 0:00 \_ automount
> 23136 ? S 0:00 \_ automount
> 25322 pts/0 S+ 0:00 \_ grep auto
> [root@titan ~]# gdb -p 1741 /usr/sbin/automount
>
snip ..
> 0x002ba402 in __kernel_vsyscall ()
> (gdb) thr a a bt
>
Took me a while to work out what you had done here.
To get the backtrace you need only connect to the main thread (in this
case 1741) and do the "thr a a bt".
That gives backtrace info for all the sub-threads at once, which is what
we need.
Ian
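For reference, "thr a a bt" is gdb's abbreviation of "thread apply all bt". A minimal sketch of collecting the same output non-interactively is below. The helper names are made up for illustration, and it assumes a gdb that supports the -batch and -ex options, which may not hold for every older build.

```python
import subprocess

def gdb_backtrace_cmd(pid, binary="/usr/sbin/automount"):
    """Build a gdb command line that prints every thread's backtrace and exits.

    Hypothetical helper: the pid is the main automount process (1741 in the
    transcript), and the binary path is the one used in the gdb session above.
    """
    return ["gdb", "-batch", "-ex", "thread apply all bt", "-p", str(pid), binary]

def collect_backtraces(pid):
    """Run gdb once against the main pid and return its text output.

    Attaching normally needs root, and useful frames need debug symbols.
    """
    result = subprocess.run(gdb_backtrace_cmd(pid), capture_output=True, text=True)
    return result.stdout
```

Attaching once to the main pid should be enough to see all of its threads; per the transcript, the other pids that ps shows are separate forked processes.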
* Re: something changed from 4 to 5
2008-08-23 14:45 ` Joe Pruett
2008-08-24 4:32 ` Ian Kent
@ 2008-08-24 4:54 ` Ian Kent
2008-08-24 5:27 ` Ian Kent
1 sibling, 1 reply; 13+ messages in thread
From: Ian Kent @ 2008-08-24 4:54 UTC (permalink / raw)
To: Joe Pruett; +Cc: autofs
Joe Pruett wrote:
> i had one of my servers get into the mode where automount is hung up doing
> something. i started attaching to each one and doing the gdb stack trace
> you asked for. here are the results. after looking at the third one,
> things cleared up. hopefully we can figure something out on this.
>
> Script started on Fri 22 Aug 2008 01:11:35 PM PDT
> [root@titan ~]# ps axf | grep auto
> 1741 ? Ssl 116:45 automount
> 16772 ? S 0:00 \_ automount
> 16777 ? S 0:00 \_ automount
> 20963 ? S 0:00 \_ automount
> 21865 ? S 0:00 \_ automount
> 23136 ? S 0:00 \_ automount
> 25322 pts/0 S+ 0:00 \_ grep auto
> [root@titan ~]# gdb -p 1741 /usr/sbin/automount
>
snip ...
> Attaching to program: /usr/sbin/automount, process 1741
> Loaded symbols for /usr/sbin/automount
> Reading symbols from /lib/libpthread.so.0...done.
> [Thread debugging using libthread_db enabled]
> [New Thread -1208218944 (LWP 1741)]
> [New Thread -1228944496 (LWP 23135)]
> [New Thread -1212564592 (LWP 21864)]
> [New Thread -1222640752 (LWP 20962)]
> [New Thread -1218868336 (LWP 18488)]
> [New Thread -1216767088 (LWP 16774)]
> [New Thread -1214665840 (LWP 16773)]
> [New Thread -1224742000 (LWP 16771)]
> [New Thread -1210463344 (LWP 1749)]
> [New Thread -1208362096 (LWP 1746)]
> [New Thread -1208292464 (LWP 1743)]
> [New Thread -1208222832 (LWP 1742)]
>
snip ...
> 0x002ba402 in __kernel_vsyscall ()
> (gdb) thr a a bt
>
snip ...
>
> Thread 7 (Thread -1214665840 (LWP 16773)):
> #0 0x002ba402 in __kernel_vsyscall ()
> #1 0x00711e2b in read () from /lib/libpthread.so.0
> #2 0x0024ef62 in do_spawn (logopt=0, options=0, prog=0xb7996bdd "/bin/mount",
> argv=0xb7996b60) at /usr/include/bits/unistd.h:35
> #3 0x0024f8f5 in spawn_mount (logopt=0) at spawn.c:301
> #4 0x0068bb4d in mount_mount (ap=0x8c651b8, root=0x8c65298 "/disks",
> name=0xb7996dd0 "hyperion.0", name_len=10,
> what=0xb7996da0 "hyperion.spiretech.com:/disk/0", fstype=0x1379e4 "nfs",
> options=0xb7996df0 "udp,rsize=32768,wsize=32768", context=0x215280)
> at mount_nfs.c:259
> #5 0x00128b85 in sun_mount (ap=0x8c651b8, root=0x8c65298 "/disks",
> name=0xb7999108 "hyperion.0", namelen=10,
> loc=0x8cd0420 "hyperion.spiretech.com:/disk/0", loclen=30,
> options=0x8cd00e0 "udp,rsize=32768,wsize=32768", ctxt=0x8c5db38)
> at parse_sun.c:638
> #6 0x00129ebc in parse_mount (ap=0x8c651b8, name=0xb7999108 "hyperion.0",
> name_len=10,
> mapent=0xb7999080 "-udp,rsize=32768,wsize=32768 hyperion.spiretech.com:/disk/0", context=0x8c5db38) at parse_sun.c:1452
> #7 0x00ce8c6c in lookup_mount (ap=0x8c651b8, name=0x8ccfb50 "hyperion.0",
> name_len=10, context=0x8c5db08) at lookup_yp.c:646
> #8 0x00250d99 in do_lookup_mount (ap=0x8c651b8, map=0x8c5dac0,
> name=0x8ccfb50 "hyperion.0", name_len=10) at lookup.c:669
> #9 0x00251f13 in lookup_nss_mount (ap=0x8c651b8, source=0x0,
> name=0x8ccfb50 "hyperion.0", name_len=10) at lookup.c:731
> #10 0x00249e9a in do_mount_indirect (arg=0x8ccfaf0) at indirect.c:835
> #11 0x0070b46b in start_thread () from /lib/libpthread.so.0
> #12 0x00424dbe in clone () from /lib/libc.so.6
>
It looks like this is what's blocking the rest and it looks OK.
AFAICT there's no evidence in the backtrace that autofs itself is
deadlocked or waiting on a completion message that has been missed.
If mount(8) is waiting for a mount that's higher up in the tree then
everything else should also wait.
Without more information I'd have to say there's not much autofs can do
here.
Ian
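To pick out threads like the one above from a long "thread apply all bt" dump, something along these lines might help. It is only a sketch: the function names are invented here, and the regular expressions assume the gdb output format shown in this thread.

```python
import re

# Patterns assume the layout seen in the transcript above.
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread -?\d+ \(LWP (\d+)\)\):")
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\w+) \(")

def summarize_backtrace(text):
    """Map each LWP to its list of function names, innermost frame first."""
    threads = {}
    current = None
    for raw in text.splitlines():
        line = raw.lstrip("> ").rstrip()  # tolerate mail quoting prefixes
        m = THREAD_RE.match(line)
        if m:
            current = int(m.group(2))
            threads[current] = []
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            threads[current].append(m.group(2))
    return threads

def spawn_waiters(threads):
    """LWPs whose stack passes through do_spawn, with the libc call in frame #1."""
    return {lwp: frames[1] for lwp, frames in threads.items()
            if "do_spawn" in frames and len(frames) > 1}

# Excerpt of thread 7 from the quoted backtrace.
SAMPLE = """\
> Thread 7 (Thread -1214665840 (LWP 16773)):
> #0 0x002ba402 in __kernel_vsyscall ()
> #1 0x00711e2b in read () from /lib/libpthread.so.0
> #2 0x0024ef62 in do_spawn (logopt=0, options=0, prog=0xb7996bdd "/bin/mount",
> argv=0xb7996b60) at /usr/include/bits/unistd.h:35
> #3 0x0024f8f5 in spawn_mount (logopt=0) at spawn.c:301
"""

waiters = spawn_waiters(summarize_backtrace(SAMPLE))
```

A thread sitting in read() under do_spawn is waiting on the spawned /bin/mount, while one sitting in access() has not got as far as spawning anything.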
* Re: something changed from 4 to 5
2008-08-24 4:32 ` Ian Kent
@ 2008-08-24 4:58 ` Joe Pruett
2008-08-24 5:11 ` Ian Kent
0 siblings, 1 reply; 13+ messages in thread
From: Joe Pruett @ 2008-08-24 4:58 UTC (permalink / raw)
To: Ian Kent; +Cc: autofs
> Took me a while to work out what you had done here.
> To get the backtrace you need only connect to the main thread (in this case
> 1741) and do the "thr a a bt".
> That gives backtrace info for all the sub-threads at once, which is what we
> need.
this is a centos 5 box so the other pids from ps are standalone processes
(look at the lwp list from the first gdb to see that they aren't the
same).
the first gdb should have been the data you want. by connecting to the
other pids, it eventually unblocks whatever is stuck and then the world
starts working again. oddly, it seems to be a call to access that is hung.
in the past i've just used strace and could never tell what was stuck.
just to refresh how i have things:
auto.master:
/disks auto.disks
/home auto.home
auto.disks:
hyperion.0 hyperion:/disk/0
jupiter.0 jupiter:/disk/0
mir.0 mir:/disk/0
auto.home: (examples)
joey :/disks/jupiter.0/joey
admin :/disks/mir.0/admin
shipper :/disks/hyperion.0/shipper
so tickling /home/shipper should tickle /disks/hyperion.0 to create an nfs
mount for hyperion:/disk/0 and then a bind mount to
/disks/hyperion.0/shipper.
this works 99% of the time. the machine that acts up the most is the
least used (ssh and cron server), so my guess is that the mounts under
/disks actually get unmounted in between the times something uses /home.
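a rough sketch of the dependency described above, using the example entries from this message. this is not how autofs implements it; the dict layout and the plan_mounts name are just for illustration, and it assumes a location starting with ":" means a local path that ends up as a bind mount (as the mount_bind frame in the backtrace suggests).

```python
# Example map data copied from the message; the dict layout is illustrative.
MASTER = {"/disks": "auto.disks", "/home": "auto.home"}
MAPS = {
    "auto.disks": {
        "hyperion.0": "hyperion:/disk/0",
        "jupiter.0": "jupiter:/disk/0",
        "mir.0": "mir:/disk/0",
    },
    "auto.home": {
        "joey": ":/disks/jupiter.0/joey",
        "admin": ":/disks/mir.0/admin",
        "shipper": ":/disks/hyperion.0/shipper",
    },
}

def plan_mounts(path):
    """Return the mounts needed to reach `path`, dependencies first.

    Each item is (kind, source, target). Paths outside any automount
    point need nothing and give an empty list.
    """
    for mount_point, map_name in MASTER.items():
        prefix = mount_point + "/"
        if path.startswith(prefix):
            key = path[len(prefix):].split("/", 1)[0]
            location = MAPS[map_name][key]
            target = prefix + key
            if location.startswith(":"):
                # Local location: mount whatever it depends on, then bind it.
                local = location[1:]
                return plan_mounts(local) + [("bind", local, target)]
            return [("nfs", location, target)]
    return []

plan = plan_mounts("/home/shipper")
```

the point is that the bind mount for /home/shipper cannot finish until the nfs mount under /disks has, so anything that stalls the first stalls the second.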
* Re: something changed from 4 to 5
2008-08-24 4:58 ` Joe Pruett
@ 2008-08-24 5:11 ` Ian Kent
2008-08-24 14:48 ` Joe Pruett
0 siblings, 1 reply; 13+ messages in thread
From: Ian Kent @ 2008-08-24 5:11 UTC (permalink / raw)
To: Joe Pruett; +Cc: autofs
Joe Pruett wrote:
>> Took me a while to work out what you had done here.
>> To get the backtrace you need only connect to the main thread (in
>> this case 1741) and do the "thr a a bt".
>> That gives backtrace info for all the sub-threads at once, which is
>> what we need.
>
> this is a centos 5 box so the other pids from ps are standalone
> processes (look at the lwp list from the first gdb to see that they
> aren't the same).
>
> the first gdb should have been the data you want. by connecting to the
> other pids, it eventually unblocks whatever is stuck and then the
> world starts working again. oddly, it seems to be a call to access
> that is hung. in the past i've just used strace and could never tell
> what was stuck.
And the call to access(2) is seen in the first backtrace as waiting on
an nfs mount, and any other process calling access(2) should wait on
the first in this case, since the path includes the same mount.
When this happens you could also try looking for mount processes to
match them to those in the backtraces but if any have died that should
result in the daemon continuing. So it's a bit puzzling.
Ian
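One way to do that matching, sketched under assumptions: it parses the text output of "ps -eo pid,ppid,args" and lists mount helpers whose parent is an automount process. The function name is made up, and the sample listing below is hypothetical (pid 16780 and its command line are invented for illustration, not taken from the hung system).

```python
import os

def mount_children(ps_text):
    """Return (pid, ppid, args) for mount helpers whose parent is automount.

    Expects the text produced by `ps -eo pid,ppid,args`, header included.
    """
    procs = []
    for line in ps_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            pid, ppid = int(parts[0]), int(parts[1])
        except ValueError:
            continue  # header line or anything else non-numeric
        procs.append((pid, ppid, parts[2]))

    def prog(args):
        # Basename of the executable, e.g. "mount" for "/bin/mount -t nfs ..."
        return os.path.basename(args.split()[0])

    automounts = {pid for pid, _, args in procs if prog(args) == "automount"}
    return [(pid, ppid, args) for pid, ppid, args in procs
            if ppid in automounts and prog(args).startswith("mount")]

# Hypothetical listing for illustration only.
SAMPLE_PS = """\
  PID  PPID COMMAND
 1741     1 automount
16772  1741 automount
16780 16772 /bin/mount -t nfs -o udp hyperion:/disk/0 /disks/hyperion.0
25322 25000 grep auto
"""

stuck = mount_children(SAMPLE_PS)
```

If the list comes back empty while a thread is still sitting in do_spawn, that would be worth noting alongside the backtrace.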
* Re: something changed from 4 to 5
2008-08-24 4:54 ` Ian Kent
@ 2008-08-24 5:27 ` Ian Kent
0 siblings, 0 replies; 13+ messages in thread
From: Ian Kent @ 2008-08-24 5:27 UTC (permalink / raw)
To: Joe Pruett; +Cc: autofs
Ian Kent wrote:
> Joe Pruett wrote:
>
>> i had one of my servers get into the mode where automount is hung up doing
>> something. i started attaching to each one and doing the gdb stack trace
>> you asked for. here are the results. after looking at the third one,
>> things cleared up. hopefully we can figure something out on this.
>>
>> Script started on Fri 22 Aug 2008 01:11:35 PM PDT
>> [root@titan ~]# ps axf | grep auto
>> 1741 ? Ssl 116:45 automount
>> 16772 ? S 0:00 \_ automount
>> 16777 ? S 0:00 \_ automount
>> 20963 ? S 0:00 \_ automount
>> 21865 ? S 0:00 \_ automount
>> 23136 ? S 0:00 \_ automount
>> 25322 pts/0 S+ 0:00 \_ grep auto
>> [root@titan ~]# gdb -p 1741 /usr/sbin/automount
>>
>>
>
> snip ...
>
>
>> Attaching to program: /usr/sbin/automount, process 1741
>> Loaded symbols for /usr/sbin/automount
>> Reading symbols from /lib/libpthread.so.0...done.
>> [Thread debugging using libthread_db enabled]
>> [New Thread -1208218944 (LWP 1741)]
>> [New Thread -1228944496 (LWP 23135)]
>> [New Thread -1212564592 (LWP 21864)]
>> [New Thread -1222640752 (LWP 20962)]
>> [New Thread -1218868336 (LWP 18488)]
>> [New Thread -1216767088 (LWP 16774)]
>> [New Thread -1214665840 (LWP 16773)]
>> [New Thread -1224742000 (LWP 16771)]
>> [New Thread -1210463344 (LWP 1749)]
>> [New Thread -1208362096 (LWP 1746)]
>> [New Thread -1208292464 (LWP 1743)]
>> [New Thread -1208222832 (LWP 1742)]
>>
>>
>
> snip ...
>
>
>> 0x002ba402 in __kernel_vsyscall ()
>> (gdb) thr a a bt
>>
>>
>
> snip ...
>
>
>> Thread 7 (Thread -1214665840 (LWP 16773)):
>> #0 0x002ba402 in __kernel_vsyscall ()
>> #1 0x00711e2b in read () from /lib/libpthread.so.0
>> #2 0x0024ef62 in do_spawn (logopt=0, options=0, prog=0xb7996bdd "/bin/mount",
>> argv=0xb7996b60) at /usr/include/bits/unistd.h:35
>> #3 0x0024f8f5 in spawn_mount (logopt=0) at spawn.c:301
>> #4 0x0068bb4d in mount_mount (ap=0x8c651b8, root=0x8c65298 "/disks",
>> name=0xb7996dd0 "hyperion.0", name_len=10,
>> what=0xb7996da0 "hyperion.spiretech.com:/disk/0", fstype=0x1379e4 "nfs",
>> options=0xb7996df0 "udp,rsize=32768,wsize=32768", context=0x215280)
>> at mount_nfs.c:259
>> #5 0x00128b85 in sun_mount (ap=0x8c651b8, root=0x8c65298 "/disks",
>> name=0xb7999108 "hyperion.0", namelen=10,
>> loc=0x8cd0420 "hyperion.spiretech.com:/disk/0", loclen=30,
>> options=0x8cd00e0 "udp,rsize=32768,wsize=32768", ctxt=0x8c5db38)
>> at parse_sun.c:638
>> #6 0x00129ebc in parse_mount (ap=0x8c651b8, name=0xb7999108 "hyperion.0",
>> name_len=10,
>> mapent=0xb7999080 "-udp,rsize=32768,wsize=32768 hyperion.spiretech.com:/disk/0", context=0x8c5db38) at parse_sun.c:1452
>> #7 0x00ce8c6c in lookup_mount (ap=0x8c651b8, name=0x8ccfb50 "hyperion.0",
>> name_len=10, context=0x8c5db08) at lookup_yp.c:646
>> #8 0x00250d99 in do_lookup_mount (ap=0x8c651b8, map=0x8c5dac0,
>> name=0x8ccfb50 "hyperion.0", name_len=10) at lookup.c:669
>> #9 0x00251f13 in lookup_nss_mount (ap=0x8c651b8, source=0x0,
>> name=0x8ccfb50 "hyperion.0", name_len=10) at lookup.c:731
>> #10 0x00249e9a in do_mount_indirect (arg=0x8ccfaf0) at indirect.c:835
>> #11 0x0070b46b in start_thread () from /lib/libpthread.so.0
>> #12 0x00424dbe in clone () from /lib/libc.so.6
>>
>>
>
> It looks like this is what's blocking the rest and it looks OK.
> AFAICT there's no evidence in the backtrace that autofs itself is
> deadlocked or waiting on a completion message that has been missed.
> If mount(8) is waiting for a mount that's higher up in the tree then
> everything else should also wait.
> Without more information I'd have to say there's not much autofs can do
> here.
>
Or this may be a different example of a kernel lookup bug I've worked on
recently and I've just not seen it in this context before.
Perhaps a debug log of this happening would provide more info.
Ian
* Re: something changed from 4 to 5
2008-08-24 5:11 ` Ian Kent
@ 2008-08-24 14:48 ` Joe Pruett
2008-08-25 2:14 ` Ian Kent
0 siblings, 1 reply; 13+ messages in thread
From: Joe Pruett @ 2008-08-24 14:48 UTC (permalink / raw)
To: Ian Kent; +Cc: autofs
> And the call to access(2) is seen in the first backtrace as waiting on an nfs
> mount and any other processes calling to access(2) should wait on the first
> in this case, since the path includes the same mount.
>
> When this happens you could also try looking for mount processes to match
> them to those in the backtraces but if any have died that should result in
> the daemon continuing. So it's a bit puzzling.
i don't think there are any mount processes hanging around. i'll look the
next time it happens.
what is the best way to get debugging output? -l debug? or -d? or both?
* Re: something changed from 4 to 5
2008-08-24 14:48 ` Joe Pruett
@ 2008-08-25 2:14 ` Ian Kent
2008-08-25 2:42 ` Joe Pruett
0 siblings, 1 reply; 13+ messages in thread
From: Ian Kent @ 2008-08-25 2:14 UTC (permalink / raw)
To: Joe Pruett; +Cc: autofs
Joe Pruett wrote:
>> And the call to access(2) is seen in the first backtrace as waiting
>> on an nfs mount and any other processes calling to access(2) should
>> wait on the first in this case, since the path includes the same mount.
>>
>> When this happens you could also try looking for mount processes to
>> match them to those in the backtraces but if any have died that
>> should result in the daemon continuing. So it's a bit puzzling.
>
> i don't think there are any mount processes hanging around. i'll look
> the next time it happens.
>
> what is the best way to get debugging output? -l debug? or -d? or
> both?
You need to be concerned with what is being sent to syslog as well.
Have a look at http://people.redhat.com/jmoyer.
* Re: something changed from 4 to 5
2008-08-25 2:14 ` Ian Kent
@ 2008-08-25 2:42 ` Joe Pruett
0 siblings, 0 replies; 13+ messages in thread
From: Joe Pruett @ 2008-08-25 2:42 UTC (permalink / raw)
To: Ian Kent; +Cc: autofs
> You need to be concerned with what is being sent to syslog as well.
> Have a look at http://people.redhat.com/jmoyer.
i have debugging enabled now. it may be another week or so before it acts
up again.
* Re: something changed from 4 to 5
2008-08-13 2:58 ` Ian Kent
2008-08-13 4:11 ` Joe Pruett
2008-08-23 14:45 ` Joe Pruett
@ 2008-09-02 22:07 ` Joe Pruett
2 siblings, 0 replies; 13+ messages in thread
From: Joe Pruett @ 2008-09-02 22:07 UTC (permalink / raw)
To: autofs
recap: automount gets hung waiting for something to happen and never times
out so user processes get hung forever. an strace or gdb of the automount
processes will clear things up and then it runs fine for days.
of course the problem has not come back now that i have debugging enabled.
that certainly makes it seem like a timing issue of some sort. i'll keep
it running this way and hope that we'll see something happen.