* hung server again
@ 2008-11-18 15:40 Joe Pruett
2008-11-18 15:57 ` Ian Kent
2008-11-18 18:31 ` Joe Pruett
0 siblings, 2 replies; 7+ messages in thread
From: Joe Pruett @ 2008-11-18 15:40 UTC (permalink / raw)
To: autofs
a few months back i had a situation where a child of automount wouldn't
exit properly and that would cause the parent to get locked up and all
automount would stop working. if i could get onto the system, i could
strace the child automounts and watch them exit and eventually i'd hit the
one that was the cause and then everything would start working again.
there have been a number of kernel updates since and it looked like things
were working normally again. but i was wrong, our web server locked up
last night from the same issue again. given the length of time between
issues and the infeasibility of debugging this on a live server i want to
ask about my setup and see if i should change things.
i use a two level automount hierarchy. /home entries are all designed to
be bind mounts, like:
joey :/disks/hyperion.0/home/joey
and then /disks is a normal automount map:
hyperion.0 hyperion:/disk/0
repeat this for a few thousand /home entries, but only a handful of
entries in /disks. i have done this to avoid thousands of nfs mounts on a
busy server (like a mail server). in the past with udp it was bad enough,
but now with tcp based nfs, each mount uses even more resources (sockets,
tcp buffers, etc). but given this lockup bug, i am thinking of going to a
straight single level /home setup.
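for completeness, the master map tying the two levels together looks roughly like this (the map file names are illustrative, not my actual paths):

```
# /etc/auto.master (map file names here are illustrative)
# /etc/auto.home holds the bind-mount entries, e.g.:
#   joey        :/disks/hyperion.0/home/joey
# /etc/auto.disks holds the nfs entries, e.g.:
#   hyperion.0  hyperion:/disk/0
/home   /etc/auto.home
/disks  /etc/auto.disks
```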
i did some testing with a single level /home, and when i touched all entries
in /home i eventually ran into some resource limits. maybe there is some
tuning i can do? i could go back to /home just being full of symlinks,
but that has so many issues of its own. i'm looking for feedback from other
users that might have servers with a thousand or more active mounts and
how that works for you, or other good ideas of how people handle
this kind of thing.
* Re: hung server again
2008-11-18 15:40 hung server again Joe Pruett
@ 2008-11-18 15:57 ` Ian Kent
2008-11-18 16:39 ` Joe Pruett
2008-11-18 18:31 ` Joe Pruett
1 sibling, 1 reply; 7+ messages in thread
From: Ian Kent @ 2008-11-18 15:57 UTC (permalink / raw)
To: Joe Pruett; +Cc: autofs
On Tue, 2008-11-18 at 07:40 -0800, Joe Pruett wrote:
> a few months back i had a situation where a child of automount wouldn't
> exit properly and that would cause the parent to get locked up and all
> automount would stop working. if i could get onto the system, i could
> strace the child automounts and watch them exit and eventually i'd hit the
> one that was the cause and then everything would start working again.
> there have been a number of kernel updates since and it looked like things
> were working normally again. but i was wrong, our web server locked up
> last night from the same issue again. given the length of time between
> issues and the infeasibility of debugging this on a live server i want to
> ask about my setup and see if i should change things.
How about the autofs and kernel versions?
>
> i use a two level automount hierarchy. /home entries are all designed to
> be bind mounts, like:
>
> joey :/disks/hyperion.0/home/joey
>
> and then /disks is a normal automount map:
>
> hyperion.0 hyperion:/disk/0
>
> repeat this for a few thousand /home entries, but only a handful of
> entries in /disks. i have done this to avoid thousands of nfs mounts on a
> busy server (like a mail server). in the past with udp it was bad enough,
> but now with tcp based nfs, each mount uses even more resources (sockets,
> tcp buffers, etc). but given this lockup bug, i am thinking of going to a
> straight single level /home setup.
>
> i did some testing with a single level /home, and when i touched all entries
> in /home i eventually ran into some resource limits. maybe there is some
> tuning i can do? i could go back to /home just being full of symlinks,
> but that has so many issues of its own. i'm looking for feedback from other
> users that might have servers with a thousand or more active mounts and
> how that works for you, or other good ideas of how people handle
> this kind of thing.
>
> _______________________________________________
> autofs mailing list
> autofs@linux.kernel.org
> http://linux.kernel.org/mailman/listinfo/autofs
* Re: hung server again
2008-11-18 15:57 ` Ian Kent
@ 2008-11-18 16:39 ` Joe Pruett
0 siblings, 0 replies; 7+ messages in thread
From: Joe Pruett @ 2008-11-18 16:39 UTC (permalink / raw)
To: Ian Kent; +Cc: autofs
>> a few months back i had a situation where a child of automount wouldn't
>> exit properly and that would cause the parent to get locked up and all
>> automount would stop working. if i could get onto the system, i could
>> strace the child automounts and watch them exit and eventually i'd hit the
>> one that was the cause and then everything would start working again.
>> there have been a number of kernel updates since and it looked like things
>> were working normally again. but i was wrong, our web server locked up
>> last night from the same issue again. given the length of time between
>> issues and the infeasibility of debugging this on a live server i want to
>> ask about my setup and see if i should change things.
>
> How about version of autofs and kernel?
sorry about that. it's clear in my mind :-).
this is on centos 4 and 5, using almost standard autofs5 (5.0.1-0.rc2.88)
on both. on centos 4 i have applied a patch from the list to correct the
problem with finding local ip information on hosts with lots of interfaces,
so that bind mounts function correctly. standard kernel as well.
* Re: hung server again
2008-11-18 15:40 hung server again Joe Pruett
2008-11-18 15:57 ` Ian Kent
@ 2008-11-18 18:31 ` Joe Pruett
2008-11-18 19:51 ` Jeff Moyer
1 sibling, 1 reply; 7+ messages in thread
From: Joe Pruett @ 2008-11-18 18:31 UTC (permalink / raw)
To: autofs
> i did some testing with a single level /home, and when i touched all entries in
> /home i eventually ran into some resource limits. maybe there is some tuning i
> can do? i could go back to /home just being full of symlinks, but that has so
> many issues of its own. i'm looking for feedback from other users that might
> have servers with a thousand or more active mounts and how that works for you,
> or other good ideas of how people handle this kind of thing.
it looks like centos 4 (2.6.9 based) uses one tcp connection per mount,
but centos 5 (2.6.19) uses one tcp connection per server. without delving
into the kernel source, i haven't come up with any answers from google
yet. can anyone on this list speak to this issue?
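for anyone who wants to verify this on their own box, here's a sketch of how i'm counting (it assumes `netstat -tn` column order; i've canned the input here so the logic is visible, but you'd pipe in live `netstat -tn` output on the server itself):

```shell
# count established tcp connections per nfs server (port 2049).
# the here-doc below is canned sample data; replace it with
# real `netstat -tn` output to check a live machine.
awk '$6 == "ESTABLISHED" && $5 ~ /:2049$/ { split($5, a, ":"); c[a[1]]++ }
     END { for (s in c) print s, c[s] }' <<'EOF'
tcp 0 0 10.0.0.5:683 10.0.0.9:2049 ESTABLISHED
tcp 0 0 10.0.0.5:684 10.0.0.9:2049 ESTABLISHED
tcp 0 0 10.0.0.5:685 10.0.0.7:22 ESTABLISHED
EOF
```

the canned sample shows the one-connection-per-mount case (it prints `10.0.0.9 2`); on a kernel that shares connections per server you'd expect a count of 1 for each nfs server instead.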
* Re: hung server again
2008-11-18 18:31 ` Joe Pruett
@ 2008-11-18 19:51 ` Jeff Moyer
2008-11-18 20:08 ` Joe Pruett
0 siblings, 1 reply; 7+ messages in thread
From: Jeff Moyer @ 2008-11-18 19:51 UTC (permalink / raw)
To: Joe Pruett; +Cc: autofs
Joe Pruett <joey@clean.q7.com> writes:
>> i did some testing with a single level /home, and when i touched all entries in
>> /home i eventually ran into some resource limits. maybe there is some tuning i
>> can do? i could go back to /home just being full of symlinks, but that has so
>> many issues of its own. i'm looking for feedback from other users that might
>> have servers with a thousand or more active mounts and how that works for you,
>> or other good ideas of how people handle this kind of thing.
>
> it looks like centos 4 (2.6.9 based) uses one tcp connection per mount,
> but centos 5 (2.6.19) uses one tcp connection per server. without delving
> into the kernel source, i haven't come up with any answers from google
> yet. can anyone on this list speak to this issue?
This is the much debated superblock sharing. Did you have a more
specific question, other than to verify your findings?
Cheers,
Jeff
* Re: hung server again
2008-11-18 19:51 ` Jeff Moyer
@ 2008-11-18 20:08 ` Joe Pruett
2008-11-18 20:17 ` Jeff Moyer
0 siblings, 1 reply; 7+ messages in thread
From: Joe Pruett @ 2008-11-18 20:08 UTC (permalink / raw)
To: Jeff Moyer; +Cc: autofs
>>> i did some testing with a single level /home, and when i touched all entries in
>>> /home i eventually ran into some resource limits. maybe there is some tuning i
>>> can do? i could go back to /home just being full of symlinks, but that has so
>>> many issues of its own. i'm looking for feedback from other users that might
>>> have servers with a thousand or more active mounts and how that works for you,
>>> or other good ideas of how people handle this kind of thing.
>>
>> it looks like centos 4 (2.6.9 based) uses one tcp connection per mount,
>> but centos 5 (2.6.19) uses one tcp connection per server. without delving
>> into the kernel source, i haven't come up with any answers from google
>> yet. can anyone on this list speak to this issue?
>
> This is the much debated superblock sharing. Did you have a more
> specific question, other than to verify your findings?
yes, i'm curious if my findings are right. and i guess since you
characterize it as much debated, is it now settled? i've seen some
discussions (about other oses) of making it tunable in some fashion, and i
could see it being nice to allow up to N mounts to share a single tcp
connection. but for my setup, just a single one will probably be good
enough, although with my recent nss netid discovery, a lot of my resource
issues may go away.
* Re: hung server again
2008-11-18 20:08 ` Joe Pruett
@ 2008-11-18 20:17 ` Jeff Moyer
0 siblings, 0 replies; 7+ messages in thread
From: Jeff Moyer @ 2008-11-18 20:17 UTC (permalink / raw)
To: Joe Pruett; +Cc: autofs
Joe Pruett <joey@clean.q7.com> writes:
>>>> i did some testing with a single level /home, and when i touched all entries in
>>>> /home i eventually ran into some resource limits. maybe there is some tuning i
>>>> can do? i could go back to /home just being full of symlinks, but that has so
>>>> many issues of its own. i'm looking for feedback from other users that might
>>>> have servers with a thousand or more active mounts and how that works for you,
>>>> or other good ideas of how people handle this kind of thing.
>>>
>>> it looks like centos 4 (2.6.9 based) uses one tcp connection per mount,
>>> but centos 5 (2.6.19) uses one tcp connection per server. without delving
>>> into the kernel source, i haven't come up with any answers from google
>>> yet. can anyone on this list speak to this issue?
>>
>> This is the much debated superblock sharing. Did you have a more
>> specific question, other than to verify your findings?
>
> yes, i'm curious if my findings are right. and i guess since you
> characterize it as much debated, is it now settled? i've seen some
Yes. I describe it as much debated because the initial implementation caused
some (arguably broken) configurations to change behaviour (break).
> discussions (about other oses) of making it tunable in some fashion,
> and i could see it being nice to allow up to N mounts to share a
> single tcp connection. but for my setup, just a single one will
> probably be good enough, although with my recent nss netid discovery,
> a lot of my resource issues may go away.
I don't think Linux allows tuning of this (aside from explicitly stating
that two mounts should not share a cache, which is dangerous).
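For reference, the explicit opt-out I mean is the nosharecache NFS mount
option, on kernels that have it (the server name and paths below are
illustrative):

```
# give this mount its own superblock/cache instead of sharing;
# mixing shared and unshared mounts of the same export is where
# the danger comes in
mount -t nfs -o nosharecache hyperion:/disk/0 /mnt/disk0
```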
Cheers,
Jeff