All of lore.kernel.org
* hung server again
From: Joe Pruett @ 2008-11-18 15:40 UTC (permalink / raw)
  To: autofs

a few months back i had a situation where a child automount wouldn't 
exit properly, which caused the parent to get locked up and all 
automounting to stop working.  if i could get onto the system, i could 
strace the child automounts and watch them exit, and eventually i'd hit the 
one that was the cause, at which point everything would start working again. 
there have been a number of kernel updates since then and it looked like things 
were working normally again.  but i was wrong: our web server locked up 
last night from the same issue.  given the length of time between 
occurrences and the infeasibility of debugging this on a live server, i want to 
ask about my setup and see if i should change things.

i use a two-level automount hierarchy.  /home entries are all designed to 
be bind mounts, like:

joey	:/disks/hyperion.0/home/joey

and then /disks is a normal automount map:

hyperion.0	hyperion:/disk/0

repeat this for a few thousand /home entries, but only a handful of 
entries in /disks.  i have done this to avoid thousands of nfs mounts on a 
busy server (like a mail server).  in the past with udp it was bad enough, 
but now with tcp-based nfs, each mount uses even more resources (sockets, 
tcp buffers, etc).  but given this lockup bug, i am thinking of going to a 
straight single-level /home setup.
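to make the layout concrete, here is a sketch of the three maps involved.  the 
/home and /disks entries are the ones above; the auto.master lines and map 
file names are my assumption about how they are wired together:

```
# /etc/auto.master (assumed; your master map may name the maps differently)
/home   /etc/auto.home
/disks  /etc/auto.disks

# /etc/auto.home -- a few thousand bind-mount entries, one per user
joey    :/disks/hyperion.0/home/joey

# /etc/auto.disks -- a handful of real nfs mounts
hyperion.0      hyperion:/disk/0
```

so a lookup of /home/joey bind-mounts from /disks/hyperion.0, which in turn 
triggers the single real nfs mount of hyperion:/disk/0.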

i did some testing with a single-level /home, and when i touched all the 
entries in /home i eventually ran into some resource limits.  maybe there is 
some tuning i can do?  i could go back to /home just being full of symlinks, 
but that has so many issues of its own.  i'm looking for feedback from other 
users who might have servers with a thousand or more active mounts, and 
how that works for you.  or some other good ideas of how people handle 
this kind of thing.


* Re: hung server again
From: Ian Kent @ 2008-11-18 15:57 UTC (permalink / raw)
  To: Joe Pruett; +Cc: autofs

On Tue, 2008-11-18 at 07:40 -0800, Joe Pruett wrote:
> a few months back i had a situation where a child of automount wouldn't 
> exit properly and that would cause the parent to get locked up and all 
> automount would stop working.  if i could get onto the system, i could 
> strace the child automounts and watch them exit and eventually i'd hit the 
> one that was the cause and then everything would start working again. 
> there have been a number of kernel updates since and it looked like things 
> were working normally again.  but i was wrong, our web server locked up 
> last night from the same issue again.  given the length of time between 
> issues and the infeasibility of debugging this on a live server i want to 
> ask about my setup and see if i should change things.

What versions of autofs and the kernel are you running?

> 
> i use a two level automount hierarchy.  /home entries are all designed to 
> be bind mounts, like:
> 
> joey	:/disks/hyperion.0/home/joey
> 
> and then /disks is a normal automount map:
> 
> hyperion.0	hyperion:/disk/0
> 
> repeat this for a few thousand /home entries, but only a handful of 
> entries in /disks.  i have done this to avoid thousands of nfs mounts on a 
> busy server (like a mail server).  in the past with udp it was bad enough, 
> but now with tcp based nfs, each mount uses even more resources (sockets, 
> tcp buffers, etc).  but given this lockup bug, i am thinking of going to a 
> straight single level /home setup.
> 
> i did some testing with a single level /home and if i touch all entries in 
> /home, i eventually ran into some resource limits.  maybe there is some 
> tuning i can do?  i could go back to /home just being full of symlinks, 
> but that has so many issues of it's own. i'm looking for feedback of other 
> users that might have servers with a thousand or more active mounts and 
> how that works for you.  or some other good ideas of how people handle 
> this kind of thing.


* Re: hung server again
From: Joe Pruett @ 2008-11-18 16:39 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs

>> a few months back i had a situation where a child of automount wouldn't
>> exit properly and that would cause the parent to get locked up and all
>> automount would stop working.  if i could get onto the system, i could
>> strace the child automounts and watch them exit and eventually i'd hit the
>> one that was the cause and then everything would start working again.
>> there have been a number of kernel updates since and it looked like things
>> were working normally again.  but i was wrong, our web server locked up
>> last night from the same issue again.  given the length of time between
>> issues and the infeasibility of debugging this on a live server i want to
>> ask about my setup and see if i should change things.
>
> How about version of autofs and kernel?

sorry about that.  it's clear in my mind :-).

this is on centos 4 and 5, using an almost standard autofs5 (5.0.1-0.rc2.88) 
on both.  on centos 4 i have applied a patch from the list that fixes the 
problem of finding local ip information on hosts with lots of interfaces, 
which bind mounts need in order to function correctly.  standard kernels as 
well.


* Re: hung server again
From: Joe Pruett @ 2008-11-18 18:31 UTC (permalink / raw)
  To: autofs

> i did some testing with a single level /home and if i touch all entries in 
> /home, i eventually ran into some resource limits.  maybe there is some 
> tuning i can do?  i could go back to /home just being full of symlinks, but 
> that has so many issues of it's own. i'm looking for feedback of other users 
> that might have servers with a thousand or more active mounts and how that 
> works for you.  or some other good ideas of how people handle this kind of 
> thing.

it looks like centos 4 (2.6.9 based) uses one tcp connection per mount, 
but centos 5 (2.6.19) uses one tcp connection per server.  without delving 
into the kernel source, i haven't come up with any answers from google 
yet.  can anyone on this list speak to this issue?
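one way to check this from userspace is to count established tcp connections 
to the nfs port (2049) per server address.  the snippet below runs the 
pipeline over a canned, made-up sample of `netstat -tn` output so it is 
self-contained; on a real box you would feed it `netstat -tn` directly:

```shell
# count established tcp connections to port 2049 (nfs), grouped by server ip.
# the printf lines are fake sample data; replace with `netstat -tn` for real use.
printf '%s\n' \
  'tcp 0 0 10.0.0.5:824 192.168.1.10:2049 ESTABLISHED' \
  'tcp 0 0 10.0.0.5:825 192.168.1.10:2049 ESTABLISHED' \
  'tcp 0 0 10.0.0.5:826 192.168.1.11:2049 ESTABLISHED' |
awk '$6 == "ESTABLISHED" && $5 ~ /:2049$/ { split($5, a, ":"); print a[1] }' |
sort | uniq -c
```

on a per-mount kernel you would expect the count for each server to track the 
number of active mounts from it; on a per-server kernel each server should 
show exactly one connection.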


* Re: hung server again
From: Jeff Moyer @ 2008-11-18 19:51 UTC (permalink / raw)
  To: Joe Pruett; +Cc: autofs

Joe Pruett <joey@clean.q7.com> writes:

>> i did some testing with a single level /home and if i touch all entries in 
>> /home, i eventually ran into some resource limits.  maybe there is some 
>> tuning i can do?  i could go back to /home just being full of symlinks, but 
>> that has so many issues of it's own. i'm looking for feedback of other users 
>> that might have servers with a thousand or more active mounts and how that 
>> works for you.  or some other good ideas of how people handle this kind of 
>> thing.
>
> it looks like centos 4 (2.6.9 based) uses one tcp connection per mount, 
> but centos 5 (2.6.19) uses one tcp connection per server.  without delving 
> into the kernel source, i haven't come up with any answers from google 
> yet.  can anyone on this list speak to this issue?

This is the much-debated superblock sharing.  Did you have a more
specific question, other than to verify your findings?

Cheers,

Jeff


* Re: hung server again
From: Joe Pruett @ 2008-11-18 20:08 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: autofs

>>> i did some testing with a single level /home and if i touch all entries in
>>> /home, i eventually ran into some resource limits.  maybe there is some
>>> tuning i can do?  i could go back to /home just being full of symlinks, but
>>> that has so many issues of it's own. i'm looking for feedback of other users
>>> that might have servers with a thousand or more active mounts and how that
>>> works for you.  or some other good ideas of how people handle this kind of
>>> thing.
>>
>> it looks like centos 4 (2.6.9 based) uses one tcp connection per mount,
>> but centos 5 (2.6.19) uses one tcp connection per server.  without delving
>> into the kernel source, i haven't come up with any answers from google
>> yet.  can anyone on this list speak to this issue?
>
> This is the much debated superblock sharing.  Did you have a more
> specific question, other than to verify your findings?

yes, i'm curious whether my findings are right.  and since you characterize 
it as much debated, i guess i should ask: is it now settled?  i've seen some 
discussions (about other oses) of making it tunable in some fashion, and i 
could see it being nice to allow up to N mounts to share a single tcp 
connection.  but for my setup, a single connection per server will probably 
be good enough, although with my recent nss netid discovery, a lot of my 
resource issues may go away.


* Re: hung server again
From: Jeff Moyer @ 2008-11-18 20:17 UTC (permalink / raw)
  To: Joe Pruett; +Cc: autofs

Joe Pruett <joey@clean.q7.com> writes:

>>>> i did some testing with a single level /home and if i touch all entries in
>>>> /home, i eventually ran into some resource limits.  maybe there is some
>>>> tuning i can do?  i could go back to /home just being full of symlinks, but
>>>> that has so many issues of it's own. i'm looking for feedback of other users
>>>> that might have servers with a thousand or more active mounts and how that
>>>> works for you.  or some other good ideas of how people handle this kind of
>>>> thing.
>>>
>>> it looks like centos 4 (2.6.9 based) uses one tcp connection per mount,
>>> but centos 5 (2.6.19) uses one tcp connection per server.  without delving
>>> into the kernel source, i haven't come up with any answers from google
>>> yet.  can anyone on this list speak to this issue?
>>
>> This is the much debated superblock sharing.  Did you have a more
>> specific question, other than to verify your findings?
>
> yes, i'm curious if my findings are right.  and i guess since you
> characterize it as much debated, is it now settled?  i've seen some

Yes.  I call it much debated because the initial implementation caused
some (arguably broken) configurations to change behaviour, i.e. break.

> discussions (about other oses) of making it tunable in some fashion,
> and i could see it being nice to allow up to N mounts to share a
> single tcp connection.  but for my setup, just a single one will
> probably be good enough, although with my recent nss netid discovery,
> a lot of my resource issues may go away.

I don't think Linux allows tuning of this (aside from explicitly stating
that two mounts should not share a cache, which is dangerous).

Cheers,

Jeff

