public inbox for linux-kernel@vger.kernel.org
* server migration
@ 2004-03-05 18:13 Lawrence Walton
  2004-03-05 18:21 ` Danny ter Haar
  2004-03-06 23:33 ` Denis Vlasenko
  0 siblings, 2 replies; 7+ messages in thread
From: Lawrence Walton @ 2004-03-05 18:13 UTC
  To: linux-kernel

Hi all! 

I tried about four months ago to migrate a busy server to 2.6.0-test9,
and failed miserably. Lightly loaded it worked well but as the number
of users increased, the number of processes in uninterruptible sleep
increased to the hundreds and then the server fell on its face. I never
found out exactly why or which processes were hanging; if I had to guess,
it would be openldap.

I'd like to take another shot at it with 2.6.3, and I'd also like to get
some hints on how better to debug the problem. Remember it is a live
server with live users, so I can't spend much time before rebooting back to
a 2.4 kernel, and yes, 2.4.25 runs fine.

Things that are non-standard

Lots of open files, it's not unusual to have 50000 open files.
ext3 is mounted noatime,data=writeback on /home and /var 
Total processes are usually around 300 to 350.

Main applications are:

imap, exim and openldap running on Debian.


Questions, comments, flames are welcome.



-- 
*--* Mail: lawrence@otak.com
*--* Voice: 425.739.4247
*--* Fax: 425.827.9577
*--* HTTP://the-penguin.otak.com/~lawrence
--------------------------------------
- - - - - - O t a k  i n c . - - - - - 




* Re: server migration
  2004-03-05 18:13 server migration Lawrence Walton
@ 2004-03-05 18:21 ` Danny ter Haar
  2004-03-08 21:50   ` Mike Fedyk
  2004-03-06 23:33 ` Denis Vlasenko
  1 sibling, 1 reply; 7+ messages in thread
From: Danny ter Haar @ 2004-03-05 18:21 UTC
  To: linux-kernel

Lawrence Walton  <lawrence@the-penguin.otak.com> wrote:
>I'd like to take another shot at it with 2.6.3,

Don't!

<personal experience, ymmv!>
Problems after sync, difficulties in the blocklayer/queuing/plugging.
Our newsgateway has gone back to 2.6.0-test11 since that's the
only one that seems to survive "hard-work".

2.6.4-rc1(-mm1) crashed hard on me, doing single-user stuff.
_i_ would wait a while if i were in your position.

Danny
-- 
 /"\                        | Dying is to be avoided because
 \ /  ASCII RIBBON CAMPAIGN | it can ruin your whole career 
  X   against HTML MAIL     | 
 / \  and POSTINGS          | - Bob Hope



* Re: server migration
  2004-03-05 18:13 server migration Lawrence Walton
  2004-03-05 18:21 ` Danny ter Haar
@ 2004-03-06 23:33 ` Denis Vlasenko
  2004-03-07  1:35   ` Lawrence Walton
  1 sibling, 1 reply; 7+ messages in thread
From: Denis Vlasenko @ 2004-03-06 23:33 UTC
  To: Lawrence Walton, linux-kernel

On Friday 05 March 2004 20:13, Lawrence Walton wrote:
> Hi all!
>
> I tried about four months ago to migrate a busy server to 2.6.0-test9,
> and failed miserably. Lightly loaded it worked well but as the number
> of users increased, the number of processes in uninterruptible sleep
> increased to the hundreds and then the server fell on its face. I never
> found out exactly why or which processes were hanging; if I had to guess,
> it would be openldap.

Why do you guess? Determine what processes are stuck.

> I'd like to take another shot at it with 2.6.3, and I'd also like to get
> some hints on how better to debug the problem. Remember it is a live
> server with live users, so I can't spend much time before rebooting back to
> a 2.4 kernel, and yes, 2.4.25 runs fine.
>
> Things that are non-standard
>
> Lots of open files, it's not unusual to have 50000 open files.
> ext3 is mounted noatime,data=writeback on /home and /var
> Total processes are usually around 300 to 350.
>
> Main applications are:
>
> imap, exim and openldap running on Debian.
>
> Questions, comments, flames are welcome.

Compile with stack pointers, capture SysRq-T, post stack traces
of D processes to lkml.
--
vda
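
One way to "determine what processes are stuck" without guessing is to scan /proc for tasks in state D. The following is a minimal sketch in Python 3, not something posted in this thread: the helper names (parse_stat, read_text, d_state_processes) are made up for illustration, and it assumes /proc/<pid>/stat has the usual "pid (comm) state ..." layout and that /proc/<pid>/wchan is readable, which may not hold on every kernel or configuration.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Read a small /proc file; return '' if it vanished or is unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def d_state_processes(proc_root="/proc"):
    """List (pid, comm, wchan) for every task in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        stat_text = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat_text:
            continue  # process exited between listdir() and open()
        pid, comm, state = parse_stat(stat_text)
        if state == "D":
            # wchan names the kernel function the task is sleeping in,
            # when the kernel exposes it (assumption: it may be empty).
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            found.append((pid, comm, wchan))
    return found
```

Calling d_state_processes() from a cron job or a loop and logging the result every few seconds would show which daemons pile up in D state as the load rises. It complements the SysRq-T stack traces rather than replacing them.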




* Re: server migration
  2004-03-06 23:33 ` Denis Vlasenko
@ 2004-03-07  1:35   ` Lawrence Walton
  2004-03-07 10:31     ` Denis Vlasenko
  0 siblings, 1 reply; 7+ messages in thread
From: Lawrence Walton @ 2004-03-07  1:35 UTC
  To: linux-kernel

Denis Vlasenko [vda@port.imtp.ilyichevsk.odessa.ua] wrote:
> On Friday 05 March 2004 20:13, Lawrence Walton wrote:
> > Hi all!
> >
> > I tried about four months ago to migrate a busy server to 2.6.0-test9,
> > and failed miserably. Lightly loaded it worked well but as the number
> > of users increased, the number of processes in uninterruptible sleep
> > increased to the hundreds and then the server fell on its face. I never
> > found out exactly why or which processes were hanging; if I had to guess,
> > it would be openldap.
> 
> Why do you guess? Determine what processes are stuck.
> 
Because I did not expect it to happen, and when it did happen I had lots
of users screaming at me to fix it now. The server had been up since the
night before. It was not until users started showing up in the morning
that the problem manifested itself.

The point is I was hoping to get a list of things to try to capture in
case it happened again. Testing is all well and good, but getting
information from a production box can be valuable, as long as it's not
some odd corner case.

Capturing SysRq-T was on my list to do.
I'll investigate stack pointers, and whether I can post stack traces.

I was hoping to get pointers like below before I tried it again.

<snip>
> Compile with stack pointers, capture SysRq-T, post stack traces
> of D processes to lkml.
> --
> vda
> 

-- 
*--* Mail: lawrence@otak.com
*--* Voice: 425.739.4247
*--* Fax: 425.827.9577
*--* HTTP://the-penguin.otak.com/~lawrence
--------------------------------------
- - - - - - O t a k  i n c . - - - - - 




* Re: server migration
  2004-03-07  1:35   ` Lawrence Walton
@ 2004-03-07 10:31     ` Denis Vlasenko
  2004-03-07 12:21       ` Michael Frank
  0 siblings, 1 reply; 7+ messages in thread
From: Denis Vlasenko @ 2004-03-07 10:31 UTC
  To: Lawrence Walton, linux-kernel

On Sunday 07 March 2004 03:35, Lawrence Walton wrote:
> Denis Vlasenko [vda@port.imtp.ilyichevsk.odessa.ua] wrote:
> > On Friday 05 March 2004 20:13, Lawrence Walton wrote:
> > > Hi all!
> > >
> > > I tried about four months ago to migrate a busy server to 2.6.0-test9,
> > > and failed miserably. Lightly loaded it worked well but as the number
> > > of users increased, the number of processes in uninterruptible sleep
> > > increased to the hundreds and then the server fell on its face. I
> > > never found out exactly why or which processes were hanging; if I
> > > had to guess, it would be openldap.
> >
> > Why do you guess? Determine what processes are stuck.
>
> Because I did not expect it to happen, and when it did happen I had lots
> of users screaming at me to fix it now. The server had been up since the
> night before. It was not until users started showing up in the morning
> that the problem manifested itself.
>
> The point is I was hoping to get a list of things to try to capture in
> case it happened again. Testing is all well and good, but getting
> information from a production box can be valuable, as long as it's not
> some odd corner case.
>
> Capturing SysRq-T was on my list to do.
> I'll investigate stack pointers, and whether I can post stack traces.

Well. That's easy. Just press SysRq-T and look into syslog.
--
vda
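
On a remote production box there may be no keyboard to press SysRq-T on. Kernels that provide /proc/sysrq-trigger accept the same key as a single character written to that file, and the task dump then lands in the kernel log (and from there in syslog). A minimal Python 3 sketch, assuming root privileges and that both /proc files exist on the kernel in question; the function names are hypothetical:

```python
def sysrq_enabled(path="/proc/sys/kernel/sysrq"):
    """Return True if the kernel.sysrq setting is non-zero.

    Assumption: a non-zero value means at least some SysRq functions are
    enabled for the keyboard combination; newer kernels treat it as a bitmask.
    """
    with open(path) as f:
        return int(f.read().split()[0]) != 0


def trigger_sysrq(key="t", path="/proc/sysrq-trigger"):
    """Ask the kernel to run one SysRq command; 't' dumps all task states."""
    if len(key) != 1:
        raise ValueError("SysRq commands are a single character")
    with open(path, "w") as f:
        f.write(key)
```

After trigger_sysrq("t"), the traces can be read from the kernel log with dmesg or from wherever syslog stores kernel messages. With a few hundred processes the dump can be long, so a large kernel log buffer or a serial console helps keep it from being truncated.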



* Re: server migration
  2004-03-07 10:31     ` Denis Vlasenko
@ 2004-03-07 12:21       ` Michael Frank
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Frank @ 2004-03-07 12:21 UTC
  To: Lawrence Walton, linux-kernel

>> > On Friday 05 March 2004 20:13, Lawrence Walton wrote:
>> > > Hi all!
>> > >
>> > > I tried about four months ago to migrate a busy server to 2.6.0-test9,
>> > > and failed miserably. Lightly loaded it worked well but as the number
>> > > of users increased, the number of processes in uninterruptible sleep
>> > > increased to the hundreds and then the server fell on its face. I
>> > > never found out exactly why or which processes were hanging; if I
>> > > had to guess, it would be openldap.

-test9 was the "oddest" kernel I ever ran (since 2.2.x) - I even got it
to hard lock repeatably by loading it a bit with dd ;)

Since then, Nick Piggin has put a hell of an effort into the
anticipatory scheduler and much else all over has been refined too.

I have done a bit of stress testing of io, network and cpu and
IMO, 2.6.3 will perform nicely in a server environment and there
will be no significant problems.

Input from production use is essential though and it would be much
appreciated if you would go for it :)

Regards
Michael





* Re: server migration
  2004-03-05 18:21 ` Danny ter Haar
@ 2004-03-08 21:50   ` Mike Fedyk
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Fedyk @ 2004-03-08 21:50 UTC
  To: Danny ter Haar; +Cc: linux-kernel

Danny ter Haar wrote:
> Lawrence Walton  <lawrence@the-penguin.otak.com> wrote:
> 
>>I'd like to take another shot at it with 2.6.3,
> 
> 
> Don't!
> 
> <personal experience, ymmv!>
> Problems after sync, difficulties in the blocklayer/queuing/plugging.
> Our newsgateway has gone back to 2.6.0-test11 since that's the
> only one that seems to survive "hard-work".
> 
> 2.6.4-rc1(-mm1) crashed hard on me, doing single-user stuff.
> _i_ would wait a while if i were in your position.

I have everything except for my GW/Firewall running 2.6.3 + two NFS 
patches and everything is working great.

Maybe you should find out which driver is giving you trouble, and help 
debug that.

Did you enable the NMI watchdog?
What about sysrq, did that still respond during your "hang"?

Mike
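
For the NMI watchdog question, a rough check is whether the NMI row in /proc/interrupts keeps counting. On x86 kernels of this era the watchdog is normally enabled with the nmi_watchdog=1 or nmi_watchdog=2 boot parameter. The Python 3 sketch below is an illustration under that assumption, with hypothetical function names; it compares two snapshots of /proc/interrupts taken a few seconds apart:

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts from /proc/interrupts text.

    Returns an empty list if there is no 'NMI:' row.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Keep only the numeric columns; newer kernels append a
            # textual description after the counts.
            return [int(x) for x in fields[1:] if x.isdigit()]
    return []


def nmi_watchdog_ticking(before_text, after_text):
    """True if any CPU's NMI count rose between the two snapshots."""
    before = nmi_counts(before_text)
    after = nmi_counts(after_text)
    if not before or len(before) != len(after):
        return False
    return any(a > b for a, b in zip(after, before))
```

In use, read /proc/interrupts, sleep for a few seconds, read it again, and pass both strings to nmi_watchdog_ticking(). A count that stays flat suggests the watchdog is not running, although in local APIC mode an idle CPU may legitimately show little movement.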
