2.4.17 still croaks under heavy load

public inbox for linux-kernel@vger.kernel.org

* 2.4.17 still croaks under heavy load
@ 2001-12-28  7:06 Phil Oester
  2001-12-28 10:01 ` Matti Aarnio
  2001-12-31 10:32 ` Florian Lohoff
  0 siblings, 2 replies; 6+ messages in thread
From: Phil Oester @ 2001-12-28  7:06 UTC (permalink / raw)
  To: linux-kernel

Have a webserver running Zope (specifically the ZEO db) which dies every
few days with no messages in syslog.  Locks up so tight a powercycle is
required to recover.  System has 1gb RAM, 2xSMP, kernel configured with
4gb highmem.  

Since the kernel doesn't provide any info in syslog when it dies, I just
ran a vmstat 30 to a file and waited for the next untimely demise.
Here's what happened when it died last time.  Note the sudden surge in
disk activity (bi) 

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  1  0  33312  10356   5252 696940   0   0    20     1 1649  1603   6   2  92
 0  1  0  33312  10352   5236 696564   0   0    26     3 1593  1548   3   2  94
 0  1  0  33312  10276   5236 696600   0   0     9     1 1639  1596   5   2  92
 0  2  0  33312  16932   5236 694304   0   0    11     3 1702  1709   7   4  89
 0  1  0  33312  12644   5236 698784   0   0    10     2 1560  1513   3   2  95
 0  1  0  33312  10456   5236 700600   0   0    14     3 1487  1443   6   2  93
 0  1  0  33504  10452   5236 700652   0   1     3     1 1806  1785   5   2  92
 0  1  0  33504  10468   5236 699992   0   0     5     3 2118  2116   6   3  91
 0  1  0  33692  10484   5232 699312   0   5     1     8 3146  3215   7   5  88
 1  0  0  33692  10544   5232 698832   0   1     0     3 3377  3457  10   5  85
 1  0  0  33692  10468   5232 697804   0   3     2     3 3636  3721   8   5  87
 1  0  0  33692  10420   5232 697876   0   2     2     4 1662  1609  35   3  63
 1  0  0  33692   9540   5232 698940   0   0     7     1  752   624  46   2  52
 1  0  0  33692   9592   5232 698900   0   1     6     4  397   372  50   1  49
 1  0  0  33692   9504   5232 698980   0   0     2     1  136   284  49   1  49
 1  0  0  33692   9492   5292 698992   0   3   741     4  215   467  50   1  49
 1  0  0  34000  12624   5296 695936   0   6   547    10  236   408  49   1  49
 1  0  0  34000  21912   5300 678984   1   0   499     6 1992  2112  55   7  38
 1  0  0  34000   9976   5300 693104   0   0   517    13  320   413  49   1  49
 1  0  0  34000  11916   5300 691128   1   0   561     3  289   413  53   1  46
 1  0  0  34000  10172   5296 692100   0   0   497     5  288   374  49   1  49
 1  0  0  34000  22012   5296 680216   0   0   556     1  309   421  50   1  49
 1  0  0  34000   9544   5296 692804   0   0   584     3  306   433  50   1  49
 1  0  0  34000  10816   5296 696748   0   0   469     1  414   522  51   3  46
<death>
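
A minimal sketch of how a saved vmstat log like the one above could be
scanned for the jump in bi. It is not part of the original report. It
assumes the 16-column layout shown in the post; the function names, the
5-sample window and the 10x threshold are arbitrary choices made for
illustration.

```python
from statistics import mean

# Column order of the vmstat output shown in the post (16 fields per sample).
COLUMNS = ["r", "b", "w", "swpd", "free", "buff", "cache",
           "si", "so", "bi", "bo", "in", "cs", "us", "sy", "id"]


def parse_vmstat(text):
    """Return one dict per sample, tolerating samples wrapped across lines."""
    # Header words and markers such as "<death>" are not digits and are skipped.
    numbers = [int(token) for token in text.split() if token.isdigit()]
    samples = []
    # Group the numbers 16 at a time; a truncated final sample is dropped.
    for start in range(0, len(numbers) - len(COLUMNS) + 1, len(COLUMNS)):
        samples.append(dict(zip(COLUMNS, numbers[start:start + len(COLUMNS)])))
    return samples


def find_surge(samples, column="bi", window=5, factor=10):
    """Return the index of the first sample whose value exceeds `factor`
    times the mean of the preceding `window` samples, or None."""
    for i in range(window, len(samples)):
        baseline = mean(s[column] for s in samples[i - window:i])
        # Floor the baseline at 1 so an all-zero window does not flag noise.
        if samples[i][column] > factor * max(baseline, 1):
            return i
    return None
```

With the default settings and the log above, the first flagged sample
would be the one where bi goes from 2 to 741.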

I'd be more than willing to collect any other data required here, just
let me know what would be of assistance.  Note though that I only have
remote access to this box, so getting magic sysrq info could be
difficult/impossible (tho I do have console access if that helps).

Thanks,

Phil Oester



* Re: 2.4.17 still croaks under heavy load
  2001-12-28  7:06 2.4.17 still croaks under heavy load Phil Oester
@ 2001-12-28 10:01 ` Matti Aarnio
  2001-12-28 17:16   ` Phil Oester
  2001-12-31 10:32 ` Florian Lohoff
  1 sibling, 1 reply; 6+ messages in thread
From: Matti Aarnio @ 2001-12-28 10:01 UTC (permalink / raw)
  To: Phil Oester; +Cc: linux-kernel

On Thu, Dec 27, 2001 at 11:06:50PM -0800, Phil Oester wrote:
> Have a webserver running Zope (specifically the ZEO db) which dies every
> few days with no messages in syslog.  Locks up so tight a powercycle is
> required to recover.  System has 1gb RAM, 2xSMP, kernel configured with
> 4gb highmem.  

  Do you have RAID1 on the disks?
  Apparently the "noapic" option helps, i.e. breaking the SYMMETRIC part
  of SMP.  You may also try "nmi_watchdog=1", if you have a serial console
  attached to the box for kernel message logging (and commands).

> Since the kernel doesn't provide any info in syslog when it dies, I just
> ran a vmstat 30 to a file and waited for the next untimely demise.
> Here's what happened when it died last time.  Note the sudden surge in
> disk activity (bi) 

   Yes, looks familiar.  My hangups have been during high disk activity too.
   My box is located in a place that is difficult for me to access, i.e.
   I can't use it to collect the debug data, or do the magic (press reset)
   to recover.

> I'd be more than willing to collect any other data required here, just
> let me know what would be of assistance.  Note though that I only have
> remote access to this box, so getting magic sysrq info could be
> difficult/impossible (tho I do have console access if that helps).
> 
> Thanks,
> 
> Phil Oester

/Matti Aarnio


* RE: 2.4.17 still croaks under heavy load
  2001-12-28 10:01 ` Matti Aarnio
@ 2001-12-28 17:16   ` Phil Oester
  0 siblings, 0 replies; 6+ messages in thread
From: Phil Oester @ 2001-12-28 17:16 UTC (permalink / raw)
  To: linux-kernel

No RAID1 on disks.

Here's /proc/meminfo within 1 minute of the box dying last night:

        total:    used:    free:  shared: buffers:  cached:
Mem:  1054371840 1044684800  9687040        0  7802880 834752512
Swap: 535797760  7626752 528171008
MemTotal:      1029660 kB
MemFree:          9460 kB
MemShared:           0 kB
Buffers:          7620 kB
Cached:         811872 kB
SwapCached:       3316 kB
Active:         231880 kB
Inactive:       747344 kB
HighTotal:      131072 kB
HighFree:         1028 kB  <---------  See comment below
LowTotal:       898588 kB
LowFree:          8432 kB
SwapTotal:      523240 kB
SwapFree:       515792 kB

The HighFree value was at 2044 for the prior hour.  It went to 1028
within 1 minute of the box freezing.  Out of HighMem???
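
A minimal sketch of how the per-zone numbers in a /proc/meminfo dump like
the one above could be pulled out and compared. It is not part of the
original message. It assumes the "Name:  value kB" layout shown here; the
function names are hypothetical, and the HighTotal/LowTotal fields are
only present on kernels that report them.

```python
def parse_meminfo(text):
    """Return {field name: value in kB} for 'Name:   12345 kB' lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip the old-style "total: used: free:" summary rows, which carry
        # byte counts without a kB unit. Trailing annotations are ignored.
        if sep and len(parts) >= 2 and parts[0].isdigit() and parts[1] == "kB":
            fields[key.strip()] = int(parts[0])
    return fields


def free_fraction(fields, zone):
    """Return free/total for a zone prefix such as 'High', 'Low' or 'Swap'."""
    return fields[zone + "Free"] / fields[zone + "Total"]
```

Applied to the dump above, free_fraction(fields, "High") is 1028/131072
and free_fraction(fields, "Low") is 8432/898588, so both zones have under
1% free.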

Here's vmstat within 30 seconds of freezing:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0   7448   9536   7616 812092   0   0   932     2  235   577  50   1  49

Seems VM related.

-Phil


* RE: 2.4.17 still croaks under heavy load
@ 2001-12-29  2:20 Dieter Nützel
  2001-12-29  2:57 ` Alan Cox
  0 siblings, 1 reply; 6+ messages in thread
From: Dieter Nützel @ 2001-12-29  2:20 UTC (permalink / raw)
  To: Phil Oester; +Cc: Matti Aarnio, Andrea Arcangeli, Linux Kernel List

Phil Oester wrote:
>
> No RAID1 on disks.
>
> Here's /proc/meminfo within 1 minute of the box dying last night:
>
>         total:    used:    free:  shared: buffers:  cached:
> Mem:  1054371840 1044684800  9687040        0  7802880 834752512
> Swap: 535797760  7626752 528171008
> MemTotal:      1029660 kB
> MemFree:          9460 kB
> MemShared:           0 kB
> Buffers:          7620 kB
> Cached:         811872 kB
> SwapCached:       3316 kB
> Active:         231880 kB
> Inactive:       747344 kB
> HighTotal:      131072 kB
> HighFree:         1028 kB  <---------  See comment below
> LowTotal:       898588 kB
> LowFree:          8432 kB
> SwapTotal:      523240 kB
> SwapFree:       515792 kB
>
> The HighFree value was at 2044 for the prior hour.  It went to 1028
> within 1 minute of the box freezing.  Out of HighMem???
>
> Here's vmstat within 30 seconds of freezing:
>
>    procs                      memory    swap          io     system         cpu
>  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
>  1  0  0   7448   9536   7616 812092   0   0   932     2  235   577  50   1  49
>
> Seems VM related.

Hello Phil,

can you please try Andrea Arcangeli's 10_vm-21?
ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.17rc2aa2/10_vm-21

I think we need more "pressure" to get these "fixes" into 2.4.18...

Regards,
	Dieter
-- 
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
@home: Dieter.Nuetzel@hamburg.de



* Re: 2.4.17 still croaks under heavy load
  2001-12-29  2:20 Dieter Nützel
@ 2001-12-29  2:57 ` Alan Cox
  0 siblings, 0 replies; 6+ messages in thread
From: Alan Cox @ 2001-12-29  2:57 UTC (permalink / raw)
  To: Dieter Nützel
  Cc: Phil Oester, Matti Aarnio, Andrea Arcangeli, Linux Kernel List

> ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.17rc2aa2/10_vm-21
> 
> I think we need more "pressure" to get these "fixes" into 2.4.18...

Andrea has been asked several times to feed the patches into the tree in
small manageable chunks, each explained. I don't think it's pressure to get
them in you have to worry about.


* Re: 2.4.17 still croaks under heavy load
  2001-12-28  7:06 2.4.17 still croaks under heavy load Phil Oester
  2001-12-28 10:01 ` Matti Aarnio
@ 2001-12-31 10:32 ` Florian Lohoff
  1 sibling, 0 replies; 6+ messages in thread
From: Florian Lohoff @ 2001-12-31 10:32 UTC (permalink / raw)
  To: Phil Oester; +Cc: linux-kernel


On Thu, Dec 27, 2001 at 11:06:50PM -0800, Phil Oester wrote:
> Have a webserver running Zope (specifically the ZEO db) which dies every
> few days with no messages in syslog.  Locks up so tight a powercycle is
> required to recover.  System has 1gb RAM, 2xSMP, kernel configured with
> 4gb highmem.  
> 
> Since the kernel doesn't provide any info in syslog when it dies, I just
> ran a vmstat 30 to a file and waited for the next untimely demise.
> Here's what happened when it died last time.  Note the sudden surge in
> disk activity (bi) 

I have been seeing the same kind of deaths on multiple, very different SMP
boxes since the 2.2 days. They do not die while under high load, but only
the high-load boxes are unstable. I have one "testcase" where the box
crashes at least every 24 hours (mutella). Boxes I have seen this
happening on are:

Box1:
Dual Celeron 400
IDE Raid 
SCSI System disk
1GB Ram (No Highmem) (Used to have 512M)
EEPro 100

Box2:
Dual PIII 1Ghz
Serverworks Board
1GB Ram (No Highmem)
ICP Vortex Raid
EEPro 100

I have 3 machines of exactly the latter type. All are unstable and tend
to crash, depending on the application, anywhere from every 24 hours to
every 2-3 weeks.

No notice in the syslog, nothing on the serial console. They are
completely dead without any prior sign. I have tried to capture
information about processes, swap, memory etc. Within 1 minute
prior to the crash the boxes are basically idle.
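
A minimal sketch of one way such pre-crash snapshots could be written so
that the last sample has a better chance of reaching the disk before a
hard lockup. It is not the method used in this thread. The function names
are hypothetical, and fsync only asks the kernel to write the data out; it
cannot guarantee anything about a machine that has already frozen.

```python
import os
import time


def append_snapshot(path, text, timestamp=None):
    """Append one timestamped snapshot and push it to disk before returning."""
    if timestamp is None:
        timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as log:
        log.write(f"--- {timestamp}\n{text.rstrip()}\n")
        log.flush()              # drain Python's userspace buffer
        os.fsync(log.fileno())   # ask the kernel to write the data out


def read_file(path):
    """Return a file's contents, or a marker string if it cannot be read."""
    try:
        with open(path) as source:
            return source.read()
    except OSError as err:
        return f"<unreadable: {err}>"
```

Calling append_snapshot("snapshots.log", read_file("/proc/meminfo")) from
a loop that sleeps between samples would give a log whose final entry
shows the state shortly before the freeze; the log path and the interval
are left to the reader.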

Flo
-- 
Florian Lohoff                  flo@rfc822.org             +49-5201-669912
Nine nineth on september the 9th              Welcome to the new billenium

