public inbox for linux-kernel@vger.kernel.org
* 2.4.20, .text.lock.swap cpu usage? (ibm x440)
@ 2003-01-06 23:35 Chris Wood
  2003-01-06 23:50 ` William Lee Irwin III
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Chris Wood @ 2003-01-06 23:35 UTC (permalink / raw)
  To: linux-kernel

Due to kswapd problems in Red Hat's 2.4.9 kernel, I have had to upgrade 
to the 2.4.20 kernel with the IBM Summit patches for our IBM x440.  It 
has run very well with one exception: between 8:00am and 9:00am our 
server sees a spike in system CPU usage (visible in top) that drags the 
server down to the point where people can't access it.

See the following jpg of top as an example of the system usage.  It 
doesn't seem to be any one program.

http://www.wencor.com/slow2.4.20.jpg

When we start to have users log off the server (we have 300 telnet users 
who log in), the system usually bounces right back to normal.  We have 
had to reboot once or twice to get it fully working again (lpd went into 
limbo and wouldn't come back).  After the server bounces back to normal, 
we can run the rest of the day without any trouble under a full heavy 
load.  I have never seen it happen at any other time of day, and it 
doesn't happen every day.

With some tips from James Cleverdon (IBM), I turned on some kernel 
debugging and got the following from readprofile when the server was 
having problems (truncated to the first 22 lines):
16480 total                                      0.0138
   6383 .text.lock.swap                          110.0517
   4689 .text.lock.vmscan                         28.2470
   4486 shrink_cache                               4.6729
    168 rw_swap_page_base                          0.6176
    124 prune_icache                               0.5167
     81 statm_pgd_range                            0.1534
     51 .text.lock.inode                           0.0966
     38 system_call                                0.6786
     31 .text.lock.tty_io                          0.0951
     31 .text.lock.locks                           0.1435
     18 .text.lock.sched                           0.0373
     16 _stext                                     0.2000
     15 fput                                       0.0586
     11 .text.lock.read_write                      0.0924
      9 strnicmp                                   0.0703
      9 do_wp_page                                 0.0110
      9 do_page_fault                              0.0066
      9 .text.lock.namei                           0.0073
      9 .text.lock.fcntl                           0.0714
      8 sys_read                                   0.0294
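
For reference: a profile like the above comes from the kernel's built-in
profiler. A minimal sketch of the capture, assuming the kernel was booted
with the profile=2 option and that /boot/System.map matches the running
kernel:

    readprofile -r                      # reset the counters
    sleep 600                           # let the problem window pass
    readprofile -m /boot/System.map | sort -rn | head -22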

Here is a snapshot when the server is fine, no problems (truncated):
1715833 total                                      1.4317
1677712 default_idle                             26214.2500
   4355 system_call                               77.7679
   2654 file_read_actor                           11.0583
   2159 bounce_end_io_read                         5.8668
   1752 put_filp                                  18.2500
   1664 do_page_fault                              1.2137
   1294 fget                                      20.2188
   1246 do_wp_page                                 1.5270
   1233 fput                                       4.8164
   1138 posix_lock_file                            0.7903
   1120 kmem_cache_alloc                           3.6842
   1098 do_softirq                                 4.9018
   1042 statm_pgd_range                            1.9735
    882 kfree                                      6.1250
    732 __loop_delay                              15.2500
    673 flush_tlb_mm                               6.0089
    610 fcntl_setlk64                              1.3616
    554 __kill_fasync                              4.9464
    498 zap_page_range                             0.4716
    414 do_generic_file_read                       0.3696
    409 __free_pages                               8.5208
    401 sys_semop                                  0.3530

I have to admit that most of this doesn't make a lot of sense to me and 
I don't know what the .text.lock.* processes are doing.  Any ideas? 
Anything I can try?

Chris Wood
Wencor West, Inc.

-----------------------------------
System Info From Here Down:
IBM x440 - Dual Xeon 1.4GHz MP, with Hyper-Threading turned on
6 gig RAM
2 internal 36gig drives mirrored
1 additional Intel e1000 network card
2 IBM fibre adapters (QLA2300s) connected to a FastT700 SAN
Red Hat Advanced Server 2.1
2.4.20 kernel built using the RH 2.4.9e8summit .config file as a template

These things are listed below (hopefully this isn't overkill):
x440:/proc$ cat modules (see results below)
x440:/proc$ cat scsi/scsi (see results below)
x440:/proc$ cat cpuinfo (see results below)
x440:/proc$ cat ioports (see results below)
x440:/proc$ cat iomem (see results below)

x440:/proc$ cat modules
autofs                 11876   0 (autoclean) (unused)
e1000                  59280   1
bcm5700                95076   1
ipchains               50728  28
usb-uhci               26724   0 (unused)
usbcore                76448   1 [usb-uhci]
ext3                   69888   7
jbd                    51808   7 [ext3]
qla2300               236608   2
ips                    45184   6
aic7xxx               133376   0
sd_mod                 13020  16
scsi_mod              121304   4 [qla2300 ips aic7xxx sd_mod]

x440:/proc$ cat scsi/scsi
Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
   Vendor: IBM      Model: SERVERAID        Rev: 1.00
   Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 15 Lun: 00
   Vendor: IBM      Model: SERVERAID        Rev: 1.00
   Type:   Processor                        ANSI SCSI revision: 02
Host: scsi2 Channel: 01 Id: 09 Lun: 00
   Vendor: IBM      Model: GNHv1 S2         Rev: 0
   Type:   Processor                        ANSI SCSI revision: 02
Host: scsi3 Channel: 00 Id: 00 Lun: 00
   Vendor: IBM      Model: 1742             Rev: 0520
   Type:   Direct-Access                    ANSI SCSI revision: 03


x440:/proc$ cat cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Xeon(TM) CPU 1.40GHz
stepping        : 1
cpu MHz         : 1397.190
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 2785.28

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Xeon(TM) CPU 1.40GHz
stepping        : 1
cpu MHz         : 1397.190
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 2791.83

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Xeon(TM) CPU 1.40GHz
stepping        : 1
cpu MHz         : 1397.190
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 2791.83

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Xeon(TM) CPU 1.40GHz
stepping        : 1
cpu MHz         : 1397.190
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 2791.83

x440:/proc$ cat ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
03c0-03df : vga+
03f6-03f6 : ide0
0440-044f : VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]
0700-070f : VIA Technologies, Inc. VT82C586B PIPC Bus Master IDE
   0700-0707 : ide0
   0708-070f : ide1
0cf8-0cff : PCI conf1
1800-187f : PCI device 1014:010f (IBM)
1880-189f : VIA Technologies, Inc. USB
   1880-189f : usb-uhci
18a0-18bf : VIA Technologies, Inc. USB (#2)
   18a0-18bf : usb-uhci
2000-20ff : Adaptec AIC-7899P U160/m
2100-21ff : Adaptec AIC-7899P U160/m (#2)
2800-28ff : QLogic Corp. QLA2300 64-bit FC-AL Adapter
   2800-28fe : qla2300
4000-40ff : QLogic Corp. QLA2300 64-bit FC-AL Adapter (#2)
   4000-40fe : qla2300
7000-701f : Intel Corp. 82544EI Gigabit Ethernet Controller
   7000-701f : e1000

x440:/proc$ cat iomem
00000000-0009c7ff : System RAM
0009c800-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000c8000-000cffff : Extension ROM
000d0000-000d01ff : Extension ROM
000f0000-000fffff : System ROM
00100000-dffb707f : System RAM
   00100000-0022b615 : Kernel code
   0022b616-002a525f : Kernel data
dffb7080-dffbf7ff : ACPI Tables
dffbf800-dfffffff : reserved
e0000000-e7ffffff : S3 Inc. Savage 4
e8400000-e8401fff : IBM Netfinity ServeRAID controller
   e8400000-e8401fff : ips
f0c20000-f0c3ffff : Intel Corp. 82544EI Gigabit Ethernet Controller
   f0c20000-f0c3ffff : e1000
f0c40000-f0c5ffff : Intel Corp. 82544EI Gigabit Ethernet Controller
   f0c40000-f0c5ffff : e1000
f1000000-f11fffff : PCI device 1014:010f (IBM)
f1200000-f127ffff : S3 Inc. Savage 4
f1600000-f160ffff : Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet
   f1600000-f160ffff : bcm5700
f1610000-f1610fff : Adaptec AIC-7899P U160/m
   f1610000-f1610fff : aic7xxx
f1611000-f1611fff : Adaptec AIC-7899P U160/m (#2)
   f1611000-f1611fff : aic7xxx
f1820000-f1820fff : QLogic Corp. QLA2300 64-bit FC-AL Adapter
f1920000-f1920fff : QLogic Corp. QLA2300 64-bit FC-AL Adapter (#2)
fec00000-ffffffff : reserved




* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-06 23:35 Chris Wood
@ 2003-01-06 23:50 ` William Lee Irwin III
  2003-01-06 23:52 ` Andrew Morton
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2003-01-06 23:50 UTC (permalink / raw)
  To: Chris Wood; +Cc: linux-kernel

On Mon, Jan 06, 2003 at 04:35:17PM -0700, Chris Wood wrote:
>   6383 .text.lock.swap                          110.0517
>   4689 .text.lock.vmscan                         28.2470
>   4486 shrink_cache                               4.6729
>    168 rw_swap_page_base                          0.6176
>    124 prune_icache                               0.5167
>     81 statm_pgd_range                            0.1534
>     51 .text.lock.inode                           0.0966
>     38 system_call                                0.6786
>     31 .text.lock.tty_io                          0.0951
>     31 .text.lock.locks                           0.1435
>     18 .text.lock.sched                           0.0373
>     16 _stext                                     0.2000
>     15 fput                                       0.0586
>     11 .text.lock.read_write                      0.0924
>      9 strnicmp                                   0.0703
>      9 do_wp_page                                 0.0110
>      9 do_page_fault                              0.0066
>      9 .text.lock.namei                           0.0073
>      9 .text.lock.fcntl                           0.0714
>      8 sys_read                                   0.0294

This is really bad lock contention. You may need 2.5.x.


Bill


* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-06 23:35 Chris Wood
  2003-01-06 23:50 ` William Lee Irwin III
@ 2003-01-06 23:52 ` Andrew Morton
  2003-01-09 17:17   ` Chris Wood
  2003-01-09  2:20 ` James Cleverdon
  2003-01-09  2:57 ` William Lee Irwin III
  3 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2003-01-06 23:52 UTC (permalink / raw)
  To: Chris Wood; +Cc: linux-kernel

Chris Wood wrote:
> 
> Due to kswapd problems in Red Hat's 2.4.9 kernel, I have had to upgrade
> to the 2.4.20 kernel with the IBM Summit patches for our IBM x440.
> ...
> 16480 total                                      0.0138
>    6383 .text.lock.swap                          110.0517
>    4689 .text.lock.vmscan                         28.2470
>    4486 shrink_cache                               4.6729
>     168 rw_swap_page_base                          0.6176
>     124 prune_icache                               0.5167

With six gigs of memory, it looks like the VM has gone nuts
trying to locate some reclaimable lowmem.

Suggest you send the contents of /proc/meminfo and /proc/slabinfo,
captured during a period of misbehaviour.
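
A simple capture loop would do; a minimal sketch (the log path is
arbitrary):

    while true
    do
            date >> /var/tmp/vm.log
            cat /proc/meminfo /proc/slabinfo >> /var/tmp/vm.log
            sleep 60
    done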

Then please apply 
http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1.bz2
and send a report on the outcome.


* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-06 23:35 Chris Wood
  2003-01-06 23:50 ` William Lee Irwin III
  2003-01-06 23:52 ` Andrew Morton
@ 2003-01-09  2:20 ` James Cleverdon
  2003-01-09  2:57 ` William Lee Irwin III
  3 siblings, 0 replies; 21+ messages in thread
From: James Cleverdon @ 2003-01-09  2:20 UTC (permalink / raw)
  To: Chris Wood, linux-kernel; +Cc: Andrea Arcangeli, Andrew Morton

On Monday 06 January 2003 03:35 pm, Chris Wood wrote:
> Due to kswapd problems in Red Hat's 2.4.9 kernel, I have had to upgrade
> to the 2.4.20 kernel with the IBM Summit patches for our IBM x440.  It
> has run very well with one exception: between 8:00am and 9:00am our
> server sees a spike in system CPU usage (visible in top) that drags the
> server down to the point where people can't access it.
>
> See the following jpg of top as an example of the system usage.  It
> doesn't seem to be any one program.
>
> http://www.wencor.com/slow2.4.20.jpg
>
> When we start to have users log off the server (we have 300 telnet users
> who log in), the system usually bounces right back to normal.  We have
> had to reboot once or twice to get it fully working again (lpd went into
> limbo and wouldn't come back).  After the server bounces back to normal,
> we can run the rest of the day without any trouble under a full heavy
> load.  I have never seen it happen at any other time of day, and it
> doesn't happen every day.
>
> With some tips from James Cleverdon (IBM), I turned on some kernel
> debugging and got the following from readprofile when the server was
> having problems (truncated to the first 22 lines):
> 16480 total                                      0.0138
>    6383 .text.lock.swap                          110.0517
>    4689 .text.lock.vmscan                         28.2470
>    4486 shrink_cache                               4.6729
>     168 rw_swap_page_base                          0.6176
>     124 prune_icache                               0.5167
>      81 statm_pgd_range                            0.1534
>      51 .text.lock.inode                           0.0966
>      38 system_call                                0.6786
>      31 .text.lock.tty_io                          0.0951
>      31 .text.lock.locks                           0.1435
>      18 .text.lock.sched                           0.0373
>      16 _stext                                     0.2000
>      15 fput                                       0.0586
>      11 .text.lock.read_write                      0.0924
>       9 strnicmp                                   0.0703
>       9 do_wp_page                                 0.0110
>       9 do_page_fault                              0.0066
>       9 .text.lock.namei                           0.0073
>       9 .text.lock.fcntl                           0.0714
>       8 sys_read                                   0.0294
>
> Here is a snapshot when the server is fine, no problems (truncated):
> 1715833 total                                      1.4317
> 1677712 default_idle                             26214.2500
>    4355 system_call                               77.7679
>    2654 file_read_actor                           11.0583
>    2159 bounce_end_io_read                         5.8668
>    1752 put_filp                                  18.2500
>    1664 do_page_fault                              1.2137
>    1294 fget                                      20.2188
>    1246 do_wp_page                                 1.5270
>    1233 fput                                       4.8164
>    1138 posix_lock_file                            0.7903
>    1120 kmem_cache_alloc                           3.6842
>    1098 do_softirq                                 4.9018
>    1042 statm_pgd_range                            1.9735
>     882 kfree                                      6.1250
>     732 __loop_delay                              15.2500
>     673 flush_tlb_mm                               6.0089
>     610 fcntl_setlk64                              1.3616
>     554 __kill_fasync                              4.9464
>     498 zap_page_range                             0.4716
>     414 do_generic_file_read                       0.3696
>     409 __free_pages                               8.5208
>     401 sys_semop                                  0.3530
>
> I have to admit that most of this doesn't make a lot of sense to me and
> I don't know what the .text.lock.* processes are doing.  Any ideas?
> Anything I can try?
>
> Chris Wood
> Wencor West, Inc.

Chris,

You're showing all the signs of the "kswapd" bug present in v2.4 kernels.  
Well, kswapd gets blamed for the problem; it is actually caused by using up 
nearly all of low memory with the buffer header and/or inode slab caches.  
(Run cat /proc/slabinfo when kswapd is running at >= 99% and see if those 
two caches have grown extra large.)  Anyway, kswapd gets triggered because 
a zone has hit its low-memory threshold.  But kswapd can't swap out buffer 
headers or inodes.  The situation is hopeless, yet kswapd presses on anyway, 
scouring every memory zone for pages to free, all the while holding 
important memory locks.
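
A rough check along those lines, assuming the stock /proc/slabinfo
layout, is to watch whether those two caches keep growing:

    while true
    do
            egrep '^(buffer_head|inode_cache)' /proc/slabinfo
            sleep 10
    done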

Meanwhile, every program that wants more memory will spin on those locks.  
That's what the .text.lock.* entries are:  the out-of-line spin code for each 
lock; it is used when the lock is already owned by some other CPU.
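
The contended sections even show up as symbols in the kernel's map file;
for example, assuming System.map matches the running kernel:

    grep '\.text\.lock' /boot/System.map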

Net result:  a computer that runs like molasses in January.

Of the several proposed patches for this bug, Andrea Arcangeli's and Andrew 
Morton's worked best in our tests.  I believe that Andrea was going to add in 
some of Andrew's code for the final fix.  The kernel that is on the SLES 8 / 
UL 1.0 gold CDs works fine so I assume the Vulcan Mind Meld on the patches 
went well.

Unfortunately, I don't have any references to the final patch set.

> -----------------------------------
> System Info From Here Down:
> IBM x440 - Dual Xeon 1.4GHz MP, with Hyper-Threading turned on
> 6 gig RAM
> 2 internal 36gig drives mirrored
> 1 additional Intel e1000 network card
> 2 IBM fibre adapters (QLA2300s) connected to a FastT700 SAN
> Red Hat Advanced Server 2.1
> 2.4.20 kernel built using the RH 2.4.9e8summit .config file as a template
>
[ Snip! ]

Our customers have seen this on large Dell boxes too.  I strongly suspect 
that any v2.4 system with lots of physical memory and high I/O bandwidth 
can trigger this bug.


-- 
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com




* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-06 23:35 Chris Wood
                   ` (2 preceding siblings ...)
  2003-01-09  2:20 ` James Cleverdon
@ 2003-01-09  2:57 ` William Lee Irwin III
  3 siblings, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2003-01-09  2:57 UTC (permalink / raw)
  To: Chris Wood; +Cc: linux-kernel

[-- Attachment #1: brief message --]
[-- Type: text/plain, Size: 760 bytes --]

On Mon, Jan 06, 2003 at 04:35:17PM -0700, Chris Wood wrote:
> With some tips from James Cleverdon (IBM), I turned on some kernel 
> debugging and got the following from readprofile when the server was 
> having problems (truncated to the first 22 lines):
> 16480 total                                      0.0138

Here are some monitoring tools that might help detect the cause of
the situation.

bloatmon is the "back end"; there's no reason to run it directly.

bloatmeter shows the "least utilized" slabs.

bloatmost shows the largest slabs.

These sort of make for a top(1) for "lowmem pressure". Not everything
is accounted for there, though. The missing pieces are largely:

(1) simultaneous temporary poll table allocations
(2) pmd's
(3) kernel stacks
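
Typical use, assuming the three scripts attached below are saved
somewhere on $PATH (the two front ends invoke bloatmon by name) and
made executable:

    chmod +x bloatmon bloatmeter bloatmost
    bloatmeter          # least-utilized slabs, refreshed every 5s
    bloatmost           # largest slabs, refreshed every 60s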

Bill

[-- Attachment #2: bloatmost --]
[-- Type: text/plain, Size: 110 bytes --]

#!/bin/sh
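# Show the 22 slab caches with the most allocated memory, every 60s.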

while true
do
	grep -v '^slabinfo' /proc/slabinfo \
		| bloatmon \
		| sort -rn -k 3,3 \
		| head -22
	sleep 60
	echo
done

[-- Attachment #3: bloatmeter --]
[-- Type: text/plain, Size: 133 bytes --]

#!/bin/sh
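# Show the 22 least-utilized slab caches (lowest %util), every 5s.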
while : ; do
	grep -v '^slabinfo' /proc/slabinfo	\
		| bloatmon			\
		| sort -n -k 4,4		\
		| head -22
	sleep 5
	echo
done

[-- Attachment #4: bloatmon --]
[-- Type: text/plain, Size: 413 bytes --]

#!/usr/bin/awk -f
BEGIN {
	printf "%18s    %8s %8s %8s\n", "cache", "active", "alloc", "%util";
}

{
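	# slabinfo 1.1 data lines are: name, active objs, total objs,
	# object size (bytes), ...; so $2*$4 is live bytes and $3*$4 is
	# allocated bytes for each cache.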
	if ($3 != 0.0) {
		pct  = 100.0 * $2 / $3;
		frac = (10000.0 * $2 / $3) % 100;
	} else {
		pct  = 100.0;
		frac = 0.0;
	}
	active = ($2 * $4)/1024;
	alloc  = ($3 * $4)/1024;
	if ((alloc - active) < 1.0) {
		pct  = 100.0;
		frac = 0.0;
	}
	printf "%18s: %8dKB %8dKB  %3d.%-2d\n", $1, active, alloc, pct, frac;
}


* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-06 23:52 ` Andrew Morton
@ 2003-01-09 17:17   ` Chris Wood
  2003-01-09 20:18     ` Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Wood @ 2003-01-09 17:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
> Chris Wood wrote:
> 
>>Due to kswapd problems in Red Hat's 2.4.9 kernel, I have had to upgrade
>>to the 2.4.20 kernel with the IBM Summit patches for our IBM x440.
>>...
>>16480 total                                      0.0138
>>   6383 .text.lock.swap                          110.0517
>>   4689 .text.lock.vmscan                         28.2470
>>   4486 shrink_cache                               4.6729
>>    168 rw_swap_page_base                          0.6176
>>    124 prune_icache                               0.5167
> 
> 
> With six gigs of memory, it looks like the VM has gone nuts
> trying to locate some reclaimable lowmem.
> 
> Suggest you send the contents of /proc/meminfo and /proc/slabinfo,
> captured during a period of misbehaviour.

The server ran fine for 3 days, so it took a bit to get this info.

Is there a list of which patches I can apply if I don't want to apply 
the entire 2.4.20aa1?  I'm nervous about breaking other things, but may 
give it a try anyway.

Thanks for the help!

Here is a /proc/meminfo when it is running fine:

         total:    used:    free:  shared: buffers:  cached:
Mem:  6356955136 6035910656 321044480        0 206626816 5301600256
Swap: 2146529280 41652224 2104877056
MemTotal:      6207964 kB
MemFree:        313520 kB
MemShared:           0 kB
Buffers:        201784 kB
Cached:        5171716 kB
SwapCached:       5628 kB
Active:        3667492 kB
Inactive:      1912544 kB
HighTotal:     5373660 kB
HighFree:       203952 kB
LowTotal:       834304 kB
LowFree:        109568 kB
SwapTotal:     2096220 kB
SwapFree:      2055544 kB

Here is a /proc/meminfo when it is having problems:

         total:    used:    free:  shared: buffers:  cached:
Mem:  6356955136 6337114112 19841024        0 369520640 5160353792
Swap: 2146529280 96501760 2050027520
MemTotal:      6207964 kB
MemFree:         19376 kB
MemShared:           0 kB
Buffers:        360860 kB
Cached:        5023300 kB
SwapCached:      16108 kB
Active:        2551264 kB
Inactive:      3291804 kB
HighTotal:     5373660 kB
HighFree:        15404 kB
LowTotal:       834304 kB
LowFree:          3972 kB
SwapTotal:     2096220 kB
SwapFree:      2001980 kB
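
The number to watch above is LowFree: roughly 107MB of free low memory
when healthy versus about 4MB when wedged. A trivial way to keep an eye
on it, as a sketch:

    watch -n 5 "egrep '^Low(Total|Free)' /proc/meminfo"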

Here is a /proc/slabinfo when it is fine:

slabinfo - version: 1.1 (SMP)
kmem_cache            64     64    244    4    4    1 :  252  126
ip_fib_hash           14    224     32    2    2    1 :  252  126
ip_conntrack           0      0    384    0    0    1 :  124   62
urb_priv               0      0     64    0    0    1 :  252  126
journal_head        1141   5929     48   33   77    1 :  252  126
revoke_table           7    250     12    1    1    1 :  252  126
revoke_record        448    448     32    4    4    1 :  252  126
clip_arp_cache         0      0    128    0    0    1 :  252  126
ip_mrt_cache           0      0    128    0    0    1 :  252  126
tcp_tw_bucket        384    510    128   13   17    1 :  252  126
tcp_bind_bucket      442   1008     32    9    9    1 :  252  126
tcp_open_request     570    570    128   19   19    1 :  252  126
inet_peer_cache      232    232     64    4    4    1 :  252  126
ip_dst_cache         807   1185    256   79   79    1 :  252  126
arp_cache            354    480    128   16   16    1 :  252  126
blkdev_requests      768    810    128   27   27    1 :  252  126
dnotify_cache        500    664     20    4    4    1 :  252  126
file_lock_cache     1157   2120     96   53   53    1 :  252  126
fasync_cache         565    600     16    3    3    1 :  252  126
uid_cache            419    448     32    4    4    1 :  252  126
skbuff_head_cache    780   1410    256   65   94    1 :  252  126
sock                 426   1671   1280  288  557    1 :   60   30
sigqueue             725    725    132   25   25    1 :  252  126
kiobuf                 0      0     64    0    0    1 :  252  126
cdev_cache           703    870     64   15   15    1 :  252  126
bdev_cache             9    116     64    2    2    1 :  252  126
mnt_cache             18    116     64    2    2    1 :  252  126
inode_cache        50995  50995    512 7285 7285    1 :  124   62
dentry_cache       71760  71760    128 2392 2392    1 :  252  126
dquot                  0      0    128    0    0    1 :  252  126
filp               52314  52380    128 1746 1746    1 :  252  126
names_cache           28     28   4096   28   28    1 :   60   30
buffer_head       1342242 1486740    128 49558 49558    1 :  252  126
mm_struct            701   2355    256  155  157    1 :  252  126
vm_area_struct     11887  58530    128 1793 1951    1 :  252  126
fs_cache             831   2378     64   41   41    1 :  252  126
files_cache          597   2184    512  246  312    1 :  124   62
signal_act           501   2112   1408  168  192    4 :   60   30
pae_pgd              699   2378     64   41   41    1 :  252  126
size-131072(DMA)       0      0 131072    0    0   32 :    0    0
size-131072            0      0 131072    0    0   32 :    0    0
size-65536(DMA)        0      0  65536    0    0   16 :    0    0
size-65536             0      0  65536    0    0   16 :    0    0
size-32768(DMA)        0      0  32768    0    0    8 :    0    0
size-32768             1      5  32768    1    5    8 :    0    0
size-16384(DMA)        0      0  16384    0    0    4 :    0    0
size-16384             5     12  16384    5   12    4 :    0    0
size-8192(DMA)         0      0   8192    0    0    2 :    0    0
size-8192              5      7   8192    5    7    2 :    0    0
size-4096(DMA)         0      0   4096    0    0    1 :   60   30
size-4096            437   1127   4096  437 1127    1 :   60   30
size-2048(DMA)         0      0   2048    0    0    1 :   60   30
size-2048            314    434   2048  170  217    1 :   60   30
size-1024(DMA)         0      0   1024    0    0    1 :  124   62
size-1024            567   1464   1024  240  366    1 :  124   62
size-512(DMA)          0      0    512    0    0    1 :  124   62
size-512             906    968    512  120  121    1 :  124   62
size-256(DMA)          0      0    256    0    0    1 :  252  126
size-256            8724   8850    256  583  590    1 :  252  126
size-128(DMA)          2     60    128    2    2    1 :  252  126
size-128            3198   3450    128  114  115    1 :  252  126
size-64(DMA)           0      0    128    0    0    1 :  252  126
size-64             3486   4050    128  135  135    1 :  252  126
size-32(DMA)          34    116     64    2    2    1 :  252  126
size-32            22446  22446     64  387  387    1 :  252  126

Here is a /proc/slabinfo when it is having problems:

slabinfo - version: 1.1 (SMP)
kmem_cache            64     64    244    4    4    1 :  252  126
ip_fib_hash           14    224     32    2    2    1 :  252  126
ip_conntrack           0      0    384    0    0    1 :  124   62
urb_priv               0      0     64    0    0    1 :  252  126
journal_head        1660   3773     48   49   49    1 :  252  126
revoke_table           7    250     12    1    1    1 :  252  126
revoke_record          0      0     32    0    0    1 :  252  126
clip_arp_cache         0      0    128    0    0    1 :  252  126
ip_mrt_cache           0      0    128    0    0    1 :  252  126
tcp_tw_bucket        148    150    128    5    5    1 :  252  126
tcp_bind_bucket      696    896     32    8    8    1 :  252  126
tcp_open_request     120    120    128    4    4    1 :  252  126
inet_peer_cache      107    232     64    4    4    1 :  252  126
ip_dst_cache         960    960    256   64   64    1 :  252  126
arp_cache            232    360    128   12   12    1 :  252  126
blkdev_requests      768    810    128   27   27    1 :  252  126
dnotify_cache        238    332     20    2    2    1 :  252  126
file_lock_cache     1776   2040     96   51   51    1 :  252  126
fasync_cache         273    400     16    2    2    1 :  252  126
uid_cache            501    560     32    5    5    1 :  252  126
skbuff_head_cache    685   1020    256   68   68    1 :  252  126
sock                1095   1095   1280  365  365    1 :   60   30
sigqueue             203    203    132    7    7    1 :  252  126
kiobuf                 0      0     64    0    0    1 :  252  126
cdev_cache           725    754     64   13   13    1 :  252  126
bdev_cache             9    116     64    2    2    1 :  252  126
mnt_cache             18    116     64    2    2    1 :  252  126
inode_cache        13808  20755    512 2965 2965    1 :  124   62
dentry_cache        5976  14070    128  469  469    1 :  252  126
dquot                  0      0    128    0    0    1 :  252  126
filp               52314  52380    128 1746 1746    1 :  252  126
names_cache            8      8   4096    8    8    1 :   60   30
buffer_head       1335952 1470150    128 49005 49005    1 :  252  126
mm_struct           1620   1620    256  108  108    1 :  252  126
vm_area_struct     39180  39180    128 1306 1306    1 :  252  126
fs_cache            1815   1972     64   34   34    1 :  252  126
files_cache         1477   1477    512  211  211    1 :  124   62
signal_act          1430   1430   1408  130  130    4 :   60   30
pae_pgd             1798   1798     64   31   31    1 :  252  126
size-131072(DMA)       0      0 131072    0    0   32 :    0    0
size-131072            0      0 131072    0    0   32 :    0    0
size-65536(DMA)        0      0  65536    0    0   16 :    0    0
size-65536             0      0  65536    0    0   16 :    0    0
size-32768(DMA)        0      0  32768    0    0    8 :    0    0
size-32768             1      1  32768    1    1    8 :    0    0
size-16384(DMA)        0      0  16384    0    0    4 :    0    0
size-16384             5      5  16384    5    5    4 :    0    0
size-8192(DMA)         0      0   8192    0    0    2 :    0    0
size-8192              5      5   8192    5    5    2 :    0    0
size-4096(DMA)         0      0   4096    0    0    1 :   60   30
size-4096            981   1011   4096  981 1011    1 :   60   30
size-2048(DMA)         0      0   2048    0    0    1 :   60   30
size-2048            312    342   2048  167  171    1 :   60   30
size-1024(DMA)         0      0   1024    0    0    1 :  124   62
size-1024           1080   1080   1024  270  270    1 :  124   62
size-512(DMA)          0      0    512    0    0    1 :  124   62
size-512             832    832    512  104  104    1 :  124   62
size-256(DMA)          0      0    256    0    0    1 :  252  126
size-256            8550   8550    256  570  570    1 :  252  126
size-128(DMA)          2     60    128    2    2    1 :  252  126
size-128            2850   2850    128   95   95    1 :  252  126
size-64(DMA)           0      0    128    0    0    1 :  252  126
size-64             2591   4200    128  140  140    1 :  252  126
size-32(DMA)          34    116     64    2    2    1 :  252  126
size-32             2536   7134     64  123  123    1 :  252  126

> 
> Then please apply 
> http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1.bz2
> and send a report on the outcome.





* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-09 17:17   ` Chris Wood
@ 2003-01-09 20:18     ` Andrew Morton
  2003-01-10  0:25       ` William Lee Irwin III
  2003-01-10 20:42       ` Chris Wood
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2003-01-09 20:18 UTC (permalink / raw)
  To: Chris Wood, William Lee Irwin III; +Cc: linux-kernel

Chris Wood wrote:
> 
> ..
> The server ran fine for 3 days, so it took a bit to get this info.

Is appreciated, thanks.
 
> Is there a list of which patches I can apply if I don't want to apply
> the entire 2.4.20aa1?  I'm nervous about breaking other things, but may
> give it a try anyway.

http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1/05_vm_16_active_free_zone_bhs-1
http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1/10_inode-highmem-2

The former is the most important and, alas, has dependencies on
earlier patches.

hm, OK.  I've pulled all Andrea's VM changes and the inode-highmem fix
into a standalone diff.  I'll beat on that a bit tonight before unleashing
it.

> Thanks for the help!
> 
> Here is a /proc/meminfo when it is running fine:

These numbers are a little odd.  You seem to have only lost 200M of
lowmem to buffer_heads.  Bill, what's your take on this?

Maybe we're looking at the wrong thing.  Are any of your applications
using mlock(), mlockall(), etc?


* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-09 20:18     ` Andrew Morton
@ 2003-01-10  0:25       ` William Lee Irwin III
  2003-01-10  0:44         ` Brian Tinsley
  2003-01-10 20:42       ` Chris Wood
  1 sibling, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-01-10  0:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Chris Wood, linux-kernel

On Thu, Jan 09, 2003 at 12:18:27PM -0800, Andrew Morton wrote:
> These numbers are a little odd.  You seem to have only lost 200M of
> lowmem to buffer_heads.  Bill, what's your take on this?

He's really low on lowmem. It's <= 16GB or so, so mem_map is near-
irrelevant, say 60MB.

My interpretation of the numbers is as follows, where pae_pmd and
kernel_stack are both guessed from pae_pgd:

       buffer_head:   166994KB   183768KB   90.87
           pae_pmd:    21576KB    21576KB  100.0 
      kernel_stack:    14384KB    14384KB  100.0 
       inode_cache:     6904KB    10377KB   66.52
              filp:     6539KB     6547KB   99.87
    vm_area_struct:     4897KB     4897KB  100.0 
         size-4096:     3924KB     4044KB   97.3 
          size-256:     2137KB     2137KB  100.0 
        signal_act:     1966KB     1966KB  100.0 
      dentry_cache:      747KB     1758KB   42.47
              sock:     1368KB     1368KB  100.0 
         size-1024:     1080KB     1080KB  100.0 
       files_cache:      738KB      738KB  100.0 
         size-2048:      624KB      684KB   91.22
           size-64:      323KB      525KB   61.69
           size-32:      158KB      445KB   35.54
          size-512:      416KB      416KB  100.0 
         mm_struct:      405KB      405KB  100.0 
          size-128:      356KB      356KB  100.0 
 skbuff_head_cache:      171KB      255KB   67.15
      ip_dst_cache:      240KB      240KB  100.0 
   file_lock_cache:      166KB      191KB   87.5 
      journal_head:       77KB      176KB   43.99
          fs_cache:      113KB      123KB   92.3 
           pae_pgd:      112KB      112KB  100.0 
   blkdev_requests:       96KB      101KB   94.81
        size-16384:       80KB       80KB  100.0 
        cdev_cache:       45KB       47KB   96.15
         arp_cache:       29KB       45KB   64.44
         size-8192:       40KB       40KB  100.0 
       names_cache:       32KB       32KB  100.0 
        size-32768:       32KB       32KB  100.0 
   tcp_bind_bucket:       21KB       28KB   77.67
          sigqueue:       26KB       26KB  100.0 
     tcp_tw_bucket:       18KB       18KB  100.0 
         uid_cache:       15KB       17KB   89.46
  tcp_open_request:       15KB       15KB  100.0 
        kmem_cache:       15KB       15KB  100.0 
   inet_peer_cache:        6KB       14KB   46.12
     size-128(DMA):        0KB        7KB    3.33
      size-32(DMA):        2KB        7KB   29.31
       ip_fib_hash:        0KB        7KB    6.25
        bdev_cache:        0KB        7KB    7.75
         mnt_cache:        1KB        7KB   15.51
     dnotify_cache:        4KB        6KB   71.68
      fasync_cache:        4KB        6KB   68.25
      revoke_table:        0KB        2KB    2.80

== grand total of 253.015MB, fragmentation included.
	+ 60MB mem_map
== grand total of 313MB or so
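
That slab portion can be re-derived straight from /proc/slabinfo with
the same arithmetic bloatmon uses; for example:

    awk '!/^slabinfo/ { kb += $3 * $4 / 1024 }
         END { printf "%.1f MB allocated to slabs\n", kb/1024 }' /proc/slabinfo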


Either pollwait tables (invisible in 2.4 and 2.5), kernel stacks of
threads (which don't get pae_pgd's and are hence invisible in 2.4
and 2.5), or pagecache, with a much higher likelihood of pagecache.

Or there might be dark matter in the universe, and he's being bitten by 
unaccounted !__GFP_HIGHMEM allocations, e.g. stock 2.4.x pagetables,
which aren't predictable from pae_pgd etc.; highpte of any flavor (aa or
otherwise) should fix that. But there's no way to guess, as there's zero
2.4.x PTE accounting, nor even any hints in this report, like average
RSS and VSZ (which are still underestimates, as 2.4.x pagetables are
leaked over the lifetime of the process vs. 2.5.x's reap-on-munmap()).


Bill


* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-10  0:25       ` William Lee Irwin III
@ 2003-01-10  0:44         ` Brian Tinsley
  2003-01-10  0:55           ` William Lee Irwin III
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Tinsley @ 2003-01-10  0:44 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andrew Morton, Chris Wood, linux-kernel

>Either pollwait tables (invisible in 2.4 and 2.5), kernel stacks of
>threads (which don't get pae_pgd's and are hence invisible in 2.4
>and 2.5), or pagecache, with a much higher likelihood of pagecache.
>
The "kernel stacks of threads" may have some bearing on my incarnation 
of this problem. We have several heavily threaded Java applications 
running at the time the live-locks occur. At our most problematic site, 
one application has a bug that can cause hundreds of timer threads (I 
mean like 800 or so!) to be "accidentally" created. This site is 
scheduled for an upgrade either tonight or tomorrow, so I will leave the 
system as it is and see if I can still cause the live-lock to manifest 
itself after the upgrade.

-- 

-[========================]-
-[      Brian Tinsley     ]-
-[ Chief Systems Engineer ]-
-[        Emageon         ]-
-[========================]-





* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-10  0:44         ` Brian Tinsley
@ 2003-01-10  0:55           ` William Lee Irwin III
  0 siblings, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2003-01-10  0:55 UTC (permalink / raw)
  To: Brian Tinsley; +Cc: Andrew Morton, Chris Wood, linux-kernel

At some point in the past, I wrote:
>> Either pollwait tables (invisible in 2.4 and 2.5), kernel stacks of
>> threads (which don't get pae_pgd's and are hence invisible in 2.4
>> and 2.5), or pagecache, with a much higher likelihood of pagecache.

On Thu, Jan 09, 2003 at 06:44:10PM -0600, Brian Tinsley wrote:
> The "kernel stacks of threads" may have some bearing on my incarnation 
> of this problem. We have several heavily threaded Java applications 
> running at the time the live-locks occur. At our most problematic site, 
> one application has a bug that can cause hundreds of timer threads (I 
> mean like 800 or so!) to be "accidentally" created. This site is 
> scheduled for an upgrade either tonight or tomorrow, so I will leave the 
> system as it is and see if I can still cause the live-lock to manifest 
> itself after the upgrade.

There is no extant implementation of paged stacks yet. I'm working on
a different problem (mem_map on 64GB on 2.5.x). I probably won't have
time to implement it in the near future, I probably won't be doing it
vs. 2.4.x, and I won't have to if someone else does it first.


Bill


* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
@ 2003-01-10  3:17 Brian Tinsley
  2003-01-10  3:29 ` William Lee Irwin III
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Tinsley @ 2003-01-10  3:17 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel

 >>>At some point in the past, I wrote:
 >>> Either pollwait tables (invisible in 2.4 and 2.5), kernel stacks of
 >>> threads (which don't get pae_pgd's and are hence invisible in 2.4
 >>> and 2.5), or pagecache, with a much higher likelihood of pagecache.

 >>On Thu, Jan 09, 2003 at 06:44:10PM -0600, Brian Tinsley wrote:
 >> The "kernel stacks of threads" may have some bearing on my incarnation
 >> of this problem. We have several heavily threaded Java applications
 >> running at the time the live-locks occur. At our most problematic site,
 >> one application has a bug that can cause hundreds of timer threads (I
 >> mean like 800 or so!) to be "accidentally" created. This site is
 >> scheduled for an upgrade either tonight or tomorrow, so I will leave the
 >> system as it is and see if I can still cause the live-lock to manifest
 >> itself after the upgrade.

 >There is no extant implementation of paged stacks yet.

For the most part, this is probably a boundary condition, right? Anyone 
that intentionally has 800+ threads in a single application probably 
needs to reevaluate their design :)

 >I'm working on a different problem (mem_map on 64GB on 2.5.x). I probably
 >won't have time to implement it in the near future, I probably won't
 >be doing it vs. 2.4.x, and I won't have to if someone else does it first.

Is that a hint to someone in particular?



-- 

-[========================]-
-[      Brian Tinsley     ]-
-[ Chief Systems Engineer ]-
-[        Emageon         ]-
-[========================]-




* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-10  3:17 2.4.20, .text.lock.swap cpu usage? (ibm x440) Brian Tinsley
@ 2003-01-10  3:29 ` William Lee Irwin III
  2003-01-10  3:42   ` Brian Tinsley
  0 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-01-10  3:29 UTC (permalink / raw)
  To: Brian Tinsley; +Cc: linux-kernel

At some point in the past, I wrote:
>> There is no extant implementation of paged stacks yet.

On Thu, Jan 09, 2003 at 09:17:56PM -0600, Brian Tinsley wrote:
> For the most part, this is probably a boundary condition, right? Anyone 
> that intentionally has 800+ threads in a single application probably 
> needs to reevaluate their design :)

IMHO multiprogramming is as valid a use for memory as any other. Or
even otherwise, it's not something I care to get into design debates
about; it's just how the things are used.

The only trouble is that support for what you're doing is unimplemented.


At some point in the past, I wrote:
>> I'm working on a different problem (mem_map on 64GB on 2.5.x). I
>> probably won't have time to implement it in the near future, I
>> probably won't be doing it vs. 2.4.x, and I won't have to if someone
>> else does it first.

On Thu, Jan 09, 2003 at 09:17:56PM -0600, Brian Tinsley wrote:
> Is that a hint to someone in particular?

Only you, if anyone. My intentions and patchwriting efforts on the 64GB
and highmem multiprogramming fronts are long since public, and publicly
stated to be targeted at 2.7. Since there isn't a 2.7 yet, 2.5-CURRENT
must suffice until there is.


Bill


* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-10  3:29 ` William Lee Irwin III
@ 2003-01-10  3:42   ` Brian Tinsley
  2003-01-10  3:54     ` William Lee Irwin III
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Tinsley @ 2003-01-10  3:42 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel

William Lee Irwin III wrote:

>At some point in the past, I wrote:
>>>There is no extant implementation of paged stacks yet.
>
>On Thu, Jan 09, 2003 at 09:17:56PM -0600, Brian Tinsley wrote:
>>For the most part, this is probably a boundary condition, right? Anyone 
>>that intentionally has 800+ threads in a single application probably 
>>needs to reevaluate their design :)
>
>IMHO multiprogramming is as valid a use for memory as any other. Or
>even otherwise, it's not something I care to get into design debates
>about; it's just how the things are used.
>
I agree with the philosophy in general, but if I sit down to write a 
threaded application for Linux on IA-32 and wind up with a design that 
uses 800+ threads in any instance (other than a bug, which was our 
case), it's time to give up the day job and start riding on the back of 
the garbage truck ;)

>The only trouble is that support for what you're doing is unimplemented.
>
You mean the 800+ threads or Java on Linux?

>At some point in the past, I wrote:
>>>I'm working on a different problem (mem_map on 64GB on 2.5.x). I
>>>probably won't have time to implement it in the near future, I
>>>probably won't be doing it vs. 2.4.x, and I won't have to if someone
>>>else does it first.
>
>On Thu, Jan 09, 2003 at 09:17:56PM -0600, Brian Tinsley wrote:
>>Is that a hint to someone in particular?
>
>Only you, if anyone. My intentions and patchwriting efforts on the 64GB
>and highmem multiprogramming fronts are long since public, and publicly
>stated to be targeted at 2.7. Since there isn't a 2.7 yet, 2.5-CURRENT
>must suffice until there is.
>
In all honesty, I would enjoy nothing more than contributing to kernel 
development. Unfortunately it's a bit out of my scope right now (but not 
forever). If I only believed aliens seeded our gene pool with clones, I 
could hook up with those folks that claim to have cloned a human and get 
one of me made! ;)




* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-10  3:42   ` Brian Tinsley
@ 2003-01-10  3:54     ` William Lee Irwin III
  2003-01-10  4:08       ` Brian Tinsley
  0 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-01-10  3:54 UTC (permalink / raw)
  To: Brian Tinsley; +Cc: linux-kernel

William Lee Irwin III wrote:
>> IMHO multiprogramming is as valid a use for memory as any other. Or
>> even otherwise, it's not something I care to get in design debates
>> about, it's just how the things are used.

On Thu, Jan 09, 2003 at 09:42:06PM -0600, Brian Tinsley wrote:
> I agree with the philosophy in general, but if I sit down to write a 
> threaded application for Linux on IA-32 and wind up with a design that 
> uses 800+ threads in any instance (other than a bug, which was our 
> case), it's time to give up the day job and start riding on the back of 
> the garbage truck ;)

I could care less what userspace does: mechanism, not policy. Userspace
wants, and I give if I can, just as the kernel does with system calls.

800 threads isn't even a high thread count anyway; the 2.5.x testing
was with a peak thread count of 100,000. 800 threads, even with an 8KB
stack, is no more than 6.4MB of lowmem for stacks and so shouldn't
stress the system unless many instances of it are run. I suspect your
issue is elsewhere. I'll submit accounting patches for Marcelo's and/or
Andrea's trees so you can find out what's actually going on.


William Lee Irwin III wrote:
>> Only you, if anyone. My intentions and patchwriting efforts on the 64GB
>> and highmem multiprogramming fronts are long since public, and publicly
>> stated to be targeted at 2.7. Since there isn't a 2.7 yet, 2.5-CURRENT
>> must suffice until there is.

On Thu, Jan 09, 2003 at 09:42:06PM -0600, Brian Tinsley wrote:
> In all honesty, I would enjoy nothing more than contributing to kernel 
> development. Unfortunately it's a bit out of my scope right now (but not 
> forever). If I only believed aliens seeded our gene pool with clones, I 
> could hook up with those folks that claim to have cloned a human and get 
> one of me made! ;)

I don't know what to tell you here. I'm lucky that this is my day job
and that I can contribute so much. However, there are plenty who
contribute major changes (many even more important than my own) without
any such sponsorship. Perhaps emulating them would satisfy your wish.


Bill


* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-10  3:54     ` William Lee Irwin III
@ 2003-01-10  4:08       ` Brian Tinsley
  2003-01-10  4:19         ` William Lee Irwin III
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Tinsley @ 2003-01-10  4:08 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel

William Lee Irwin III wrote:

>William Lee Irwin III wrote:
>>>IMHO multiprogramming is as valid a use for memory as any other. Or
>>>even otherwise, it's not something I care to get into design debates
>>>about; it's just how the things are used.
>
>On Thu, Jan 09, 2003 at 09:42:06PM -0600, Brian Tinsley wrote:
>>I agree with the philosophy in general, but if I sit down to write a 
>>threaded application for Linux on IA-32 and wind up with a design that 
>>uses 800+ threads in any instance (other than a bug, which was our 
>>case), it's time to give up the day job and start riding on the back of 
>>the garbage truck ;)
>
>I could care less what userspace does: mechanism, not policy. Userspace
>wants, and I give if I can, just as the kernel does with system calls.
>
>800 threads isn't even a high thread count anyway; the 2.5.x testing
>was with a peak thread count of 100,000. 800 threads, even with an 8KB
>stack, is no more than 6.4MB of lowmem for stacks and so shouldn't
>stress the system unless many instances of it are run.
>
I understand your perspective here. I won't get into application design 
issues as it is far out of context from this list.

>I suspect your issue is elsewhere. I'll submit accounting patches for Marcelo's and/or Andrea's trees so you can find out what's actually going on.
>
Much appreciated! I look forward to it.


>On Thu, Jan 09, 2003 at 09:42:06PM -0600, Brian Tinsley wrote:
>>In all honesty, I would enjoy nothing more than contributing to kernel 
>>development. Unfortunately it's a bit out of my scope right now (but not 
>>forever). If I only believed aliens seeded our gene pool with clones, I 
>>could hook up with those folks that claim to have cloned a human and get 
>>one of me made! ;)
>
>I don't know what to tell you here. I'm lucky that this is my day job
>and that I can contribute so much. However, there are plenty who
>contribute major changes (many even more important than my own) without
>any such sponsorship. Perhaps emulating them would satisfy your wish.
>
It would!

I cannot say thanks enough for the efforts of you and everyone else out 
there. Frankly, I would not have my day job and would not have been able 
to make Emageon what it is today were it not for you all!

Oh, please excuse the stupid humor tonight. I'm in a giddy mood for some 
reason. Must be the excitement from the prospect of getting resolution 
to this problem!




* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-10  4:08       ` Brian Tinsley
@ 2003-01-10  4:19         ` William Lee Irwin III
  2003-01-10  4:50           ` Brian Tinsley
  0 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-01-10  4:19 UTC (permalink / raw)
  To: Brian Tinsley; +Cc: linux-kernel

William Lee Irwin III wrote:
>> I don't know what to tell you here. I'm lucky that this is my day job
>> and that I can contribute so much. However, there are plenty who
>> contribute major changes (many even more important than my own) without
>> any such sponsorship. Perhaps emulating them would satisfy your wish.

On Thu, Jan 09, 2003 at 10:08:55PM -0600, Brian Tinsley wrote:
> It would!
> I cannot say thanks enough for the efforts of you and everyone else out 
> there. Frankly, I would not have my day job and would not have been able 
> to make Emageon what it is today were it not for you all!
> Oh, please excuse the stupid humor tonight. I'm in a giddy mood for some 
> reason. Must be the excitement from the prospect of getting resolution 
> to this problem!

We're straying from the subject here. Please describe your machine,
in terms of how many cpus it has and how much highmem it has, and
your workload, so I can better determine the issue. Perhaps we can
cooperatively devise something that works well for you.

Or perhaps the kernel version is not up-to-date. Please also provide
the precise kernel version (and included patches). And workload too.


Thanks,
Bill


* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-10  4:19         ` William Lee Irwin III
@ 2003-01-10  4:50           ` Brian Tinsley
  2003-01-10  5:17             ` Martin J. Bligh
  2003-01-10  5:24             ` William Lee Irwin III
  0 siblings, 2 replies; 21+ messages in thread
From: Brian Tinsley @ 2003-01-10  4:50 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel

>We're straying from the subject here.
>
Sorry

>Please describe your machine,
>in terms of how many cpus it has and how much highmem it has, and
>your workload, so I can better determine the issue. Perhaps we can
>cooperatively devise something that works well for you.
>
IBM x360
Pentium 4 Xeon MP processors

2 processor system has 4GB RAM
4 processor system has 8GB RAM

1 IBM ServeRAID controller
2 Intel PRO/1000MT NICs
2 QLogic 2340 Fibre Channel HBAs

>Or perhaps the kernel version is not up-to-date. Please also provide
>the precise kernel version (and included patches). And workload too.
>
The kernel version is stock 2.4.20 with Chris Mason's data logging and 
journal relocation patches for ReiserFS (neither of which are actually 
in use for any mounted filesystems). It is compiled for 64GB highmem 
support. And just to refresh, I have seen this exact behavior on stock 
2.4.19 and stock 2.4.17 (no patches on either of these) also compiled 
with 64GB highmem support.

Workload:
When the live-lock occurs, the system is performing intensive network 
I/O and intensive disk reads from the fibre channel storage (i.e., the 
backup program is reading files from disk and transferring them to the 
backup server). I posted a snapshot of sar data collection earlier today 
showing selected stats leading up to and just after the live-lock occurs 
(which is noted by a ~2 minute gap in sar logging). After the live-lock 
is released, the only thing that stands out is an unusual increase in 
runtime for kswapd (as reported by ps).

The various Java programs mentioned in prior postings are *mostly* idle 
at this point in time as it is after hours for our clients.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-10  4:50           ` Brian Tinsley
@ 2003-01-10  5:17             ` Martin J. Bligh
  2003-01-10  5:24             ` William Lee Irwin III
  1 sibling, 0 replies; 21+ messages in thread
From: Martin J. Bligh @ 2003-01-10  5:17 UTC (permalink / raw)
  To: Brian Tinsley, William Lee Irwin III; +Cc: linux-kernel

> IBM x360
> Pentium 4 Xeon MP processors
> 
> 2 processor system has 4GB RAM
> 4 processor system has 8GB RAM
> 
> 1 IBM ServeRAID controller
> 2 Intel PRO/1000MT NICs
> 2 QLogic 2340 Fibre Channel HBAs
> 
>> Or perhaps the kernel version is simply not up to date. Please also
>> provide the precise kernel version and any included patches.
>> 
> The kernel version is stock 2.4.20 with Chris Mason's data logging and journal relocation patches for ReiserFS (neither of which are actually in use for any mounted filesystems). It is compiled for 64GB highmem support. And just to refresh, I have seen this exact behavior on stock 2.4.19 and stock 2.4.17 (no patches on either of these) also compiled with 64GB highmem support.
> 
> Workload:
> When the live-lock occurs, the system is performing intensive network I/O and intensive disk reads from the fibre channel storage (i.e., the backup program is reading files from disk and transferring them to the backup server). I posted a snapshot of sar data collection earlier today showing selected stats leading up to and just after the live-lock occurs (which is noted by a ~2 minute gap in sar logging). After the live-lock is released, the only thing that stands out is an unusual increase in runtime for kswapd (as reported by ps).
> 
> The various Java programs mentioned in prior postings are *mostly* idle at this point in time as it is after hours for our clients.


If you don't have any individual processes that need to be particularly
large (e.g. > 1GB of data), I suggest you just cheat^Wfinesse the problem
and move PAGE_OFFSET from C0000000 to 80000000 - that will give you more
than twice as much lowmem to play with.  I think this might even be a
config option in RedHat kernels.
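
For concreteness, the split Martin is describing lives in
include/asm-i386/page.h in a stock 2.4 tree; patched trees, including
ones that derive it from PAGE_OFFSET_RAW, may define it differently.
A rough, untested sketch of the change:

--- include/asm-i386/page.h (stock 2.4: kernel mapped at 3GB)
-#define __PAGE_OFFSET        (0xC0000000)
+#define __PAGE_OFFSET        (0x80000000)

Each process then tops out at roughly 2GB of user address space instead
of roughly 3GB, in exchange for about double the lowmem.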

Martin.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-10  4:50           ` Brian Tinsley
  2003-01-10  5:17             ` Martin J. Bligh
@ 2003-01-10  5:24             ` William Lee Irwin III
  2003-01-10  5:45               ` Brian Tinsley
  1 sibling, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-01-10  5:24 UTC (permalink / raw)
  To: Brian Tinsley; +Cc: linux-kernel

At some point in the past, I wrote:
>> Or perhaps the kernel version is simply not up to date. Please also
>> provide the precise kernel version and any included patches.

On Thu, Jan 09, 2003 at 10:50:03PM -0600, Brian Tinsley wrote:
> The kernel version is stock 2.4.20 with Chris Mason's data logging and 
> journal relocation patches for ReiserFS (neither of which are actually 
> in use for any mounted filesystems). It is compiled for 64GB highmem 
> support. And just to refresh, I have seen this exact behavior on stock 
> 2.4.19 and stock 2.4.17 (no patches on either of these) also compiled 
> with 64GB highmem support.

Okay, can you try with either 2.4.x-aa or 2.5.x-CURRENT?

I suspect either bh (buffer_head) problems or lowpte (page tables
consuming lowmem) problems.

Also, could you monitor your load with the scripts I posted?
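
The scripts themselves are not reproduced in this thread; as a
stand-in, a minimal C sampler of the number that matters on a highmem
box (LowFree in the 2.4 /proc/meminfo) could look like the sketch
below.  The file name and the one-second interval are illustrative
assumptions, not Bill's actual tooling:

/* lowfree-watch.c: print LowFree from /proc/meminfo once a second. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        char line[256];

        for (;;) {
                FILE *f = fopen("/proc/meminfo", "r");
                if (!f)
                        return 1;
                while (fgets(line, sizeof line, f))
                        if (strncmp(line, "LowFree:", 8) == 0)
                                fputs(line, stdout); /* e.g. "LowFree: 123456 kB" */
                fclose(f);
                fflush(stdout);
                sleep(1);
        }
}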


Thanks,
Bill

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-10  5:24             ` William Lee Irwin III
@ 2003-01-10  5:45               ` Brian Tinsley
  0 siblings, 0 replies; 21+ messages in thread
From: Brian Tinsley @ 2003-01-10  5:45 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel

>
>
>Okay, can you try with either 2.4.x-aa or 2.5.x-CURRENT?
>
Yes, I *just* booted a machine with 2.4.20-aa1 in our lab. I was having 
problems compiling the Linux Virtual Server code, but it's fixed now. 

>I suspect either bh (buffer_head) problems or lowpte (page tables
>consuming lowmem) problems.
>
>Also, could you monitor your load with the scripts I posted?
>  
>
Yes, they are already uploaded to a customer site and ready to go. I 
need to flex the -aa1 kernel a bit before I load it there as well.


Thanks!



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)
  2003-01-09 20:18     ` Andrew Morton
  2003-01-10  0:25       ` William Lee Irwin III
@ 2003-01-10 20:42       ` Chris Wood
  1 sibling, 0 replies; 21+ messages in thread
From: Chris Wood @ 2003-01-10 20:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: William Lee Irwin III, linux-kernel

Andrew Morton wrote:
> Chris Wood wrote:
> 
>>..
>>The server ran fine for 3 days, so it took a bit to get this info.
> 
> 
> Is appreciated, thanks.
>  
> 
>>Is there a list of which patches I can apply if I don't want to apply
>>the entire 2.4.20aa1?  I'm nervous about breaking other things, but may
>>give it a try anyway.
> 
> 
> http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1/05_vm_16_active_free_zone_bhs-1
> http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1/10_inode-highmem-2
> 
> The former is the most important and, alas, has dependencies on
> earlier patches.
> 
> hm, OK.  I've pulled all Andrea's VM changes and the inode-highmem fix
> into a standalone diff.  I'll beat on that a bit tonight before unleashing
> it.
> 

I tried to apply 2.4.20aa1 to my /usr/src/linux and then compile it, but 
the build failed.  I do have the IBM x440 (NUMA) patches applied to this 
tree; I don't know whether that caused any problems, but I didn't see any 
conflicts when I applied the patch.  I'll attach a snip of the build 
errors at the end of this email in case it points to something (there 
was more than this).

> 
>>Thanks for the help!
>>
>>Here is a /proc/meminfo when it is running fine:
> 
> 
> These numbers are a little odd.  You seem to have only lost 200M of
> lowmem to buffer_heads.  Bill, what's your take on this?
> 
> Maybe we're looking at the wrong thing.  Are any of your applications
> using mlock(), mlockall(), etc?

I'm not sure; other than our services, our main programs are in Cobol 
(iCobol and AcuCobol).  I could ask the vendors if that would help.
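
For reference, the calls Andrew is asking about pin a process's pages
so the VM can never reclaim them, which shrinks the pool the reclaim
scans have to work with.  A minimal illustration of the pattern (not
code from this system) is:

#include <sys/mman.h>

int main(void)
{
        /* Pin all current and future mappings in RAM; typically
         * needs root.  Locked pages are exempt from reclaim. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
                return 1;

        /* ... long-running work on memory that must not swap ... */
        return 0;
}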


------- sorry if this is an ugly paste -------

/usr/src/linux-2.4.20/include/asm/pgalloc.h: In function `get_pgd_slow':
/usr/src/linux-2.4.20/include/asm/pgalloc.h:49: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/asm/pgalloc.h:54: warning: implicit declaration of function `set_64bit'
/usr/src/linux-2.4.20/include/asm/pgalloc.h: In function `free_pgd_slow':
/usr/src/linux-2.4.20/include/asm/pgalloc.h:110: `PAGE_OFFSET_RAW' undeclared (first use in this function)
In file included from /usr/src/linux-2.4.20/include/linux/blkdev.h:11,
                  from /usr/src/linux-2.4.20/include/linux/blk.h:4,
                  from init/main.c:25:
/usr/src/linux-2.4.20/include/asm/io.h: In function `virt_to_phys':
/usr/src/linux-2.4.20/include/asm/io.h:78: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/asm/io.h:79: warning: control reaches end of non-void function
/usr/src/linux-2.4.20/include/asm/io.h: In function `phys_to_virt':
/usr/src/linux-2.4.20/include/asm/io.h:96: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/asm/io.h:97: warning: control reaches end of non-void function
/usr/src/linux-2.4.20/include/asm/io.h: In function `isa_check_signature':
/usr/src/linux-2.4.20/include/asm/io.h:280: `PAGE_OFFSET_RAW' undeclared (first use in this function)
init/main.c: In function `start_kernel':
init/main.c:381: `PAGE_OFFSET_RAW' undeclared (first use in this function)
make: *** [init/main.o] Error 1
In file included from eni.c:9:
/usr/src/linux-2.4.20/include/linux/mm.h: In function `pmd_alloc':
/usr/src/linux-2.4.20/include/linux/mm.h:521: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/linux/mm.h:521: (Each undeclared identifier is reported only once
/usr/src/linux-2.4.20/include/linux/mm.h:521: for each function it appears in.)
/usr/src/linux-2.4.20/include/linux/mm.h:522: warning: control reaches end of non-void function
In file included from /usr/src/linux-2.4.20/include/linux/highmem.h:5,
                  from /usr/src/linux-2.4.20/include/linux/vmalloc.h:8,
                  from /usr/src/linux-2.4.20/include/asm/io.h:47,
                  from /usr/src/linux-2.4.20/include/asm/pci.h:35,
                  from /usr/src/linux-2.4.20/include/linux/pci.h:622,
                  from eni.c:10:
/usr/src/linux-2.4.20/include/asm/pgalloc.h: In function `get_pgd_slow':
/usr/src/linux-2.4.20/include/asm/pgalloc.h:49: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/asm/pgalloc.h:54: warning: implicit declaration of function `set_64bit'
/usr/src/linux-2.4.20/include/asm/pgalloc.h: In function `free_pgd_slow':
/usr/src/linux-2.4.20/include/asm/pgalloc.h:110: `PAGE_OFFSET_RAW' undeclared (first use in this function)
In file included from /usr/src/linux-2.4.20/include/asm/pci.h:35,
                  from /usr/src/linux-2.4.20/include/linux/pci.h:622,
                  from eni.c:10:
/usr/src/linux-2.4.20/include/asm/io.h: In function `virt_to_phys':
/usr/src/linux-2.4.20/include/asm/io.h:78: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/asm/io.h:79: warning: control reaches end of non-void function
/usr/src/linux-2.4.20/include/asm/io.h: In function `phys_to_virt':
/usr/src/linux-2.4.20/include/asm/io.h:96: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/asm/io.h:97: warning: control reaches end of non-void function
/usr/src/linux-2.4.20/include/asm/io.h: In function `isa_check_signature':
/usr/src/linux-2.4.20/include/asm/io.h:280: `PAGE_OFFSET_RAW' undeclared (first use in this function)
make[2]: *** [eni.o] Error 1
make[1]: *** [_modsubdir_atm] Error 2
make: *** [_mod_drivers] Error 2




^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread

Thread overview: 21+ messages
2003-01-10  3:17 2.4.20, .text.lock.swap cpu usage? (ibm x440) Brian Tinsley
2003-01-10  3:29 ` William Lee Irwin III
2003-01-10  3:42   ` Brian Tinsley
2003-01-10  3:54     ` William Lee Irwin III
2003-01-10  4:08       ` Brian Tinsley
2003-01-10  4:19         ` William Lee Irwin III
2003-01-10  4:50           ` Brian Tinsley
2003-01-10  5:17             ` Martin J. Bligh
2003-01-10  5:24             ` William Lee Irwin III
2003-01-10  5:45               ` Brian Tinsley
  -- strict thread matches above, loose matches on Subject: below --
2003-01-06 23:35 Chris Wood
2003-01-06 23:50 ` William Lee Irwin III
2003-01-06 23:52 ` Andrew Morton
2003-01-09 17:17   ` Chris Wood
2003-01-09 20:18     ` Andrew Morton
2003-01-10  0:25       ` William Lee Irwin III
2003-01-10  0:44         ` Brian Tinsley
2003-01-10  0:55           ` William Lee Irwin III
2003-01-10 20:42       ` Chris Wood
2003-01-09  2:20 ` James Cleverdon
2003-01-09  2:57 ` William Lee Irwin III
