* 2.6.14 kswapd eating too much CPU
@ 2005-11-22 12:59 Jan Kasprzak
From: Jan Kasprzak @ 2005-11-22 12:59 UTC (permalink / raw)
To: linux-kernel
Hello,
I have noticed that on my system kswapd eats too much CPU time every two
hours or so. This started when I upgraded this server to 2.6.14.2 (it was
2.6.13.2 before) and added another 4 GB of memory (for a total of 8 GB). The server
runs mainly FTP load (proftpd w/ sendfile()), with some other apps (Apache,
Qmail, etc). It is dual Opteron (Tyan S2882 board) with 8 ATA disks
on 3ware 7506 controller with SW raid volumes (RAID-1 and RAID-5),
with XFS and ext3 filesystems.
Here is the top(1) output when the kswapd problem occurs:
top - 13:45:34 up 8 days, 15:49, 2 users, load average: 3.37, 3.49, 3.44
Tasks: 325 total, 4 running, 321 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.5% us, 75.6% sy, 0.8% ni, 20.3% id, 1.3% wa, 0.2% hi, 1.3% si
Mem: 8174528k total, 8151076k used, 23452k free, 49960k buffers
Swap: 14651256k total, 640k used, 14650616k free, 7562572k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16 root 15 0 0 0 0 R 96.4 0.0 1542:27 kswapd0
15 root 15 0 0 0 0 D 52.0 0.0 833:48.37 kswapd1
1056 rsyncd 35 19 11196 2608 928 R 3.0 0.0 0:42.29 rsync
2082 named 18 0 72816 17m 1888 S 0.7 0.2 47:47.55 named
18484 ftp 20 5 10028 1556 788 S 0.7 0.0 0:00.02 proftpd
26 root 10 -5 0 0 0 S 0.3 0.0 156:47.10 md5_raid5
It can be seen that one kswapd uses almost 100% of CPU, and the other one
about 50%. The MRTG graph of CPU usage can be seen at
http://www.linux.cz/stats/mrtg-rrd/cpu.html
(orange is the system+irq+softirq time, and red is non-idle, i.e.
user+sys+nice+irq+softirq, without idle and without iowait). The upgrade
of the kernel and memory has been done on Saturday, November 12th.
The peaks of the CPU system time happen every two hours or so. The peak
from ~5am is wider, because at that time various cron daily jobs are run
(updatedb, and so on).
Any clues on what is going on here? Thanks,
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ |
> Specs are a basis for _talking_about_ things. But they are _not_ a basis <
> for implementing software. --Linus Torvalds <
* Re: 2.6.14 kswapd eating too much CPU
@ 2005-11-23 0:35 Andrew Morton
From: Andrew Morton @ 2005-11-23 0:35 UTC (permalink / raw)
To: Jan Kasprzak; +Cc: linux-kernel

Jan Kasprzak <kas@fi.muni.cz> wrote:
>
> I have noticed that on my system kswapd eats too much CPU time every two
> hours or so. This started when I upgraded this server to 2.6.14.2 (was 2.6.13.2
> before), and added another 4 GB of memory (to the total of 8GB).

Next time it happens, please gather some memory info (while it's happening):

cat /proc/meminfo
cat /proc/vmstat
cat /proc/slabinfo
dmesg -c > /dev/null
echo m > /proc/sysrq-trigger
dmesg

Thanks.
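Andrew's checklist above can be wrapped in a small helper so all samples land in one file and repeated captures can be compared. A minimal sketch - the function name and output path are arbitrary illustrations, not anything from this thread:

```shell
# capture_state: dump each given file with a "== path ==" header so that
# several samples taken while kswapd is spinning can be compared later.
capture_state() {
    for f in "$@"; do
        echo "== $f =="
        cat "$f" 2>/dev/null   # skip files the kernel config doesn't provide
    done
}

# Typical use (as root):
#   capture_state /proc/meminfo /proc/vmstat /proc/slabinfo > snap-$(date +%s).txt
#   dmesg -c > /dev/null; echo m > /proc/sysrq-trigger; dmesg >> snap.txt
```

The sysrq step still has to run separately, since it writes to the kernel ring buffer rather than to a /proc file.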
* Re: 2.6.14 kswapd eating too much CPU
@ 2005-11-23 1:01 Jan Kasprzak
From: Jan Kasprzak @ 2005-11-23 1:01 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
: Jan Kasprzak <kas@fi.muni.cz> wrote:
: >
: > I have noticed that on my system kswapd eats too much CPU time every two
: > hours or so. This started when I upgraded this server to 2.6.14.2 (was 2.6.13.2
: > before), and added another 4 GB of memory (to the total of 8GB).
:
: Next time it happens, please gather some memory info (while it's happening):
:
: cat /proc/meminfo

# cat /proc/meminfo
MemTotal:      8174528 kB
MemFree:         33312 kB
Buffers:         33472 kB
Cached:        7565188 kB
SwapCached:          0 kB
Active:        2457564 kB
Inactive:      5469168 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      8174528 kB
LowFree:         33312 kB
SwapTotal:    14651256 kB
SwapFree:     14650616 kB
Dirty:           67136 kB
Writeback:           0 kB
Mapped:         348848 kB
Slab:           159900 kB
CommitLimit:  18738520 kB
Committed_AS:   746976 kB
PageTables:      15596 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      9500 kB
VmallocChunk: 34359728803 kB

: cat /proc/vmstat

# cat /proc/vmstat
nr_dirty 15117
nr_writeback 0
nr_unstable 0
nr_page_table_pages 3975
nr_mapped 88352
nr_slab 39997
pgpgin 4793492568
pgpgout 470966429
pswpin 0
pswpout 160
pgalloc_high 0
pgalloc_normal 2450703504
pgalloc_dma 982887145
pgfree 3433597403
pgactivate 368526738
pgdeactivate 334812475
pgfault 1595083992
pgmajfault 64619
pgrefill_high 0
pgrefill_normal 533924428
pgrefill_dma 1995932
pgsteal_high 0
pgsteal_normal 923247535
pgsteal_dma 0
pgscan_kswapd_high 0
pgscan_kswapd_normal 940269891
pgscan_kswapd_dma 0
pgscan_direct_high 0
pgscan_direct_normal 13837131
pgscan_direct_dma 0
pginodesteal 11216563
slabs_scanned 160160350534400
kswapd_steal 909876526
kswapd_inodesteal 305039060
pageoutrun 30139677
allocstall 4067783
pgrotated 1198
nr_bounce 0

: cat /proc/slabinfo

# cat /proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nfsd4_delegations 0 0 656 6 1 : tunables 54 27 8 : slabdata 0 0 0
nfsd4_stateids 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
nfsd4_files 0 0 72 53 1 : tunables 120 60 8 : slabdata 0 0 0
nfsd4_stateowners 0 0 424 9 1 : tunables 54 27 8 : slabdata 0 0 0
raid5/md5 256 260 1424 5 2 : tunables 24 12 8 : slabdata 52 52 0
rpc_buffers 8 8 2048 2 1 : tunables 24 12 8 : slabdata 4 4 0
rpc_tasks 8 10 384 10 1 : tunables 54 27 8 : slabdata 1 1 0
rpc_inode_cache 8 8 832 4 1 : tunables 54 27 8 : slabdata 2 2 0
fib6_nodes 33 118 64 59 1 : tunables 120 60 8 : slabdata 2 2 0
ip6_dst_cache 26 36 320 12 1 : tunables 54 27 8 : slabdata 3 3 0
ndisc_cache 3 30 256 15 1 : tunables 120 60 8 : slabdata 2 2 0
RAWv6 4 4 896 4 1 : tunables 54 27 8 : slabdata 1 1 0
UDPv6 1 4 896 4 1 : tunables 54 27 8 : slabdata 1 1 0
tw_sock_TCPv6 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
request_sock_TCPv6 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
TCPv6 13 15 1536 5 2 : tunables 24 12 8 : slabdata 3 3 0
UNIX 460 654 640 6 1 : tunables 54 27 8 : slabdata 109 109 0
tcp_bind_bucket 233 2800 32 112 1 : tunables 120 60 8 : slabdata 25 25 0
inet_peer_cache 32 177 64 59 1 : tunables 120 60 8 : slabdata 3 3 0
ip_fib_alias 20 118 64 59 1 : tunables 120 60 8 : slabdata 2 2 0
ip_fib_hash 17 118 64 59 1 : tunables 120 60 8 : slabdata 2 2 0
ip_dst_cache 985 1776 320 12 1 : tunables 54 27 8 : slabdata 148 148 0
arp_cache 8 30 256 15 1 : tunables 120 60 8 : slabdata 2 2 0
RAW 3 11 704 11 2 : tunables 54 27 8 : slabdata 1 1 0
UDP 35 40 768 5 1 : tunables 54 27 8 : slabdata 8 8 0
tw_sock_TCP 174 640 192 20 1 : tunables 120 60 8 : slabdata 32 32 0
request_sock_TCP 48 60 128 30 1 : tunables 120 60 8 : slabdata 2 2 0
TCP 333 525 1408 5 2 : tunables 24 12 8 : slabdata 105 105 0
dm_tio 0 0 24 144 1 : tunables 120 60 8 : slabdata 0 0 0
dm_io 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0
scsi_cmd_cache 74 168 512 7 1 : tunables 54 27 8 : slabdata 19 24 30
cfq_ioc_pool 0 0 96 40 1 : tunables 120 60 8 : slabdata 0 0 0
cfq_pool 0 0 160 24 1 : tunables 120 60 8 : slabdata 0 0 0
crq_pool 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0
deadline_drq 0 0 96 40 1 : tunables 120 60 8 : slabdata 0 0 0
as_arq 159 408 112 34 1 : tunables 120 60 8 : slabdata 11 12 5
xfs_acl 0 0 304 13 1 : tunables 54 27 8 : slabdata 0 0 0
xfs_chashlist 4965 8512 32 112 1 : tunables 120 60 8 : slabdata 76 76 0
xfs_ili 213 260 192 20 1 : tunables 120 60 8 : slabdata 13 13 0
xfs_ifork 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0
xfs_efi_item 0 0 352 11 1 : tunables 54 27 8 : slabdata 0 0 0
xfs_efd_item 0 0 360 11 1 : tunables 54 27 8 : slabdata 0 0 0
xfs_buf_item 74 84 184 21 1 : tunables 120 60 8 : slabdata 4 4 0
xfs_dabuf 52 288 24 144 1 : tunables 120 60 8 : slabdata 2 2 0
xfs_da_state 6 16 488 8 1 : tunables 54 27 8 : slabdata 2 2 0
xfs_trans 6 45 872 9 2 : tunables 54 27 8 : slabdata 3 5 0
xfs_inode 31498 31577 528 7 1 : tunables 54 27 8 : slabdata 4511 4511 0
xfs_btree_cur 13 40 192 20 1 : tunables 120 60 8 : slabdata 2 2 0
xfs_bmap_free_item 2 144 24 144 1 : tunables 120 60 8 : slabdata 1 1 0
xfs_buf 88 99 408 9 1 : tunables 54 27 8 : slabdata 11 11 0
xfs_ioend 32 54 144 27 1 : tunables 120 60 8 : slabdata 2 2 0
xfs_vnode 31498 31572 648 6 1 : tunables 54 27 8 : slabdata 5262 5262 0
nfs_write_data 36 36 832 9 2 : tunables 54 27 8 : slabdata 4 4 0
nfs_read_data 32 35 768 5 1 : tunables 54 27 8 : slabdata 7 7 0
nfs_inode_cache 184 184 992 4 1 : tunables 54 27 8 : slabdata 46 46 0
nfs_page 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
isofs_inode_cache 10 12 648 6 1 : tunables 54 27 8 : slabdata 2 2 0
journal_handle 81 288 24 144 1 : tunables 120 60 8 : slabdata 2 2 0
journal_head 681 1160 96 40 1 : tunables 120 60 8 : slabdata 29 29 384
revoke_table 6 202 16 202 1 : tunables 120 60 8 : slabdata 1 1 0
revoke_record 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0
ext3_inode_cache 3542 3805 808 5 1 : tunables 54 27 8 : slabdata 761 761 0
ext3_xattr 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0
dnotify_cache 1 92 40 92 1 : tunables 120 60 8 : slabdata 1 1 0
dquot 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
eventpoll_pwq 0 0 72 53 1 : tunables 120 60 8 : slabdata 0 0 0
eventpoll_epi 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
inotify_event_cache 0 0 40 92 1 : tunables 120 60 8 : slabdata 0 0 0
inotify_watch_cache 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0
kioctx 0 0 320 12 1 : tunables 54 27 8 : slabdata 0 0 0
kiocb 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
fasync_cache 0 0 24 144 1 : tunables 120 60 8 : slabdata 0 0 0
shmem_inode_cache 818 825 808 5 1 : tunables 54 27 8 : slabdata 165 165 0
posix_timers_cache 0 0 168 23 1 : tunables 120 60 8 : slabdata 0 0 0
uid_cache 47 118 64 59 1 : tunables 120 60 8 : slabdata 2 2 0
sgpool-128 32 32 4096 1 1 : tunables 24 12 8 : slabdata 32 32 0
sgpool-64 32 32 2048 2 1 : tunables 24 12 8 : slabdata 16 16 0
sgpool-32 65 120 1024 4 1 : tunables 54 27 8 : slabdata 22 30 3
sgpool-16 36 40 512 8 1 : tunables 54 27 8 : slabdata 5 5 0
sgpool-8 109 195 256 15 1 : tunables 120 60 8 : slabdata 13 13 1
blkdev_ioc 231 469 56 67 1 : tunables 120 60 8 : slabdata 7 7 0
blkdev_queue 80 88 712 11 2 : tunables 54 27 8 : slabdata 8 8 0
blkdev_requests 283 375 264 15 1 : tunables 54 27 8 : slabdata 22 25 216
biovec-(256) 260 260 4096 1 1 : tunables 24 12 8 : slabdata 260 260 0
biovec-128 264 264 2048 2 1 : tunables 24 12 8 : slabdata 132 132 0
biovec-64 304 380 1024 4 1 : tunables 54 27 8 : slabdata 81 95 3
biovec-16 285 300 256 15 1 : tunables 120 60 8 : slabdata 20 20 0
biovec-4 284 354 64 59 1 : tunables 120 60 8 : slabdata 6 6 0
biovec-1 841 4040 16 202 1 : tunables 120 60 8 : slabdata 20 20 480
bio 870 2430 128 30 1 : tunables 120 60 8 : slabdata 79 81 480
file_lock_cache 38 48 160 24 1 : tunables 120 60 8 : slabdata 2 2 0
sock_inode_cache 832 1250 704 5 1 : tunables 54 27 8 : slabdata 250 250 0
skbuff_fclone_cache 4963 7224 448 8 1 : tunables 54 27 8 : slabdata 903 903 162
skbuff_head_cache 542 675 256 15 1 : tunables 120 60 8 : slabdata 45 45 0
acpi_operand 1028 1060 72 53 1 : tunables 120 60 8 : slabdata 20 20 0
acpi_parse_ext 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0
acpi_parse 0 0 40 92 1 : tunables 120 60 8 : slabdata 0 0 0
acpi_state 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0
proc_inode_cache 910 954 632 6 1 : tunables 54 27 8 : slabdata 159 159 20
sigqueue 10 46 168 23 1 : tunables 120 60 8 : slabdata 2 2 0
radix_tree_node 70767 79870 536 7 1 : tunables 54 27 8 : slabdata 11410 11410 0
bdev_cache 55 56 832 4 1 : tunables 54 27 8 : slabdata 14 14 0
sysfs_dir_cache 3346 3445 72 53 1 : tunables 120 60 8 : slabdata 65 65 0
mnt_cache 39 60 192 20 1 : tunables 120 60 8 : slabdata 3 3 0
inode_cache 1164 1224 600 6 1 : tunables 54 27 8 : slabdata 204 204 0
dentry_cache 44291 48569 224 17 1 : tunables 120 60 8 : slabdata 2857 2857 51
filp 3106 5955 256 15 1 : tunables 120 60 8 : slabdata 397 397 48
names_cache 6 6 4096 1 1 : tunables 24 12 8 : slabdata 6 6 0
idr_layer_cache 90 98 528 7 1 : tunables 54 27 8 : slabdata 14 14 0
buffer_head 218819 233948 88 44 1 : tunables 120 60 8 : slabdata 5317 5317 0
mm_struct 272 483 1152 7 2 : tunables 24 12 8 : slabdata 69 69 3
vm_area_struct 17201 26670 184 21 1 : tunables 120 60 8 : slabdata 1270 1270 480
fs_cache 267 708 64 59 1 : tunables 120 60 8 : slabdata 12 12 0
files_cache 290 440 896 4 1 : tunables 54 27 8 : slabdata 110 110 0
signal_cache 343 546 640 6 1 : tunables 54 27 8 : slabdata 91 91 0
sighand_cache 316 423 2112 3 2 : tunables 24 12 8 : slabdata 141 141 3
task_struct 333 492 1728 4 2 : tunables 24 12 8 : slabdata 123 123 3
anon_vma 1474 3168 24 144 1 : tunables 120 60 8 : slabdata 22 22 12
shared_policy_node 0 0 56 67 1 : tunables 120 60 8 : slabdata 0 0 0
numa_policy 36 404 16 202 1 : tunables 120 60 8 : slabdata 2 2 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 20 20 32768 1 8 : tunables 8 4 0 : slabdata 20 20 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 1 1 16384 1 4 : tunables 8 4 0 : slabdata 1 1 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 36 37 8192 1 2 : tunables 8 4 0 : slabdata 36 37 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 8 : slabdata 0 0 0
size-4096 353 357 4096 1 1 : tunables 24 12 8 : slabdata 353 357 15
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0 0
size-2048 978 988 2048 2 1 : tunables 24 12 8 : slabdata 492 494 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
size-1024 5016 5800 1024 4 1 : tunables 54 27 8 : slabdata 1450 1450 189
size-512(DMA) 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0
size-512 720 776 512 8 1 : tunables 54 27 8 : slabdata 97 97 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
size-256 88 135 256 15 1 : tunables 120 60 8 : slabdata 9 9 0
size-192(DMA) 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
size-192 1786 2600 192 20 1 : tunables 120 60 8 : slabdata 130 130 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
size-64(DMA) 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0
size-64 18281 33158 64 59 1 : tunables 120 60 8 : slabdata 562 562 0
size-32(DMA) 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0
size-128 5450 9180 128 30 1 : tunables 120 60 8 : slabdata 306 306 1
size-32 950 1232 32 112 1 : tunables 120 60 8 : slabdata 11 11 0
kmem_cache 156 156 640 6 1 : tunables 54 27 8 : slabdata 26 26 0

: dmesg -c > /dev/null
: echo m > /proc/sysrq-trigger
: dmesg

I don't have sysrq compiled in, sorry. I will recompile/boot a new kernel
maybe tomorrow.

I have found that the two-hour period of the system-time peaks comes from
a cron job which does full-text indexing of the archive of the mailing
list I run - so it accesses many small files (mailing list posts) and
updates the large "inverted index" database.

However - this time I also have kswapd0 at ~95% CPU, and the next ~50% is
proftpd on the other CPU in the "R" state, which cannot be killed even
with SIGKILL:

top - 01:56:18 up 9 days, 4:00, 2 users, load average: 4.91, 4.46, 5.71
Tasks: 310 total, 12 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.8% us, 73.8% sy, 2.2% ni, 1.5% id, 17.1% wa, 0.2% hi, 1.5% si
Mem: 8174528k total, 8144104k used, 30424k free, 33552k buffers
Swap: 14651256k total, 640k used, 14650616k free, 7570548k cached

  PID USER   PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
   16 root   15  0     0    0   0 R 91.8  0.0 1628:43 kswapd0
25345 ftp    22  5 12448 3988 872 R 49.4  0.0 14:30.79 proftpd
21680 apache 17  0 28448 9220 2260 R 2.7  0.1  0:01.39 ezarchive.fcgi
 1449 rsyncd 35 19 19836  11m 948 S  2.3  0.1  9:35.26 rsync
24152 rsyncd 34 19 12764 4568 936 R  1.7  0.1  0:31.54 rsync
30811 rsyncd 35 19 10424 1664 896 S  1.7  0.0  0:13.97 rsync

Hope this helps,

	-Yenya
* Re: 2.6.14 kswapd eating too much CPU
@ 2005-11-23 2:19 Nick Piggin
From: Nick Piggin @ 2005-11-23 2:19 UTC (permalink / raw)
To: Jan Kasprzak; +Cc: Andrew Morton, linux-kernel

Jan Kasprzak wrote:
> Andrew Morton wrote:
> : Jan Kasprzak <kas@fi.muni.cz> wrote:
> : >
> : > I have noticed that on my system kswapd eats too much CPU time every two
> : > hours or so. This started when I upgraded this server to 2.6.14.2 (was 2.6.13.2
> : > before), and added another 4 GB of memory (to the total of 8GB).
> :
> : Next time it happens, please gather some memory info (while it's happening):
> :
> : cat /proc/meminfo
> [snip]
> Hope this helps,

Can't see anything yet. Sysrq-M would be good; cat /proc/zoneinfo gets you
most of the way there though.

A couple of samples would be handy, especially from /proc/vmstat:

cat /proc/vmstat > vmstat.out ; sleep 10 ; cat /proc/vmstat >> vmstat.out

The same for /proc/zoneinfo would be a good idea as well.

Also - when you say "too much CPU time", what does this mean? Does
performance noticeably drop compared with 2.6.13 performing the same cron
job? Because kswapd is supposed to unburden other processes from page
reclaim work, so if it is working *properly*, then the more CPU it uses
the better.

Thanks,
Nick

--
SUSE Labs, Novell Inc.
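Nick's two-snapshot idea is easiest to read as a diff of the counters rather than two raw dumps. A small sketch (the function name is made up; it works on any two /proc/vmstat-style "name value" snapshot files):

```shell
# vmstat_delta: print every counter that changed between two /proc/vmstat
# snapshots.  $1 = earlier snapshot file, $2 = later snapshot file.
vmstat_delta() {
    awk 'NR == FNR { before[$1] = $2; next }          # first file: remember values
         { d = $2 - before[$1]
           if (d != 0) printf "%s %d\n", $1, d }' "$1" "$2"
}

# Usage sketch:
#   cat /proc/vmstat > v1 ; sleep 10 ; cat /proc/vmstat > v2
#   vmstat_delta v1 v2
```

With this, a kswapd spin shows up directly as large pgscan_* deltas over the 10-second window instead of having to subtract 9-digit counters by hand.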
* Re: 2.6.14 kswapd eating too much CPU
@ 2005-11-23 5:13 Jan Kasprzak
From: Jan Kasprzak @ 2005-11-23 5:13 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel

Nick Piggin wrote:
: Can't see anything yet. Sysrq-M would be good. cat /proc/zoneinfo gets you
: most of the way there though.
:
: A couple of samples would be handy, especially from /proc/vmstat.
:
: cat /proc/vmstat > vmstat.out ; sleep 10 ; cat /proc/vmstat >> vmstat.out
:
: The same for /proc/zoneinfo would be a good idea as well.

I will send it tomorrow - I will try to boot 2.6.15-rc2 to see if the
problem is still there.

: Also - when you say "too much cpu time", what does this mean? Does
: performance noticably drop compared with 2.6.13 performing the same cron
: job? Because kswapd is supposed to unburden other processes from page
: reclaim work, so if it is working *properly*, then the more CPU it uses
: the better.

As you can see from the graphs at
http://www.linux.cz/stats/mrtg-rrd/cpu.html - the total CPU usage with the
new kernel/more memory is significantly higher. I have not measured the
time of the cron job itself, just the total system utilization.

	-Yenya
* Re: 2.6.14 kswapd eating too much CPU
@ 2005-11-23 13:14 Jan Kasprzak
From: Jan Kasprzak @ 2005-11-23 13:14 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel

Jan Kasprzak wrote:
: Nick Piggin wrote:
: : Can't see anything yet. Sysrq-M would be good. cat /proc/zoneinfo gets you
: : most of the way there though.
: :
: : A couple of samples would be handy, especially from /proc/vmstat.
: :
: : cat /proc/vmstat > vmstat.out ; sleep 10 ; cat /proc/vmstat >> vmstat.out
: :
: : The same for /proc/zoneinfo would be a good idea as well.
:
: I will send it tomorrow - I will try to boot 2.6.15-rc2
: to see if the problem is still there.

I am at 2.6.15-rc2 now, and the problem is still there. Currently,
according to top(1), kswapd1 has been eating >98% CPU for 50 minutes and
counting. The last cron job which apparently triggers this was started at
13:15, and it is now 14:07; the cron job itself finished after a few
minutes, and no processes of the user who runs it remain in the system,
but kswapd still eats CPU.

At the end of this mail you can find the top(1) output, the two outputs
each of /proc/vmstat and /proc/zoneinfo separated by a 10-second sleep,
and the sysrq-m output, which was taken at about the same time as the
vmstat and zoneinfo outputs.

What else could be unusual about my system? I run an 8-way RAID-5 volume,
but it is OK according to /proc/vmstat. I run each memory bus at a
different speed - CPU0 has 400MHz memory modules, CPU1 has 333MHz modules.
It is a supported configuration according to the Tyan docs. Anything else?

	-Yenya

=========================== top: ===========================
top - 14:04:33 up 7:40, 5 users, load average: 5.05, 3.32, 2.93
Tasks: 418 total, 2 running, 416 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.5% us, 52.1% sy, 0.7% ni, 40.4% id, 3.8% wa, 0.0% hi, 2.5% si
Mem: 8174368k total, 8148188k used, 26180k free, 43956k buffers
Swap: 14651256k total, 656k used, 14650600k free, 7677580k cached

  PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
   17 root 25  0     0    0   0 R 98.8  0.0 67:58.25 kswapd1
   28 root 10 -5     0    0   0 S  1.0  0.0  4:59.17 md5_raid5
11421 ftp  20  5 10036 1504 764 S  1.0  0.0  0:00.05 proftpd
   18 root 15  0     0    0   0 S  0.7  0.0 23:39.88 kswapd0

============================== vmstat: ==============================
nr_dirty 1096
nr_writeback 0
nr_unstable 0
nr_page_table_pages 7897
nr_mapped 54381
nr_slab 31427
pgpgin 168267503
pgpgout 16779856
pswpin 0
pswpout 164
pgalloc_high 0
pgalloc_normal 68192735
pgalloc_dma 78742215
pgfree 146943386
pgactivate 11885097
pgdeactivate 10486693
pgfault 94787199
pgmajfault 3619
pgrefill_high 0
pgrefill_normal 7394476
pgrefill_dma 7988611
pgsteal_high 0
pgsteal_normal 13516901
pgsteal_dma 18774322
pgscan_kswapd_high 0
pgscan_kswapd_normal 12889635
pgscan_kswapd_dma 18124029
pgscan_direct_high 0
pgscan_direct_normal 1419165
pgscan_direct_dma 1397649
pginodesteal 1788228
slabs_scanned 8055609881856
kswapd_steal 29616872
kswapd_inodesteal 7587662
pageoutrun 997021
allocstall 168227
pgrotated 900
nr_bounce 0

============================== zoneinfo: ==============================
Node 1, zone Normal
  pages free     1809
        min      1423
        low      1778
        high     2134
        active   251683
        inactive 753741
        scanned  66 (a: 0 i: 0)
        spanned  1048576
        present  1013647
  protection: (0, 0, 0, 0)
  pagesets
    cpu: 0 pcp: 0 count: 66 low: 0 high: 384 batch: 64
    cpu: 0 pcp: 1 count: 127 low: 0 high: 128 batch: 32
    numa_hit: 2155 numa_miss: 26392398 numa_foreign: 150
    interleave_hit: 2070 local_node: 0 other_node: 26394553
    cpu: 1 pcp: 0 count: 68 low: 0 high: 384 batch: 64
    cpu: 1 pcp: 1 count: 111 low: 0 high: 128 batch: 32
    numa_hit: 41672333 numa_miss: 1482 numa_foreign: 5904214
    interleave_hit: 5717 local_node: 41673815 other_node: 0
  all_unreclaimable: 0
  prev_priority: 12
  temp_priority: 12
  start_pfn: 1048576
Node 0, zone DMA
  pages free     2855
        min      3
        low      3
        high     4
        active   0
        inactive 0
        scanned  0 (a: 12 i: 12)
        spanned  4096
        present  2761
  protection: (0, 3944, 3944, 3944)
  pagesets
    cpu: 1 pcp: 0 count: 10 low: 0 high: 12 batch: 2
    cpu: 1 pcp: 1 count: 0 low: 0 high: 4 batch: 1
    numa_hit: 5286426 numa_miss: 0 numa_foreign: 0
    interleave_hit: 3821307 local_node: 0 other_node: 5286426
  all_unreclaimable: 0
  prev_priority: 12
  temp_priority: 12
  start_pfn: 0
Node 0, zone DMA32
  pages free     2912
        min      1418
        low      1772
        high     2127
        active   33310
        inactive 946725
        scanned  0 (a: 27 i: 0)
        spanned  1044480
        present  1009704
  protection: (0, 0, 0, 0)
  pagesets
    cpu: 0 pcp: 0 count: 51 low: 0 high: 384 batch: 64
    cpu: 0 pcp: 1 count: 20 low: 0 high: 128 batch: 32
    numa_hit: 47176253 numa_miss: 150 numa_foreign: 26392398
    interleave_hit: 1788 local_node: 47176403 other_node: 0
    cpu: 1 pcp: 0 count: 345 low: 0 high: 384 batch: 64
    cpu: 1 pcp: 1 count: 62 low: 0 high: 128 batch: 32
    numa_hit: 5081 numa_miss: 5904214 numa_foreign: 1482
    interleave_hit: 4507 local_node: 0 other_node: 5909295
  all_unreclaimable: 0
  prev_priority: 12
  temp_priority: 12
  start_pfn: 4096

============================== sleep 10: ==============================

============================== vmstat: ==============================
nr_dirty 922
nr_writeback 0
nr_unstable 0
nr_page_table_pages 7873
nr_mapped 54376
nr_slab 31318
pgpgin 168411599
pgpgout 16784255
pswpin 0
pswpout 164
pgalloc_high 0
pgalloc_normal 68198690
pgalloc_dma 78829202
pgfree 147035725
pgactivate 11887559
pgdeactivate 10489812
pgfault 94831072
pgmajfault 3621
pgrefill_high 0
pgrefill_normal 7396819
pgrefill_dma 7997162
pgsteal_high 0
pgsteal_normal 13519173
pgsteal_dma 18807281
pgscan_kswapd_high 0
pgscan_kswapd_normal 12889635
pgscan_kswapd_dma 18155676
pgscan_direct_high 0
pgscan_direct_normal 1421508
pgscan_direct_dma 1399992
pginodesteal 1788228
slabs_scanned 8065934875904
kswapd_steal 29647559
kswapd_inodesteal 7587889
pageoutrun 998045
allocstall 168298
pgrotated 900
nr_bounce 0

============================== zoneinfo: ==============================
Node 1, zone Normal
  pages free     1765
        min      1423
        low      1778
        high     2134
        active   250416
        inactive 754970
        scanned  198 (a: 0 i: 0)
        spanned  1048576
        present  1013647
  protection: (0, 0, 0, 0)
  pagesets
    cpu: 0 pcp: 0 count: 2 low: 0 high: 384 batch: 64
    cpu: 0 pcp: 1 count: 127 low: 0 high: 128 batch: 32
    numa_hit: 2155 numa_miss: 26395832 numa_foreign: 151
    interleave_hit: 2070 local_node: 0 other_node: 26397987
    cpu: 1 pcp: 0 count: 248 low: 0 high: 384 batch: 64
    cpu: 1 pcp: 1 count: 84 low: 0 high: 128 batch: 32
    numa_hit: 41674836 numa_miss: 1482 numa_foreign: 5908193
    interleave_hit: 5718 local_node: 41676318 other_node: 0
  all_unreclaimable: 0
  prev_priority: 12
  temp_priority: 12
  start_pfn: 1048576
Node 0, zone DMA
  pages free     2854
        min      3
        low      3
        high     4
        active   0
        inactive 0
        scanned  0 (a: 17 i: 17)
        spanned  4096
        present  2761
  protection: (0, 3944, 3944, 3944)
  pagesets
    cpu: 1 pcp: 0 count: 11 low: 0 high: 12 batch: 2
    cpu: 1 pcp: 1 count: 0 low: 0 high: 4 batch: 1
    numa_hit: 5288040 numa_miss: 0 numa_foreign: 0
    interleave_hit: 3822015 local_node: 0 other_node: 5288040
  all_unreclaimable: 0
  prev_priority: 12
  temp_priority: 12
  start_pfn: 0
Node 0, zone DMA32
  pages free     2143
        min      1418
        low      1772
        high     2127
        active   33907
        inactive 946983
        scanned  0 (a: 0 i: 0)
        spanned  1044480
        present  1009704
  protection: (0, 0, 0, 0)
  pagesets
    cpu: 0 pcp: 0 count: 146 low: 0 high: 384 batch: 64
    cpu: 0 pcp: 1 count: 54 low: 0 high: 128 batch: 32
    numa_hit: 47239978 numa_miss: 151 numa_foreign: 26395832
    interleave_hit: 1790 local_node: 47240129 other_node: 0
    cpu: 1 pcp: 0 count: 366 low: 0 high: 384 batch: 64
    cpu: 1 pcp: 1 count: 31 low: 0 high: 128 batch: 32
    numa_hit: 5082 numa_miss: 5908193 numa_foreign: 1482
    interleave_hit: 4508 local_node: 0 other_node: 5913275
  all_unreclaimable: 0
  prev_priority: 12
  temp_priority: 12
  start_pfn: 4096

======================== dmesg after sysrq-m ========================
SysRq : Show Memory
Mem-info:
Node 1 DMA per-cpu: empty
Node 1 DMA32 per-cpu: empty
Node 1 Normal per-cpu:
cpu 0 hot: low 0, high 384, batch 64 used:59
cpu 0 cold: low 0, high 128, batch 32 used:24
cpu 1 hot: low 0, high 384, batch 64 used:84
cpu 1 cold: low 0, high 128, batch 32 used:93
Node 1 HighMem per-cpu: empty
Node 0 DMA per-cpu:
cpu 0 hot: low 0, high 12, batch 2 used:1
cpu 0 cold: low 0, high 4, batch 1 used:0
cpu 1 hot: low 0, high 12, batch 2 used:10
cpu 1 cold: low 0, high 4, batch 1 used:0
Node 0 DMA32 per-cpu:
cpu 0 hot: low 0, high 384, batch 64 used:334
cpu 0 cold: low 0, high 128, batch 32 used:31
cpu 1 hot: low 0, high 384, batch 64 used:66
cpu 1 cold: low 0, high 128, batch 32 used:31
Node 0 Normal per-cpu: empty
Node 0 HighMem per-cpu: empty
Free pages: 25780kB (0kB HighMem)
Active:283000 inactive:1704247 dirty:1177 writeback:1 unstable:0 free:6445 slab:31088 mapped:54478 pagetables:7852
Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 3959 3959
Node 1 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 3959 3959
Node 1 Normal free:7320kB min:5692kB low:7112kB high:8536kB active:996400kB inactive:3025596kB present:4054588kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 1 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA free:11412kB min:12kB low:12kB high:16kB active:0kB inactive:0kB present:11044kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 3944 3944 3944
Node 0 DMA32 free:7048kB min:5672kB low:7088kB high:8508kB active:135612kB inactive:3791400kB present:4038816kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 1 DMA: empty
Node 1 DMA32: empty
Node 1 Normal: 10*4kB 28*8kB 13*16kB 4*32kB 1*64kB 2*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 7320kB
Node 1 HighMem: empty
Node 0 DMA: 19*4kB 19*8kB 19*16kB 18*32kB 17*64kB 12*128kB 6*256kB 6*512kB 3*1024kB 0*2048kB 0*4096kB = 11412kB
Node 0 DMA32: 0*4kB 1*8kB 14*16kB 1*32kB 2*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 1*4096kB = 7048kB
Node 0 Normal: empty
Node 0 HighMem: empty
Swap cache: add 164, delete 164, find 0/0, race 0+0
Free swap  = 14650600kB
Total swap = 14651256kB
Free swap:       14650600kB
2097152 pages of RAM
53560 reserved pages
278668 pages shared
0 pages swap cached
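One number worth pulling out of snapshots like the ones above is kswapd's reclaim efficiency - the share of scanned pages that were actually stolen. If kswapd scans far more than it steals, it is spinning rather than making progress. A sketch under the assumption that the 2.6-era counter names shown above are present (the helper name is made up):

```shell
# kswapd_eff: percentage of kswapd-scanned pages actually reclaimed,
# computed from one /proc/vmstat snapshot file (2.6-era counter names).
kswapd_eff() {
    awk '{ v[$1] = $2 }
         END {
             scan = v["pgscan_kswapd_normal"] + v["pgscan_kswapd_dma"] \
                  + v["pgscan_kswapd_high"]
             if (scan > 0) printf "%.0f\n", 100 * v["kswapd_steal"] / scan
         }' "$1"
}

# Usage sketch:  cat /proc/vmstat > snap ; kswapd_eff snap
```

A value near 100 means each scanned page is being reclaimed; a value dropping toward single digits during the CPU spikes would point at kswapd churning over unreclaimable pages.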
* Re: 2.6.14 kswapd eating too much CPU
@ 2005-11-23 19:02 Andrew Morton
From: Andrew Morton @ 2005-11-23 19:02 UTC (permalink / raw)
To: Jan Kasprzak; +Cc: nickpiggin, linux-kernel

Jan Kasprzak <kas@fi.muni.cz> wrote:
>
> I am at 2.6.15-rc2 now, the problem is still there.
> Currently according to top(1), kswapd1 eats >98% CPU for 50 minutes now
> and counting.

When it's doing this, could you do sysrq-p a few times? The output of that
should tell us where the CPU is executing.
* Re: 2.6.14 kswapd eating too much CPU
@ 2005-11-23 20:24 Jan Kasprzak
From: Jan Kasprzak @ 2005-11-23 20:24 UTC (permalink / raw)
To: Andrew Morton; +Cc: nickpiggin, linux-kernel

Andrew Morton wrote:
: Jan Kasprzak <kas@fi.muni.cz> wrote:
: >
: > I am at 2.6.15-rc2 now, the problem is still there.
: > Currently according to top(1), kswapd1 eats >98% CPU for 50 minutes now
: > and counting.
:
: When it's doing this, could you do sysrq-p a few times? The output of that
: should tell us where the CPU is executing.

Hmm, it does not show anything but the header. Should I enable something
special in the kernel?

# dmesg -c >/dev/null; echo -n p >/proc/sysrq-trigger ; sleep 5; dmesg
SysRq : Show Regs
#

	-Yenya
* Re: 2.6.14 kswapd eating too much CPU
  2005-11-23 20:24 ` Jan Kasprzak
@ 2005-11-23 20:32 ` Jesper Juhl
  2005-11-23 20:34 ` Jan Kasprzak
  2005-11-23 20:35 ` Andrew Morton
  1 sibling, 1 reply; 21+ messages in thread
From: Jesper Juhl @ 2005-11-23 20:32 UTC (permalink / raw)
To: Jan Kasprzak; +Cc: Andrew Morton, nickpiggin, linux-kernel

On 11/23/05, Jan Kasprzak <kas@fi.muni.cz> wrote:
> Andrew Morton wrote:
> : Jan Kasprzak <kas@fi.muni.cz> wrote:
> : >
> : > I am at 2.6.15-rc2 now, the problem is still there.
> : > Currently according to top(1), kswapd1 eats >98% CPU for 50 minutes now
> : > and counting.
> :
> : When it's doing this, could you do sysrq-p a few times?  The output of that
> : should tell us where the CPU is executing.
>
> Hmm, it does not show anything but the header. Should I enable
> something special in the kernel?
>

CONFIG_MAGIC_SYSRQ=y
(it's in 'Kernel hacking')

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.14 kswapd eating too much CPU
  2005-11-23 20:32 ` Jesper Juhl
@ 2005-11-23 20:34 ` Jan Kasprzak
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Kasprzak @ 2005-11-23 20:34 UTC (permalink / raw)
To: Jesper Juhl; +Cc: Andrew Morton, nickpiggin, linux-kernel

Jesper Juhl wrote:
: On 11/23/05, Jan Kasprzak <kas@fi.muni.cz> wrote:
: >
: > Hmm, it does not show anything but the header. Should I enable
: > something special in the kernel?
: >
:
: CONFIG_MAGIC_SYSRQ=y
: (it's in 'Kernel hacking')
:
	I of course have this:

# zgrep -i SYSRQ /proc/config.gz
CONFIG_MAGIC_SYSRQ=y
#

otherwise I would not have the /proc/sysrq-trigger file at all, and I
would not be able to post the sysrq-m outputs like I did earlier today.

-Yenya

-- 
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839   Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Journal: http://www.fi.muni.cz/~kas/blog/ |
> Specs are a basis for _talking_about_ things. But they are _not_ a basis <
> for implementing software.                        --Linus Torvalds <

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.14 kswapd eating too much CPU
  2005-11-23 20:24 ` Jan Kasprzak
  2005-11-23 20:32 ` Jesper Juhl
@ 2005-11-23 20:35 ` Andrew Morton
  2005-11-24  8:31 ` Jan Kasprzak
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2005-11-23 20:35 UTC (permalink / raw)
To: Jan Kasprzak; +Cc: nickpiggin, linux-kernel

Jan Kasprzak <kas@fi.muni.cz> wrote:
>
> Andrew Morton wrote:
> : Jan Kasprzak <kas@fi.muni.cz> wrote:
> : >
> : > I am at 2.6.15-rc2 now, the problem is still there.
> : > Currently according to top(1), kswapd1 eats >98% CPU for 50 minutes now
> : > and counting.
> :
> : When it's doing this, could you do sysrq-p a few times?  The output of that
> : should tell us where the CPU is executing.
>
> Hmm, it does not show anything but the header. Should I enable
> something special in the kernel?

Try `dmesg -n 7' first.

> # dmesg -c >/dev/null; echo -n p >/proc/sysrq-trigger ; sleep 5; dmesg
> SysRq : Show Regs
> #

You won't get anything useful from sysrq-p via /proc/sysrq-trigger - it'll
just show the backtrace of the process `echo'.  It has to be via the
keyboard.

If there's no keyboard, do `echo t > /proc/sysrq-trigger' to get an
all-task backtrace, then locate the trace for kswapd in the resulting
output.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread
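[Editor's note: the `echo t` procedure above can be scripted. The sketch below is an illustration, not from the thread: `extract_task` is a hypothetical helper, and its awk pattern assumes the 2.6-era sysrq-t layout in which each task's stanza starts with the command name followed by a one-letter state.]

```shell
# Capture an all-task backtrace and pull out just one task's stanza.
# extract_task is a hypothetical helper (not from the thread); it assumes
# each task's block starts with "<name>   <R|S|D|Z|T|X> ..." and runs
# until the next such header line.
extract_task() {
    awk -v name="$1" '
        /^[A-Za-z0-9_.-]+[ \t]+[RSDZTX][ \t]/ { printing = ($1 == name) }
        printing { print }
    '
}

# The trigger needs root and CONFIG_MAGIC_SYSRQ=y; guard it so the
# filter can still be exercised on a saved log without root.
if [ -w /proc/sysrq-trigger ]; then
    echo t > /proc/sysrq-trigger
    dmesg | extract_task kswapd0
fi
```

The same filter works on a log saved from the serial console, which matters here since the sysrq-t output overflowed the default log buffer.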
* Re: 2.6.14 kswapd eating too much CPU
  2005-11-23 20:35 ` Andrew Morton
@ 2005-11-24  8:31 ` Jan Kasprzak
  2005-11-27  8:42 ` Marcelo Tosatti
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kasprzak @ 2005-11-24 8:31 UTC (permalink / raw)
To: Andrew Morton; +Cc: nickpiggin, linux-kernel

Andrew Morton wrote:
: > # dmesg -c >/dev/null; echo -n p >/proc/sysrq-trigger ; sleep 5; dmesg
: > SysRq : Show Regs
: > #
:
: You won't get anything useful from sysrq-p via /proc/sysrq-trigger - it'll
: just show the backtrace of the process `echo'.  It has to be via the
: keyboard.

	OK, at the end of this mail I am attaching a few instances of
( top -n 1 -b -p 17,18 ; dmesg -c ) after sysrq-p entered via the
serial console. However, I now have kswapd0 at 99% CPU and kswapd1 at
about 50%, and sysrq-p is now apparently handled by CPU1, so I get
traces of kswapd1 only. How can I switch the serial interrupt to CPU0?

: If there's no keyboard, do `echo t > /proc/sysrq-trigger' to get an
: all-task backtrace, then locate the trace for kswapd in the resulting
: output.

	I had to increase CONFIG_LOG_BUF_SHIFT first, because the sysrq-t
output was too long, but nevertheless - for the running processes there
are no call traces, just a note that the process is in the "R" state.
Here are a few excerpts of sysrq-t:

-----
kswapd1       S ffff810100000000     0    17      1    18    11 (L-TLB)
ffff8101047bfea8 0000000000000046 ffff8101047bfe08 0000000000000292
0000000000000292 ffffffff80160589 ffff8101ae564240 000000008012cf35
0000000000000076 ffff8101ae564240
Call Trace:<ffffffff80160589>{balance_pgdat+297} <ffffffff80160936>{kswapd+278}
       <ffffffff801472a0>{autoremove_wake_function+0} <ffffffff801472a0>{autoremove_wake_function+0}
       <ffffffff8010e7d6>{child_rip+8} <ffffffff8011ac50>{flat_send_IPI_mask+0}
       <ffffffff80160820>{kswapd+0} <ffffffff8010e7ce>{child_rip+0}
kswapd0       R running task        0    18      1    33    17 (L-TLB)
-----
kswapd1       S ffff810100000000     0    17      1    18    11 (L-TLB)
ffff8101047bfea8 0000000000000046 ffff8101047bfe08 0000000000000292
0000000000000292 ffffffff80160589 ffff8101ae564240 000000008012cf35
0000000000000076 ffff8101ae564240
Call Trace:<ffffffff80160589>{balance_pgdat+297} <ffffffff80160936>{kswapd+278}
       <ffffffff801472a0>{autoremove_wake_function+0} <ffffffff801472a0>{autoremove_wake_function+0}
       <ffffffff8010e7d6>{child_rip+8} <ffffffff8011ac50>{flat_send_IPI_mask+0}
       <ffffffff80160820>{kswapd+0} <ffffffff8010e7ce>{child_rip+0}
kswapd0       R running task        0    18      1    33    17 (L-TLB)
-----

	And here are the sysrq-p outputs:

top - 09:17:11 up 10:09,  2 users,  load average: 13.56, 7.14, 3.45
Tasks:   2 total,   2 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.2% us, 20.4% sy,  3.0% ni, 65.9% id,  6.6% wa,  0.1% hi,  1.8% si
Mem:   8173472k total,  5769752k used,  2403720k free,    35436k buffers
Swap: 14651256k total,      656k used, 14650600k free,  5388176k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   18 root      15   0     0    0    0 R 91.9  0.0 118:24.00 kswapd0
   17 root      15   0     0    0    0 R 49.8  0.0  35:17.26 kswapd1

SysRq : Show Regs
CPU 1:
Modules linked in: binfmt_misc floppy
Pid: 17, comm: kswapd1 Not tainted 2.6.15-rc2 #2
RIP: 0010:[<ffffffff8019266b>] <ffffffff8019266b>{shrink_icache_memory+107}
RSP: 0000:ffff8101047bfd78  EFLAGS: 00000246
RAX: 00000000ffffffff RBX: 00000000000000c8 RCX: ffff8101046f6140
RDX: 0000000000000000 RSI: 00000000000000d0 RDI: ffffffff804683e0
RBP: 0000000000200200 R08: ffff8101047be000 R09: 0000000000000003
R10: 0000000000000000 R11: ffffffff8028caf0 R12: ffffffff8012c533
R13: ffff8101047bfd18 R14: 0000000000000003 R15: 0000000000000296
FS:  00002aaaaac3ed20(0000) GS:ffffffff80624880(0000) knlGS:00000000555565a0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aaaaacc1000 CR3: 00000001eeef9000 CR4: 00000000000006e0

Call Trace:<ffffffff80192814>{shrink_icache_memory+532} <ffffffff8015f20b>{shrink_slab+219}
       <ffffffff801606c9>{balance_pgdat+617} <ffffffff80160957>{kswapd+311}
       <ffffffff801472a0>{autoremove_wake_function+0} <ffffffff801472a0>{autoremove_wake_function+0}
       <ffffffff8010e7d6>{child_rip+8} <ffffffff8011ac50>{flat_send_IPI_mask+0}
       <ffffffff80160820>{kswapd+0} <ffffffff8010e7ce>{child_rip+0}

top - 09:17:28 up 10:09,  2 users,  load average: 17.42, 8.31, 3.89
Tasks:   2 total,   2 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.2% us, 20.4% sy,  3.0% ni, 65.9% id,  6.6% wa,  0.1% hi,  1.8% si
Mem:   8173472k total,  5980464k used,  2193008k free,    36076k buffers
Swap: 14651256k total,      656k used, 14650600k free,  5593984k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   18 root      15   0     0    0    0 R 96.4  0.0 118:39.98 kswapd0
   17 root      15   0     0    0    0 R 60.3  0.0  35:26.37 kswapd1

SysRq : Show Regs
CPU 1:
Modules linked in: binfmt_misc floppy
Pid: 17, comm: kswapd1 Not tainted 2.6.15-rc2 #2
RIP: 0010:[<ffffffff803e8ab3>] <ffffffff803e8ab3>{_spin_lock+3}
RSP: 0000:ffff8101047bfd40  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffff803e8951 RCX: ffff8101046f6140
RDX: 0000000000000000 RSI: 00000000000000d0 RDI: ffffffff80468400
RBP: 0000000000000296 R08: ffff8101047be000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8101047bfd88
R13: ffff8101047bfcc8 R14: ffff8101046f6140 R15: ffffffff804683e8
FS:  00002aaaaaab79e0(0000) GS:ffffffff80624880(0000) knlGS:00000000555565a0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000056f448 CR3: 000000003f0ee000 CR4: 00000000000006e0

Call Trace:<ffffffff801925dc>{dispose_list+220} <ffffffff8019260b>{shrink_icache_memory+11}
       <ffffffff801927dc>{shrink_icache_memory+476} <ffffffff8015f20b>{shrink_slab+219}
       <ffffffff801606c9>{balance_pgdat+617} <ffffffff80160957>{kswapd+311}
       <ffffffff801472a0>{autoremove_wake_function+0} <ffffffff801472a0>{autoremove_wake_function+0}
       <ffffffff8010e7d6>{child_rip+8} <ffffffff8011ac50>{flat_send_IPI_mask+0}
       <ffffffff80160820>{kswapd+0} <ffffffff8010e7ce>{child_rip+0}

top - 09:17:46 up 10:10,  2 users,  load average: 15.03, 8.36, 4.00
Tasks:   2 total,   2 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.2% us, 20.5% sy,  3.0% ni, 65.9% id,  6.6% wa,  0.1% hi,  1.8% si
Mem:   8173472k total,  6117468k used,  2056004k free,    36460k buffers
Swap: 14651256k total,      656k used, 14650600k free,  5731640k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   18 root      15   0     0    0    0 R 86.2  0.0 118:56.67 kswapd0
   17 root      15   0     0    0    0 R 60.1  0.0  35:36.46 kswapd1

SysRq : Show Regs
CPU 1:
Modules linked in: binfmt_misc floppy
Pid: 17, comm: kswapd1 Not tainted 2.6.15-rc2 #2
RIP: 0010:[<ffffffff803e760e>] <ffffffff803e760e>{thread_return+94}
RSP: 0000:ffff8101047bfc08  EFLAGS: 00000246
RAX: ffff810103841860 RBX: 000000000050fb40 RCX: 0000000000000018
RDX: ffff810058446e40 RSI: ffff8101046f6140 RDI: 0000000000000000
RBP: 0000000000000246 R08: ffff8101047be000 R09: 0000000000000004
R10: 000000000050fb40 R11: 00000000ffffffff R12: ffffffff802645a8
R13: 0000000000000296 R14: ffff8101de939700 R15: 0000000000000008
FS:  00002aaaaaab79e0(0000) GS:ffffffff80624880(0000) knlGS:00000000555565a0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aaaaaaaf0b8 CR3: 000000003f0ee000 CR4: 00000000000006e0

Call Trace:<ffffffff8012c4c0>{__wake_up_common+64} <ffffffff803e88f8>{__down+152}
       <ffffffff8012c470>{default_wake_function+0} <ffffffff803e86e8>{__down_failed+53}
       <ffffffff80156440>{mempool_free_slab+0} <ffffffff80192cf6>{.text.lock.inode+5}
       <ffffffff8015f20b>{shrink_slab+219} <ffffffff801606c9>{balance_pgdat+617}
       <ffffffff80160957>{kswapd+311} <ffffffff801472a0>{autoremove_wake_function+0}
       <ffffffff801472a0>{autoremove_wake_function+0} <ffffffff8010e7d6>{child_rip+8}
       <ffffffff8011ac50>{flat_send_IPI_mask+0} <ffffffff80160820>{kswapd+0}
       <ffffffff8010e7ce>{child_rip+0}

top - 09:18:32 up 10:10,  2 users,  load average: 10.25, 8.02, 4.09
Tasks:   2 total,   2 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.2% us, 20.5% sy,  3.0% ni, 65.8% id,  6.6% wa,  0.1% hi,  1.8% si
Mem:   8173472k total,  6259616k used,  1913856k free,    37192k buffers
Swap: 14651256k total,      656k used, 14650600k free,  5860380k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   18 root      15   0     0    0    0 R 96.1  0.0 119:38.63 kswapd0
   17 root      15   0     0    0    0 R 56.0  0.0  36:03.12 kswapd1

SysRq : Show Regs
CPU 1:
Modules linked in: binfmt_misc floppy
Pid: 17, comm: kswapd1 Not tainted 2.6.15-rc2 #2
RIP: 0010:[<ffffffff80192827>] <ffffffff80192827>{shrink_icache_memory+551}
RSP: 0000:ffff8101047bfd78  EFLAGS: 00000256
RAX: 0000000051eb851f RBX: 0000000000000000 RCX: 0000000000000018
RDX: 0000000000000000 RSI: 00000000000000d0 RDI: 0000000000000000
RBP: 0000000000200200 R08: ffff8101047be000 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff80156440 R12: 0000000000100100
R13: ffffffff8012c470 R14: ffff8101046f6140 R15: 0000000000000001
FS:  00002aaaaac3ed20(0000) GS:ffffffff80624880(0000) knlGS:00000000555565a0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aaaaae70000 CR3: 00000000f8625000 CR4: 00000000000006e0

Call Trace:<ffffffff80192814>{shrink_icache_memory+532} <ffffffff8015f1fe>{shrink_slab+206}
       <ffffffff801606c9>{balance_pgdat+617} <ffffffff80160957>{kswapd+311}
       <ffffffff801472a0>{autoremove_wake_function+0} <ffffffff801472a0>{autoremove_wake_function+0}
       <ffffffff8010e7d6>{child_rip+8} <ffffffff8011ac50>{flat_send_IPI_mask+0}
       <ffffffff80160820>{kswapd+0} <ffffffff8010e7ce>{child_rip+0}

top - 09:18:48 up 10:11,  2 users,  load average: 9.96, 8.08, 4.17
Tasks:   2 total,   2 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.2% us, 20.6% sy,  3.0% ni, 65.8% id,  6.6% wa,  0.1% hi,  1.8% si
Mem:   8173472k total,  6361680k used,  1811792k free,    37592k buffers
Swap: 14651256k total,      656k used, 14650600k free,  5962320k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   18 root      15   0     0    0    0 R 90.2  0.0 119:53.39 kswapd0
   17 root      15   0     0    0    0 R 70.2  0.0  36:12.62 kswapd1

SysRq : Show Regs
CPU 1:
Modules linked in: binfmt_misc floppy
Pid: 17, comm: kswapd1 Not tainted 2.6.15-rc2 #2
RIP: 0010:[<ffffffff80156e54>] <ffffffff80156e54>{__mod_page_state+36}
RSP: 0000:ffff8101047bfdc8  EFLAGS: 00000292
RAX: ffff810103844e00 RBX: ffffffff801925dc RCX: ffff8101046f6140
RDX: ffffffff8066dca0 RSI: 0000000000000080 RDI: 00000000000000f8
RBP: 0000000000000000 R08: ffff8101047be000 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff80156440 R12: 0000000000000000
R13: ffffffff80156440 R14: ffffffff803e86e8 R15: ffffffff804683e0
FS:  00002aaaaaab79e0(0000) GS:ffffffff80624880(0000) knlGS:00000000555565a0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000004472f0 CR3: 000000003f0ee000 CR4: 00000000000006e0

Call Trace:<ffffffff8015f22f>{shrink_slab+255} <ffffffff801606c9>{balance_pgdat+617}
       <ffffffff80160957>{kswapd+311} <ffffffff801472a0>{autoremove_wake_function+0}
       <ffffffff801472a0>{autoremove_wake_function+0} <ffffffff8010e7d6>{child_rip+8}
       <ffffffff8011ac50>{flat_send_IPI_mask+0} <ffffffff80160820>{kswapd+0}
       <ffffffff8010e7ce>{child_rip+0}

-Yenya

-- 
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839   Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Journal: http://www.fi.muni.cz/~kas/blog/ |
> Specs are a basis for _talking_about_ things. But they are _not_ a basis <
> for implementing software.                        --Linus Torvalds <

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.14 kswapd eating too much CPU
  2005-11-24  8:31 ` Jan Kasprzak
@ 2005-11-27  8:42 ` Marcelo Tosatti
  2005-11-27 20:39 ` Jan Kasprzak
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2005-11-27 8:42 UTC (permalink / raw)
To: Jan Kasprzak; +Cc: Andrew Morton, nickpiggin, linux-kernel

On Thu, Nov 24, 2005 at 09:31:41AM +0100, Jan Kasprzak wrote:
> Andrew Morton wrote:
> : > # dmesg -c >/dev/null; echo -n p >/proc/sysrq-trigger ; sleep 5; dmesg
> : > SysRq : Show Regs
> : > #
> :
> : You won't get anything useful from sysrq-p via /proc/sysrq-trigger - it'll
> : just show the backtrace of the process `echo'.  It has to be via the
> : keyboard.
>
> OK, at the end of this mail I am attaching a few instances of
> ( top -n 1 -b -p 17,18 ; dmesg -c ) after sysrq-p entered via the
> serial console. However, I now have kswapd0 at 99% CPU and kswapd1 at
> about 50%, and sysrq-p is now apparently handled by CPU1, so I get
> traces of kswapd1 only. How can I switch the serial interrupt to CPU0?
>
> : If there's no keyboard, do `echo t > /proc/sysrq-trigger' to get an
> : all-task backtrace, then locate the trace for kswapd in the resulting
> : output.
>
> I had to increase CONFIG_LOG_BUF_SHIFT first, because the sysrq-t
> output was too long, but nevertheless - for the running processes there
> are no call traces, just a note that the process is in the "R" state.
> Here are a few excerpts of sysrq-t:
>
> -----
> kswapd1       S ffff810100000000     0    17      1    18    11 (L-TLB)
> ffff8101047bfea8 0000000000000046 ffff8101047bfe08 0000000000000292
> 0000000000000292 ffffffff80160589 ffff8101ae564240 000000008012cf35
> 0000000000000076 ffff8101ae564240
> Call Trace:<ffffffff80160589>{balance_pgdat+297} <ffffffff80160936>{kswapd+278}
> <ffffffff801472a0>{autoremove_wake_function+0} <ffffffff801472a0>{autoremove_wake_function+0}
> <ffffffff8010e7d6>{child_rip+8} <ffffffff8011ac50>{flat_send_IPI_mask+0}
> <ffffffff80160820>{kswapd+0} <ffffffff8010e7ce>{child_rip+0}
> kswapd0       R running task        0    18      1    33    17 (L-TLB)
> -----
> [... second, identical excerpt snipped ...]

Out of curiosity, what is the size of the inode cache from /proc/slabinfo
at this moment?

Because even though the traces show kswapd trying to reclaim i-cache, the
percentage of i-cache is really small:

inode_cache     1164   1224   600   6   1 : tunables   54  27   8 : slabdata   204   204    0
dentry_cache   44291  48569   224  17   1 : tunables  120  60   8 : slabdata  2857  2857   51

Maybe you should also try profile/oprofile during the kswapd peaks?

Just my two cents.
> [... full sysrq-p outputs snipped; they are quoted verbatim from the
> parent message above ...]

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.14 kswapd eating too much CPU
  2005-11-27  8:42 ` Marcelo Tosatti
@ 2005-11-27 20:39 ` Jan Kasprzak
  2005-11-27 15:45 ` Marcelo Tosatti
  2005-11-27 16:02 ` Marcelo Tosatti
  0 siblings, 2 replies; 21+ messages in thread
From: Jan Kasprzak @ 2005-11-27 20:39 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Andrew Morton, nickpiggin, linux-kernel

Marcelo Tosatti wrote:
: Out of curiosity, what is the size of the inode cache from /proc/slabinfo
: at this moment?
:
: Because even though the traces show kswapd trying to reclaim i-cache, the
: percentage of i-cache is really small:
:
: inode_cache     1164   1224   600   6   1 : tunables   54  27   8 : slabdata   204   204    0
: dentry_cache   44291  48569   224  17   1 : tunables  120  60   8 : slabdata  2857  2857   51
:
	The size of icache is similar to that shown above:

# egrep '^(inode|dentry)_cache' /proc/slabinfo
inode_cache     1189   1326   600   6   1 : tunables   54  27   8 : slabdata   221   221    0
dentry_cache   41845  45509   224  17   1 : tunables  120  60   8 : slabdata  2677  2677    0

inode_cache     1212   1326   600   6   1 : tunables   54  27   8 : slabdata   221   221    0
dentry_cache   42662  48892   224  17   1 : tunables  120  60   8 : slabdata  2876  2876  288

	However, the traces I have sent are traces of kswapd1, which at that
time was eating around 50% of CPU, while kswapd0 was at >95%. I have not
managed to get the trace of kswapd0 yet.

	I have tried to bind the serial IRQ to CPU0 to get the trace of
kswapd0 (echo 1 >/proc/irq/4/smp_affinity). After sysrq-p I get the dump
of registers at CPU0, but the strange thing is that I get the stack trace
of kacpid instead of kswapd0 (kacpid is not even visible in top(1) output,
and it has a total of 0 seconds of CPU time consumed since boot, while
kswapd0 is first in top(1), eating >95% of CPU). Why kacpid? When I bind
the serial IRQ to CPU1, I get the trace of PID 0 (swapper).
	The task that probably triggers this problem is a cron job doing
full-text indexing of a mailing list archive, so it accesses lots of
small files, and then recreates the inverted index, which is one big
file. So maybe inode cache shrinking or something similar may be the
problem there. However, the cron job does only an incremental reindexing,
so I think it reads fewer than 100 files per run.

: Maybe you should also try profile/oprofile during the kswapd peaks?
:
	Do you have any details on it? I can of course RTFdocs of oprofile,
but should I try to catch something special?

-Yenya

-- 
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839   Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Journal: http://www.fi.muni.cz/~kas/blog/ |
> Specs are a basis for _talking_about_ things. But they are _not_ a basis <
> for implementing software.                        --Linus Torvalds <

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.14 kswapd eating too much CPU
  2005-11-27 20:39 ` Jan Kasprzak
@ 2005-11-27 15:45 ` Marcelo Tosatti
  0 siblings, 0 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2005-11-27 15:45 UTC (permalink / raw)
To: Jan Kasprzak; +Cc: Andrew Morton, nickpiggin, linux-kernel

On Sun, Nov 27, 2005 at 09:39:24PM +0100, Jan Kasprzak wrote:
> Marcelo Tosatti wrote:
> : Out of curiosity, what is the size of the inode cache from /proc/slabinfo
> : at this moment?
>
> [... slabinfo figures, quoted in full in the parent message, snipped ...]
>
> However, the traces I have sent are traces of kswapd1, which at that
> time was eating around 50% of CPU, while kswapd0 was at >95%. I have not
> managed to get the trace of kswapd0 yet.
>
> I have tried to bind the serial IRQ to CPU0 to get the trace of
> kswapd0 (echo 1 >/proc/irq/4/smp_affinity). After sysrq-p I get the dump
> of registers at CPU0, but the strange thing is that I get the stack trace
> of kacpid instead of kswapd0 (kacpid is not even visible in top(1) output,
> and it has a total of 0 seconds of CPU time consumed since boot, while
> kswapd0 is first in top(1), eating >95% of CPU). Why kacpid? When I bind
> the serial IRQ to CPU1, I get the trace of PID 0 (swapper).

Have no idea, sorry.
> The task that probably triggers this problem is a cron job doing
> full-text indexing of a mailing list archive, so it accesses lots of
> small files, and then recreates the inverted index, which is one big
> file. So maybe inode cache shrinking or something similar may be the
> problem there. However, the cron job does only an incremental reindexing,
> so I think it reads fewer than 100 files per run.
> :
> : Maybe you should also try profile/oprofile during the kswapd peaks?
> :
> Do you have any details on it? I can of course RTFdocs of oprofile,
> but should I try to catch something special?

Try to isolate the profile trace to the period in which kswapd goes mad.
That is, reset the profiling buffer when you notice kswapd going crazy,
let it profile for a few minutes (no idea how long the load takes), and
stop before kswapd is back to idleness (preferably while it is still
burning the CPU).

I think you can try the old profiler first (CONFIG_PROFILING=y, boot with
profile=2, use readprofile). If that does not give meaningful data then
try oprofile.

^ permalink raw reply	[flat|nested] 21+ messages in thread
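[Editor's note: the readprofile procedure described above can be sketched as below. The System.map path, the five-minute window, and the RUN_CAPTURE guard are illustrative assumptions; the capture itself needs root and a kernel booted with profile=2.]

```shell
# Sort readprofile-style output (tick count in column 1) and keep the
# top consumers; this helper is plain text processing.
top_functions() {
    sort -nr | head -n "${1:-20}"
}

# Guarded capture: /proc/profile only exists on a kernel booted with
# profile=2, and RUN_CAPTURE is an illustrative opt-in switch so the
# sleep does not fire accidentally.
if [ -r /proc/profile ] && [ "${RUN_CAPTURE:-no}" = yes ]; then
    readprofile -r                  # reset counters when kswapd goes mad
    sleep 300                       # profile while the CPU is burning
    readprofile -m "/boot/System.map-$(uname -r)" | top_functions 20
fi
```

The helper can also be pointed at saved readprofile output, which is handy when the capture is taken on the server and analyzed elsewhere.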
* Re: 2.6.14 kswapd eating too much CPU
  2005-11-27 20:39 ` Jan Kasprzak
  2005-11-27 15:45 ` Marcelo Tosatti
@ 2005-11-27 16:02 ` Marcelo Tosatti
  2005-11-27 23:21 ` Andrew Morton
  1 sibling, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2005-11-27 16:02 UTC (permalink / raw)
To: Jan Kasprzak; +Cc: Andrew Morton, nickpiggin, linux-kernel, Bharata B Rao

Hi Jan,

> 	The task that probably triggers this problem is a cron job
> doing full-text indexing of a mailing list archive, so it accesses lots
> of small files, and then recreates the inverted index, which is one big
> file. So maybe inode cache shrinking or something may be the problem there.
> However, the cron job does an incremental reindexing only, so I think it
> reads less than 100 files per run.
> :
> : Maybe you should also try profile/oprofile during the kswapd peaks?
> :
> 	Do you have any details on it? I can of course RTFdocs of oprofile,
> but should I try to catch something special?

It does seem to scan SLABs intensively:

pgscan_kswapd_high 0
pgscan_kswapd_normal 940269891
pgscan_kswapd_dma 0
pgscan_direct_high 0
pgscan_direct_normal 13837131
pgscan_direct_dma 0
pginodesteal 11216563
slabs_scanned 160160350534400
kswapd_steal 909876526
kswapd_inodesteal 305039060
pageoutrun 30139677
allocstall 4067783

If you take the amount of scanned slabs and divide by the sum of the
direct/kswapd page scans:

160160350534400 / (940269891 + 13837131) = 167864

Which means that for each page scanned, about 168000 slab entries are
scanned. Does not look very good.

Other than the profiling, can you please also try Bharata's shrinkable
slab cache statistics patch?

http://lkml.org/lkml/2005/10/26/1

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.14 kswapd eating too much CPU
  2005-11-27 16:02 ` Marcelo Tosatti
@ 2005-11-27 23:21 ` Andrew Morton
  2005-11-28 13:16 ` Jan Kasprzak
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2005-11-27 23:21 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kas, nickpiggin, linux-kernel, bharata

Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
>
> It does seem to scan SLABs intensively:
>

It might be worth trying the below. Mainly for the debugging check.

From: Andrea Arcangeli <andrea@suse.de>

Greg Edwards found some deadlock in the icache shrinker. I believe the
major bug is that the VM is currently potentially setting nr = LONG_MAX
before shrinking the icache (and the icache shrinker never returns -1,
which means the API doesn't currently require shrinkers to return -1 when
they're finished).

The below is the most obviously safe way I could address this problem
(still untested). This is not necessarily the way we want to fix it in
mainline, but it at least shows what I believe to be the major culprit in
the code (i.e. nr growing insane ;).

Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 mm/vmscan.c |   18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff -puN mm/vmscan.c~shrinker-nr-=-long_max-means-deadlock-for-icache mm/vmscan.c
--- devel/mm/vmscan.c~shrinker-nr-=-long_max-means-deadlock-for-icache	2005-11-19 02:53:18.000000000 -0800
+++ devel-akpm/mm/vmscan.c	2005-11-19 03:00:25.000000000 -0800
@@ -201,13 +201,25 @@ static int shrink_slab(unsigned long sca
 	list_for_each_entry(shrinker, &shrinker_list, list) {
 		unsigned long long delta;
 		unsigned long total_scan;
+		unsigned long max_pass = (*shrinker->shrinker)(0, gfp_mask);

 		delta = (4 * scanned) / shrinker->seeks;
-		delta *= (*shrinker->shrinker)(0, gfp_mask);
+		delta *= max_pass;
 		do_div(delta, lru_pages + 1);
 		shrinker->nr += delta;
-		if (shrinker->nr < 0)
-			shrinker->nr = LONG_MAX;	/* It wrapped! */
+		if (shrinker->nr < 0) {
+			printk(KERN_ERR "%s: nr=%ld\n",
+					__FUNCTION__, shrinker->nr);
+			shrinker->nr = max_pass;
+		}
+
+		/*
+		 * Avoid risking looping forever due to too large nr value:
+		 * never try to free more than twice the estimated number of
+		 * freeable entries.
+		 */
+		if (shrinker->nr > max_pass * 2)
+			shrinker->nr = max_pass * 2;

 		total_scan = shrinker->nr;
 		shrinker->nr = 0;
_

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.14 kswapd eating too much CPU
  2005-11-27 23:21 ` Andrew Morton
@ 2005-11-28 13:16 ` Jan Kasprzak
  2005-11-28 8:04 ` Marcelo Tosatti
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kasprzak @ 2005-11-28 13:16 UTC (permalink / raw)
To: Andrew Morton; +Cc: Marcelo Tosatti, nickpiggin, linux-kernel, bharata

Andrew Morton wrote:
: Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
: >
: > It does seem to scan SLABs intensively:
: >
: It might be worth trying the below. Mainly for the debugging check.
:
	I have compiled a new kernel - 2.6.15-rc2 with the patch you
recommended and with the slab statistics patch Marcelo mentioned.
I have added oprofile support, but apart from that it is the same
kernel. It seems that the kswapd system time peaks have disappeared,
or at least they are much lower - kswapd0 has eaten ~3 minutes from
11 hours of uptime (in one of my previous mails I found that it used
to be 117 minutes after ~10 hours of uptime). On my MRTG graphs at
http://www.linux.cz/stats/mrtg-rrd/cpu.html some _small_ peaks can be
seen at 15 minutes after every odd-numbered hour. I have booted this
kernel around 2am local time.

	I have no unusual error messages in dmesg output, so this must
be this part of the patch:

: +	/*
: +	 * Avoid risking looping forever due to too large nr value:
: +	 * never try to free more than twice the estimated number of
: +	 * freeable entries.
: +	 */
: +	if (shrinker->nr > max_pass * 2)
: +		shrinker->nr = max_pass * 2;

	The shrinker statistics displayed in /proc/slabinfo are

# egrep '^(inode|dentry)_cache' /proc/slabinfo
inode_cache 1338 1380 600 6 1 : tunables 54 27 8 : slabdata 230 230 0 : shrinker stat 261765504 16831100
dentry_cache 40195 49130 224 17 1 : tunables 120 60 8 : slabdata 2890 2890 204 : shrinker stat 57946368 28877600

-Yenya

-- 
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839  Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E    |
| http://www.fi.muni.cz/~kas/   Journal: http://www.fi.muni.cz/~kas/blog/ |
> Specs are a basis for _talking_about_ things. But they are _not_ a basis <
> for implementing software.                          --Linus Torvalds    <

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.14 kswapd eating too much CPU
  2005-11-28 13:16 ` Jan Kasprzak
@ 2005-11-28 8:04 ` Marcelo Tosatti
  2005-12-06 0:10 ` Jan Kasprzak
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2005-11-28 8:04 UTC (permalink / raw)
To: Jan Kasprzak; +Cc: Andrew Morton, nickpiggin, linux-kernel, bharata

On Mon, Nov 28, 2005 at 02:16:48PM +0100, Jan Kasprzak wrote:
> Andrew Morton wrote:
> : Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
> : >
> : > It does seem to scan SLABs intensively:
> : >
> : It might be worth trying the below. Mainly for the debugging check.
> :
> 	I have compiled a new kernel - 2.6.15-rc2 with the patch you
> recommended and with the slab statistics patch Marcelo mentioned.
> I have added oprofile support, but apart from that it is the same
> kernel. It seems that the kswapd system time peaks have disappeared,
> or at least they are much lower - kswapd0 has eaten ~3 minutes from
> 11 hours of uptime (in one of my previous mails I found that it used
> to be 117 minutes after ~10 hours of uptime). On my MRTG graphs at
> http://www.linux.cz/stats/mrtg-rrd/cpu.html some _small_ peaks can be
> seen at 15 minutes after every odd-numbered hour. I have booted this
> kernel around 2am local time.
>
> 	I have no unusual error messages in dmesg output, so this must
> be this part of the patch:
>
> : +	/*
> : +	 * Avoid risking looping forever due to too large nr value:
> : +	 * never try to free more than twice the estimated number of
> : +	 * freeable entries.
> : +	 */
> : +	if (shrinker->nr > max_pass * 2)
> : +		shrinker->nr = max_pass * 2;

Yep, great.
> 	The shrinker statistics displayed in /proc/slabinfo are
>
> # egrep '^(inode|dentry)_cache' /proc/slabinfo
> inode_cache 1338 1380 600 6 1 : tunables 54 27 8 : slabdata 230 230 0 : shrinker stat 261765504 16831100
> dentry_cache 40195 49130 224 17 1 : tunables 120 60 8 : slabdata 2890 2890 204 : shrinker stat 57946368 28877600

Interesting: the success/attempt reclaim ratio for the dentry cache is
about 1/2 (pretty good), while the inode cache ratio is 1/15 (not so
good).

I wonder why prune_icache() does not move inodes with positive i_count
to the inode_inuse list, letting iput() take care of moving them to the
unused list once the count reaches zero.

	inode = list_entry(inode_unused.prev, struct inode, i_list);
	if (inode->i_state || atomic_read(&inode->i_count)) {
		list_move(&inode->i_list, &inode_unused);
		continue;
	}

Couldn't it be

	list_move(&inode->i_list, &inode_inuse);

?

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.14 kswapd eating too much CPU
  2005-11-28 8:04 ` Marcelo Tosatti
@ 2005-12-06 0:10 ` Jan Kasprzak
  2005-12-06 18:14 ` Marcelo Tosatti
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kasprzak @ 2005-12-06 0:10 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Andrew Morton, nickpiggin, linux-kernel, bharata

Marcelo Tosatti wrote:
: I wonder why prune_icache() does not move inodes with positive i_count
: to the inode_inuse list, letting iput() take care of moving them to the
: unused list once the count reaches zero.
:
: 	inode = list_entry(inode_unused.prev, struct inode, i_list);
: 	if (inode->i_state || atomic_read(&inode->i_count)) {
: 		list_move(&inode->i_list, &inode_unused);
: 		continue;
: 	}
:
: Couldn't it be
:
: 	list_move(&inode->i_list, &inode_inuse);
:
: ?

	Hmm, this code is indeed strange. Why does it move the inode
to the inode_unused list, when the inode has in fact been _found_ while
scanning the inode_unused list? And how can an inode with positive
->i_count end up on the inode_unused list?

-Yenya

-- 
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839  Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E    |
| http://www.fi.muni.cz/~kas/   Journal: http://www.fi.muni.cz/~kas/blog/ |
> Specs are a basis for _talking_about_ things. But they are _not_ a basis <
> for implementing software.                          --Linus Torvalds    <

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: 2.6.14 kswapd eating too much CPU
  2005-12-06 0:10 ` Jan Kasprzak
@ 2005-12-06 18:14 ` Marcelo Tosatti
  0 siblings, 0 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2005-12-06 18:14 UTC (permalink / raw)
To: Jan Kasprzak; +Cc: Andrew Morton, nickpiggin, linux-kernel, bharata

On Tue, Dec 06, 2005 at 01:10:06AM +0100, Jan Kasprzak wrote:
> Marcelo Tosatti wrote:
> : I wonder why prune_icache() does not move inodes with positive i_count
> : to the inode_inuse list, letting iput() take care of moving them to the
> : unused list once the count reaches zero.
> :
> : 	inode = list_entry(inode_unused.prev, struct inode, i_list);
> : 	if (inode->i_state || atomic_read(&inode->i_count)) {
> : 		list_move(&inode->i_list, &inode_unused);
> : 		continue;
> : 	}
> :
> : Couldn't it be
> :
> : 	list_move(&inode->i_list, &inode_inuse);
> :
> : ?
>
> 	Hmm, this code is indeed strange. Why does it move the inode
> to the inode_unused list, when the inode has in fact been _found_ while
> scanning the inode_unused list?

It just moves it to the head of the list, for later scanning.

> 	And how can an inode with positive ->i_count end up on the
> inode_unused list?

Such inodes only end up in the unused list during superblock shutdown,
so they should not be a problem actually (my bad).

^ permalink raw reply	[flat|nested] 21+ messages in thread
end of thread, other threads:[~2005-12-06 18:14 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-11-22 12:59 2.6.14 kswapd eating too much CPU Jan Kasprzak
2005-11-23  0:35 ` Andrew Morton
2005-11-23  1:01 ` Jan Kasprzak
2005-11-23  2:19 ` Nick Piggin
2005-11-23  5:13 ` Jan Kasprzak
2005-11-23 13:14 ` Jan Kasprzak
2005-11-23 19:02 ` Andrew Morton
2005-11-23 20:24 ` Jan Kasprzak
2005-11-23 20:32 ` Jesper Juhl
2005-11-23 20:34 ` Jan Kasprzak
2005-11-23 20:35 ` Andrew Morton
2005-11-24  8:31 ` Jan Kasprzak
2005-11-27  8:42 ` Marcelo Tosatti
2005-11-27 20:39 ` Jan Kasprzak
2005-11-27 15:45 ` Marcelo Tosatti
2005-11-27 16:02 ` Marcelo Tosatti
2005-11-27 23:21 ` Andrew Morton
2005-11-28 13:16 ` Jan Kasprzak
2005-11-28  8:04 ` Marcelo Tosatti
2005-12-06  0:10 ` Jan Kasprzak
2005-12-06 18:14 ` Marcelo Tosatti