* 2.6.8.1 mempool subsystem sickness
@ 2004-09-08 16:48 jmerkey
  2004-09-08 23:05 ` Nick Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: jmerkey @ 2004-09-08 16:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: jmerkey

On a system with 4GB of memory, running a stock kernel without the patch
that splits user space, I am seeing memory allocation failures with the
X server and simple apps on a machine with a Pentium 4 processor and
500MB of memory.

If you load large apps and do a lot of skb traffic, the mempool and slab
caches start gobbling up pages and don't seem to balance them very well,
resulting in memory allocation failures over time if the system stays up
for a week or more.

I am also seeing the same behavior on another system which has been
running for almost 30 days with an skb-based traffic regeneration test
creating and sending skb's in kernel between two interfaces.

Over time, pages get stuck in the slab allocator and user space apps
start to fail on alloc requests. Rebooting the system clears the problem,
which slowly comes back over time. I am seeing this with stock kernels
from kernel.org and on kernels I have patched, so the problem seems to be
in the base code. I have spent the last two weeks observing the problem
to verify I can reproduce it, and it keeps happening.

Jeff

^ permalink raw reply [flat|nested] 10+ messages in thread
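When a machine degrades like this over days, the usual way to pin down which
cache is eating the pages is to snapshot /proc/slabinfo periodically and diff
the per-cache object counts. Below is a minimal user-space sketch of that
idea (a hypothetical helper, not something posted in this thread); it assumes
the slabinfo 2.0 format shown later in the thread.

/*
 * slabgrow.c -- hypothetical helper, not part of the original thread.
 * Compares two saved copies of /proc/slabinfo (slabinfo 2.0 format) and
 * prints the caches whose object counts grew, to spot a slow leak.
 * Assumes data lines look like: "name <active_objs> <num_objs> <objsize> ..."
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct rec { char name[64]; long active; long objsize; };

static int load(const char *path, struct rec *r, int max)
{
    FILE *f = fopen(path, "r");
    char line[512];
    int n = 0;

    if (!f)
        return -1;
    while (n < max && fgets(line, sizeof(line), f)) {
        if (line[0] == '#' || strncmp(line, "slabinfo", 8) == 0)
            continue;               /* skip the two header lines */
        if (sscanf(line, "%63s %ld %*ld %ld",
                   r[n].name, &r[n].active, &r[n].objsize) == 3)
            n++;
    }
    fclose(f);
    return n;
}

int main(int argc, char **argv)
{
    static struct rec a[512], b[512];
    int na, nb, i, j;

    if (argc != 3)
        return 1;
    na = load(argv[1], a, 512);         /* earlier snapshot */
    nb = load(argv[2], b, 512);         /* later snapshot */
    if (na < 0 || nb < 0)
        return 1;
    for (i = 0; i < nb; i++)
        for (j = 0; j < na; j++)
            if (!strcmp(a[j].name, b[i].name) && b[i].active > a[j].active)
                printf("%-24s grew by %ld objects (~%ld kB)\n", b[i].name,
                       b[i].active - a[j].active,
                       (b[i].active - a[j].active) * b[i].objsize / 1024);
    return 0;
}

Run against two snapshots taken a day apart, it points straight at the caches
that only ever grow, which is the pattern reported above.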
* Re: 2.6.8.1 mempool subsystem sickness
  2004-09-08 16:48 2.6.8.1 mempool subsystem sickness jmerkey
@ 2004-09-08 23:05 ` Nick Piggin
  0 siblings, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2004-09-08 23:05 UTC (permalink / raw)
  To: jmerkey; +Cc: linux-kernel, jmerkey

jmerkey@comcast.net wrote:
> On a system with 4GB of memory, running a stock kernel without the
> patch that splits user space, I am seeing memory allocation failures
> with the X server and simple apps on a machine with a Pentium 4
> processor and 500MB of memory.
>
> If you load large apps and do a lot of skb traffic, the mempool and
> slab caches start gobbling up pages and don't seem to balance them
> very well, resulting in memory allocation failures over time if the
> system stays up for a week or more.
>
> I am also seeing the same behavior on another system which has been
> running for almost 30 days with an skb-based traffic regeneration
> test creating and sending skb's in kernel between two interfaces.
>
> Over time, pages get stuck in the slab allocator and user space apps
> start to fail on alloc requests. Rebooting the system clears the
> problem, which slowly comes back over time. I am seeing this with
> stock kernels from kernel.org and on kernels I have patched, so the
> problem seems to be in the base code. I have spent the last two weeks
> observing the problem to verify I can reproduce it, and it keeps
> happening.
>
> Jeff
>

Hi Jeff,

Can you give us a few more details please? Post the allocation failure
messages in full, and post /proc/meminfo, etc. Thanks.

^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <091420042058.15928.41475B8000002BA100003E382200763704970A059D0A0306@comcast.net>]
* Re: 2.6.8.1 mempool subsystem sickness [not found] <091420042058.15928.41475B8000002BA100003E382200763704970A059D0A0306@comcast.net> @ 2004-09-14 20:32 ` Jeff V. Merkey 2004-09-14 22:59 ` Nick Piggin 0 siblings, 1 reply; 10+ messages in thread From: Jeff V. Merkey @ 2004-09-14 20:32 UTC (permalink / raw) To: Nick Piggin, linux-kernel, jmerkey Hi Jeff, > Can you give us a few more details please? Post the allocation failure > messages in full, and post /proc/meminfo, etc. Thanks. > - > Here you go. Jeff Sep 14 14:18:59 datascout4 kernel: if_regen2: page allocation failure. order:3, mode:0x20 Sep 14 14:18:59 datascout4 kernel: [<80106c7e>] dump_stack+0x1e/0x30 Sep 14 14:18:59 datascout4 kernel: [<80134aac>] __alloc_pages+0x2dc/0x350 Sep 14 14:18:59 datascout4 kernel: [<80134b42>] __get_free_pages+0x22/0x50 Sep 14 14:18:59 datascout4 kernel: [<80137d9f>] kmem_getpages+0x1f/0xd0 Sep 14 14:18:59 datascout4 kernel: [<8013897a>] cache_grow+0x9a/0x130 Sep 14 14:18:59 datascout4 kernel: [<80138b4b>] cache_alloc_refill+0x13b/0x1e0 Sep 14 14:18:59 datascout4 kernel: [<80138fa4>] __kmalloc+0x74/0x80 Sep 14 14:18:59 datascout4 kernel: [<80299298>] alloc_skb+0x48/0xf0 Sep 14 14:18:59 datascout4 kernel: [<f8972e67>] create_xmit_packet+0x57/0x100 [dsfs] Sep 14 14:18:59 datascout4 kernel: [<f8973150>] regen_data+0x60/0x1d0 [dsfs] Sep 14 14:18:59 datascout4 kernel: [<80104355>] kernel_thread_helper+0x5/0x10 Sep 14 14:18:59 datascout4 kernel: printk: 12 messages suppressed. MemTotal: 1944860 kB MemFree: 200008 kB Buffers: 133772 kB Cached: 678268 kB SwapCached: 208 kB Active: 435712 kB Inactive: 436756 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 1944860 kB LowFree: 200008 kB SwapTotal: 1052248 kB SwapFree: 1051976 kB Dirty: 7960 kB Writeback: 0 kB Mapped: 68160 kB Slab: 863556 kB Committed_AS: 92220 kB PageTables: 1344 kB VmallocTotal: 122804 kB VmallocUsed: 2872 kB VmallocChunk: 119856 kB evdev 7552 0 - Live 0xf8aa3000 ipx 24364 0 - Live 0xf89f5000 appletalk 29748 0 - Live 0xf8ad8000 parport_pc 22464 1 - Live 0xf89e3000 lp 9644 0 - Live 0xf880f000 parport 34376 2 parport_pc,lp, Live 0xf8aab000 autofs4 15492 0 - Live 0xf89f0000 rfcomm 32412 0 - Live 0xf8a96000 l2cap 19460 5 rfcomm, Live 0xf89ea000 bluetooth 39812 4 rfcomm,l2cap, Live 0xf89ce000 sunrpc 126436 1 - Live 0xf8ab8000 e1000 83460 0 - Live 0xf89fc000 sg 33568 0 - Live 0xf89d9000 microcode 5536 0 - Live 0xf8851000 ide_cd 36512 0 - Live 0xf890c000 cdrom 35868 1 ide_cd, Live 0xf8867000 dsfs 269912 1 - Live 0xf896d000 sd_mod 17280 2 - Live 0xf8861000 3w_xxxx 35108 2 - Live 0xf8822000 scsi_mod 103244 3 sg,sd_mod,3w_xxxx, Live 0xf8923000 dm_mod 49788 0 - Live 0xf8871000 orinoco_usb 22440 0 - Live 0xf8847000 firmware_class 7424 1 orinoco_usb, Live 0xf8813000 orinoco 44048 1 orinoco_usb, Live 0xf8855000 uhci_hcd 28944 0 - Live 0xf8819000 usbcore 99300 4 orinoco_usb,uhci_hcd, Live 0xf882d000 slabinfo - version: 2.0 # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> bt_sock 3 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0 rpc_buffers 8 8 2048 2 1 : tunables 24 12 0 : slabdata 4 4 0 rpc_tasks 8 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0 rpc_inode_cache 6 7 512 7 1 : tunables 54 27 0 : slabdata 1 1 0 ip_fib_hash 11 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 scsi_cmd_cache 160 160 384 10 1 : tunables 54 27 0 : slabdata 16 16 0 sgpool-128 32 32 2048 2 1 : tunables 24 12 0 : slabdata 16 16 0 sgpool-64 32 32 1024 4 1 : tunables 54 27 
0 : slabdata 8 8 0 sgpool-32 32 32 512 8 1 : tunables 54 27 0 : slabdata 4 4 0 sgpool-16 32 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0 sgpool-8 217 217 128 31 1 : tunables 120 60 0 : slabdata 7 7 0 dm_tio 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0 dm_io 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0 uhci_urb_priv 1 88 44 88 1 : tunables 120 60 0 : slabdata 1 1 0 dn_fib_info_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 dn_dst_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 dn_neigh_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 decnet_socket_cache 0 0 768 5 1 : tunables 54 27 0 : slabdata 0 0 0 clip_arp_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 xfrm6_tunnel_spi 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 fib6_nodes 15 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 ip6_dst_cache 60 105 256 15 1 : tunables 120 60 0 : slabdata 7 7 0 ndisc_cache 5 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0 raw6_sock 0 0 640 6 1 : tunables 54 27 0 : slabdata 0 0 0 udp6_sock 0 0 640 6 1 : tunables 54 27 0 : slabdata 0 0 0 tcp6_sock 7 7 1152 7 2 : tunables 24 12 0 : slabdata 1 1 0 unix_sock 50 50 384 10 1 : tunables 54 27 0 : slabdata 5 5 0 ip_mrt_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 tcp_tw_bucket 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 tcp_bind_bucket 8 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 tcp_open_request 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 inet_peer_cache 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 secpath_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 xfrm_dst_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 ip_dst_cache 20 30 256 15 1 : tunables 120 60 0 : slabdata 2 2 0 arp_cache 1 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0 raw4_sock 0 0 512 7 1 : tunables 54 27 0 : slabdata 0 0 0 udp_sock 3 7 512 7 1 : tunables 54 27 0 : slabdata 1 1 0 tcp_sock 16 16 1024 4 1 : tunables 54 27 0 : slabdata 4 4 0 flow_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 isofs_inode_cache 0 0 384 10 1 : tunables 54 27 0 : slabdata 0 0 0 fat_inode_cache 0 0 384 10 1 : tunables 54 27 0 : slabdata 0 0 0 ext2_inode_cache 0 0 512 7 1 : tunables 54 27 0 : slabdata 0 0 0 journal_handle 8 135 28 135 1 : tunables 120 60 0 : slabdata 1 1 0 journal_head 456 2349 48 81 1 : tunables 120 60 0 : slabdata 29 29 0 revoke_table 4 290 12 290 1 : tunables 120 60 0 : slabdata 1 1 0 revoke_record 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0 ext3_inode_cache 336400 340655 512 7 1 : tunables 54 27 0 : slabdata 48665 48665 0 ext3_xattr 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0 reiser_inode_cache 0 0 384 10 1 : tunables 54 27 0 : slabdata 0 0 0 dquot 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 eventpoll_pwq 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0 eventpoll_epi 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 kioctx 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 kiocb 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 dnotify_cache 1 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0 file_lock_cache 2 43 92 43 1 : tunables 120 60 0 : slabdata 1 1 0 fasync_cache 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0 shmem_inode_cache 8 14 512 7 1 : tunables 54 27 0 : slabdata 2 2 0 posix_timers_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 uid_cache 9 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 cfq_pool 64 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 crq_pool 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0 deadline_drq 0 0 48 81 1 : tunables 120 60 0 : 
slabdata 0 0 0 as_arq 200 260 60 65 1 : tunables 120 60 0 : slabdata 4 4 0 blkdev_ioc 35 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0 blkdev_queue 21 27 448 9 1 : tunables 54 27 0 : slabdata 3 3 0 blkdev_requests 252 286 152 26 1 : tunables 120 60 0 : slabdata 11 11 0 biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0 biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0 biovec-64 256 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0 biovec-16 131328 131340 256 15 1 : tunables 120 60 0 : slabdata 8756 8756 0 biovec-4 256 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0 biovec-1 404 678 16 226 1 : tunables 120 60 0 : slabdata 3 3 0 bio 131457 131577 64 61 1 : tunables 120 60 0 : slabdata 2157 2157 0 sock_inode_cache 60 60 384 10 1 : tunables 54 27 0 : slabdata 6 6 0 skbuff_head_cache 2130 2370 256 15 1 : tunables 120 60 0 : slabdata 158 158 0 sock 4 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0 proc_inode_cache 409 470 384 10 1 : tunables 54 27 0 : slabdata 47 47 0 sigqueue 27 27 148 27 1 : tunables 120 60 0 : slabdata 1 1 0 radix_tree_node 53400 57358 276 14 1 : tunables 54 27 0 : slabdata 4097 4097 0 bdev_cache 8 14 512 7 1 : tunables 54 27 0 : slabdata 2 2 0 mnt_cache 21 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0 inode_cache 3299 3310 384 10 1 : tunables 54 27 0 : slabdata 331 331 0 dentry_cache 198368 254576 140 28 1 : tunables 120 60 0 : slabdata 9092 9092 0 filp 675 675 256 15 1 : tunables 120 60 0 : slabdata 45 45 0 names_cache 4 4 4096 1 1 : tunables 24 12 0 : slabdata 4 4 0 idr_layer_cache 88 116 136 29 1 : tunables 120 60 0 : slabdata 4 4 0 buffer_head 193153 230931 48 81 1 : tunables 120 60 0 : slabdata 2851 2851 0 mm_struct 70 70 512 7 1 : tunables 54 27 0 : slabdata 10 10 0 vm_area_struct 1726 1786 84 47 1 : tunables 120 60 0 : slabdata 38 38 0 fs_cache 106 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 files_cache 70 70 512 7 1 : tunables 54 27 0 : slabdata 10 10 0 signal_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0 sighand_cache 90 90 1408 5 2 : tunables 24 12 0 : slabdata 18 18 0 task_struct 95 95 1424 5 2 : tunables 24 12 0 : slabdata 19 19 0 anon_vma 814 814 8 407 1 : tunables 120 60 0 : slabdata 2 2 0 pgd 65 65 4096 1 1 : tunables 24 12 0 : slabdata 65 65 0 size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-131072 2 2 131072 1 32 : tunables 8 4 0 : slabdata 2 2 0 size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0 size-65536 8234 8234 65536 1 16 : tunables 8 4 0 : slabdata 8234 8234 0 size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-32768 7 16 32768 1 8 : tunables 8 4 0 : slabdata 7 16 0 size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0 size-16384 6 6 16384 1 4 : tunables 8 4 0 : slabdata 6 6 0 size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0 size-8192 105 109 8192 1 2 : tunables 8 4 0 : slabdata 105 109 0 size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0 size-4096 2099 2111 4096 1 1 : tunables 24 12 0 : slabdata 2099 2111 0 size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0 size-2048 8276 8276 2048 2 1 : tunables 24 12 0 : slabdata 4138 4138 0 size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0 size-1024 140 140 1024 4 1 : tunables 54 27 0 : slabdata 35 35 0 size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0 size-512 176 560 512 8 1 : tunables 54 27 0 : slabdata 70 70 0 size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 size-256 222 480 256 15 1 : tunables 120 60 0 : 
slabdata 32 32 0 size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 size-128 1807 1829 128 31 1 : tunables 120 60 0 : slabdata 59 59 0 size-64(DMA) 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 size-64 7762 8662 64 61 1 : tunables 120 60 0 : slabdata 142 142 0 size-32(DMA) 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0 size-32 15764 16184 32 119 1 : tunables 120 60 0 : slabdata 136 136 0 kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0 nr_dirty 1316 nr_writeback 0 nr_unstable 0 nr_page_table_pages 336 nr_mapped 18227 nr_slab 215890 pgpgin 1496713 pgpgout 1020309568 pswpin 365 pswpout 3752 pgalloc_high 0 pgalloc_normal 1578465026 pgalloc_dma 319286 pgfree 1578828064 pgactivate 592523 pgdeactivate 198222 pgfault 118936465 pgmajfault 1721 pgrefill_high 0 pgrefill_normal 199636 pgrefill_dma 50800 pgsteal_high 0 pgsteal_normal 172811 pgsteal_dma 26364 pgscan_kswapd_high 0 pgscan_kswapd_normal 79497 pgscan_kswapd_dma 140076 pgscan_direct_high 0 pgscan_direct_normal 99231 pgscan_direct_dma 3795 pginodesteal 3837 slabs_scanned 847283 kswapd_steal 101698 kswapd_inodesteal 35700 pageoutrun 282 allocstall 2977 pgrotated 20310 ^ permalink raw reply [flat|nested] 10+ messages in thread
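Alongside meminfo and slabinfo, /proc/buddyinfo is worth capturing for a
failure like the "order:3, mode:0x20" one logged above: it lists, per zone,
how many free blocks of each order remain, and an order-3 GFP_ATOMIC request
can fail with 200 MB nominally free if that memory exists only as order-0/1
fragments. Below is a hedged user-space sketch for summarizing it (a
hypothetical helper, not part of the original report).

/*
 * buddyfrag.c -- hypothetical helper, not part of the original report.
 * /proc/buddyinfo lines look like "Node 0, zone   Normal  216 55 189 ...",
 * one count per order.  If the order>=3 columns are near zero, an order-3
 * GFP_ATOMIC allocation will fail no matter what MemFree says.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/buddyinfo", "r");
    char line[512], zone[32];
    int node;

    if (!f)
        return 1;
    while (fgets(line, sizeof(line), f)) {
        long high_order = 0, v;
        int order = 0;
        char *p, *end;

        if (sscanf(line, "Node %d, zone %31s", &node, zone) != 2)
            continue;
        p = strstr(line, zone) + strlen(zone);
        for (;; order++) {
            v = strtol(p, &end, 10);
            if (end == p)
                break;
            if (order >= 3)     /* big enough to satisfy an order-3 request */
                high_order += v;
            p = end;
        }
        printf("node %d, zone %-8s: %ld free blocks of order >= 3\n",
               node, zone, high_order);
    }
    fclose(f);
    return 0;
}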
* Re: 2.6.8.1 mempool subsystem sickness
  2004-09-14 20:32 ` Jeff V. Merkey
@ 2004-09-14 22:59 ` Nick Piggin
  [not found] ` <20040914223122.GA3325@galt.devicelogics.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Nick Piggin @ 2004-09-14 22:59 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: linux-kernel, jmerkey

Jeff V. Merkey wrote:
>
> Hi Jeff,
>
>> Can you give us a few more details please? Post the allocation failure
>> messages in full, and post /proc/meminfo, etc. Thanks.
>> -
>>
> Here you go.
>

Thanks.

> Jeff
>
> Sep 14 14:18:59 datascout4 kernel: if_regen2: page allocation failure.
> order:3, mode:0x20
> Sep 14 14:18:59 datascout4 kernel: [<80106c7e>] dump_stack+0x1e/0x30
> Sep 14 14:18:59 datascout4 kernel: [<80134aac>] __alloc_pages+0x2dc/0x350
> Sep 14 14:18:59 datascout4 kernel: [<80134b42>] __get_free_pages+0x22/0x50
> Sep 14 14:18:59 datascout4 kernel: [<80137d9f>] kmem_getpages+0x1f/0xd0
> Sep 14 14:18:59 datascout4 kernel: [<8013897a>] cache_grow+0x9a/0x130
> Sep 14 14:18:59 datascout4 kernel: [<80138b4b>]
> cache_alloc_refill+0x13b/0x1e0
> Sep 14 14:18:59 datascout4 kernel: [<80138fa4>] __kmalloc+0x74/0x80
> Sep 14 14:18:59 datascout4 kernel: [<80299298>] alloc_skb+0x48/0xf0
> Sep 14 14:18:59 datascout4 kernel: [<f8972e67>]
> create_xmit_packet+0x57/0x100 [dsfs]
> Sep 14 14:18:59 datascout4 kernel: [<f8973150>] regen_data+0x60/0x1d0
> [dsfs]
> Sep 14 14:18:59 datascout4 kernel: [<80104355>]
> kernel_thread_helper+0x5/0x10
> Sep 14 14:18:59 datascout4 kernel: printk: 12 messages suppressed.
>
> MemTotal: 1944860 kB
> MemFree: 200008 kB

Wow. You have 200MB free, and can't satisfy an order-3 allocation
(although it is atomic).

Now it just so happens that I have a couple of patches that are supposed
to fix exactly this. Unfortunately I haven't had the hardware to properly
test them (they're pretty stable though).

Want to give them a try? :)

^ permalink raw reply [flat|nested] 10+ messages in thread
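Some context on why this is an order-3 request in the first place: alloc_skb()
kmalloc()s the entire linear data area (plus the skb_shared_info footer), so a
linear skb larger than 16 KB falls into the size-32768 general cache, and the
slabinfo above shows that cache is built from 8-page (order-3) slabs. Under
GFP_ATOMIC the allocator cannot sleep or reclaim, so fragmentation alone can
fail the request. The sketch below is only a guess at what a
create_xmit_packet()-style helper might look like; the symbol names come from
the stack trace, the body and sizes are assumptions.

/*
 * Hypothetical reconstruction of a create_xmit_packet()-style helper;
 * only the function name comes from the oops trace, the body is a guess.
 * The point: a linear skb bigger than 16KB forces kmalloc() into the
 * size-32768 cache, i.e. an order-3 (8 contiguous pages) page allocation,
 * and with GFP_ATOMIC the allocator cannot wait or reclaim.
 */
#include <linux/skbuff.h>
#include <linux/netdevice.h>

static struct sk_buff *create_xmit_packet(struct net_device *dev,
					  const void *data, unsigned int len)
{
	struct sk_buff *skb;

	/* len in the 16KB-32KB range here => order-3 slab page allocation */
	skb = alloc_skb(len + 16, GFP_ATOMIC);
	if (!skb)
		return NULL;		/* this is the failure in the log */

	skb_reserve(skb, 16);
	memcpy(skb_put(skb, len), data, len);
	skb->dev = dev;
	return skb;
}

Independent of the VM changes discussed next, the usual mitigations for such
a caller are to cap the linear part at a page or two and carry the rest as
page fragments, or to pre-allocate buffers from process context where
GFP_KERNEL is allowed to reclaim.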
[parent not found: <20040914223122.GA3325@galt.devicelogics.com>]
* Re: 2.6.8.1 mempool subsystem sickness [not found] ` <20040914223122.GA3325@galt.devicelogics.com> @ 2004-09-14 23:51 ` Nick Piggin 2004-09-15 0:51 ` Gene Heskett 2004-09-15 17:27 ` Jeff V. Merkey 0 siblings, 2 replies; 10+ messages in thread From: Nick Piggin @ 2004-09-14 23:51 UTC (permalink / raw) To: jmerkey; +Cc: Jeff V. Merkey, linux-kernel, jmerkey [-- Attachment #1: Type: text/plain, Size: 199 bytes --] jmerkey@galt.devicelogics.com wrote: > You bet. Send them to me. For some reason I am not able to post > to LKML again. > > Jeff > OK, this is against 2.6.9-rc2. Let me know how you go. Thanks [-- Attachment #2: vm-rollup.patch --] [-- Type: text/x-patch, Size: 9996 bytes --] --- linux-2.6-npiggin/include/linux/mmzone.h | 8 ++ linux-2.6-npiggin/mm/page_alloc.c | 83 ++++++++++++++++++------------- linux-2.6-npiggin/mm/vmscan.c | 34 +++++++++--- 3 files changed, 81 insertions(+), 44 deletions(-) diff -puN mm/page_alloc.c~vm-rollup mm/page_alloc.c --- linux-2.6/mm/page_alloc.c~vm-rollup 2004-09-15 09:48:12.000000000 +1000 +++ linux-2.6-npiggin/mm/page_alloc.c 2004-09-15 09:48:59.000000000 +1000 @@ -206,6 +206,7 @@ static inline void __free_pages_bulk (st BUG_ON(bad_range(zone, buddy1)); BUG_ON(bad_range(zone, buddy2)); list_del(&buddy1->lru); + area->nr_free--; mask <<= 1; order++; area++; @@ -213,6 +214,7 @@ static inline void __free_pages_bulk (st page_idx &= mask; } list_add(&(base + page_idx)->lru, &area->free_list); + area->nr_free++; } static inline void free_pages_check(const char *function, struct page *page) @@ -314,6 +316,7 @@ expand(struct zone *zone, struct page *p size >>= 1; BUG_ON(bad_range(zone, &page[size])); list_add(&page[size].lru, &area->free_list); + area->nr_free++; MARK_USED(index + size, high, area); } return page; @@ -377,6 +380,7 @@ static struct page *__rmqueue(struct zon page = list_entry(area->free_list.next, struct page, lru); list_del(&page->lru); + area->nr_free--; index = page - zone->zone_mem_map; if (current_order != MAX_ORDER-1) MARK_USED(index, current_order, area); @@ -579,6 +583,36 @@ buffered_rmqueue(struct zone *zone, int } /* + * Return 1 if free pages are above 'mark'. This takes into account the order + * of the allocation. + */ +int zone_watermark_ok(struct zone *z, int order, unsigned long mark, + int alloc_type, int can_try_harder, int gfp_high) +{ + unsigned long min = mark, free_pages = z->free_pages; + int o; + + if (gfp_high) + min -= min / 2; + if (can_try_harder) + min -= min / 4; + + if (free_pages < min + z->protection[alloc_type]) + return 0; + for (o = 0; o < order; o++) { + /* At the next order, this order's pages become unavailable */ + free_pages -= z->free_area[order].nr_free << o; + + /* Require fewer higher order pages to be free */ + min >>= 1; + + if (free_pages < min + (1 << order) - 1) + return 0; + } + return 1; +} + +/* * This is the 'heart' of the zoned buddy allocator. * * Herein lies the mysterious "incremental min". 
That's the @@ -599,7 +633,6 @@ __alloc_pages(unsigned int gfp_mask, uns struct zonelist *zonelist) { const int wait = gfp_mask & __GFP_WAIT; - unsigned long min; struct zone **zones, *z; struct page *page; struct reclaim_state reclaim_state; @@ -629,9 +662,9 @@ __alloc_pages(unsigned int gfp_mask, uns /* Go through the zonelist once, looking for a zone with enough free */ for (i = 0; (z = zones[i]) != NULL; i++) { - min = z->pages_low + (1<<order) + z->protection[alloc_type]; - if (z->free_pages < min) + if (!zone_watermark_ok(z, order, z->pages_low, + alloc_type, 0, 0)) continue; page = buffered_rmqueue(z, order, gfp_mask); @@ -640,21 +673,16 @@ __alloc_pages(unsigned int gfp_mask, uns } for (i = 0; (z = zones[i]) != NULL; i++) - wakeup_kswapd(z); + wakeup_kswapd(z, order); /* * Go through the zonelist again. Let __GFP_HIGH and allocations * coming from realtime tasks to go deeper into reserves */ for (i = 0; (z = zones[i]) != NULL; i++) { - min = z->pages_min; - if (gfp_mask & __GFP_HIGH) - min /= 2; - if (can_try_harder) - min -= min / 4; - min += (1<<order) + z->protection[alloc_type]; - - if (z->free_pages < min) + if (!zone_watermark_ok(z, order, z->pages_min, + alloc_type, can_try_harder, + gfp_mask & __GFP_HIGH)) continue; page = buffered_rmqueue(z, order, gfp_mask); @@ -690,14 +718,9 @@ rebalance: /* go through the zonelist yet one more time */ for (i = 0; (z = zones[i]) != NULL; i++) { - min = z->pages_min; - if (gfp_mask & __GFP_HIGH) - min /= 2; - if (can_try_harder) - min -= min / 4; - min += (1<<order) + z->protection[alloc_type]; - - if (z->free_pages < min) + if (!zone_watermark_ok(z, order, z->pages_min, + alloc_type, can_try_harder, + gfp_mask & __GFP_HIGH)) continue; page = buffered_rmqueue(z, order, gfp_mask); @@ -1117,7 +1140,6 @@ void show_free_areas(void) } for_each_zone(zone) { - struct list_head *elem; unsigned long nr, flags, order, total = 0; show_node(zone); @@ -1129,9 +1151,7 @@ void show_free_areas(void) spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < MAX_ORDER; order++) { - nr = 0; - list_for_each(elem, &zone->free_area[order].free_list) - ++nr; + nr = zone->free_area[order].nr_free; total += nr << order; printk("%lu*%lukB ", nr, K(1UL) << order); } @@ -1457,6 +1477,7 @@ void zone_init_free_lists(struct pglist_ bitmap_size = pages_to_bitmap_size(order, size); zone->free_area[order].map = (unsigned long *) alloc_bootmem_node(pgdat, bitmap_size); + zone->free_area[order].nr_free = 0; } } @@ -1481,6 +1502,7 @@ static void __init free_area_init_core(s pgdat->nr_zones = 0; init_waitqueue_head(&pgdat->kswapd_wait); + pgdat->kswapd_max_order = 0; for (j = 0; j < MAX_NR_ZONES; j++) { struct zone *zone = pgdat->node_zones + j; @@ -1644,8 +1666,7 @@ static void frag_stop(struct seq_file *m } /* - * This walks the freelist for each zone. Whilst this is slow, I'd rather - * be slow here than slow down the fast path by keeping stats - mjbligh + * This walks the free areas for each zone. 
*/ static int frag_show(struct seq_file *m, void *arg) { @@ -1661,14 +1682,8 @@ static int frag_show(struct seq_file *m, spin_lock_irqsave(&zone->lock, flags); seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name); - for (order = 0; order < MAX_ORDER; ++order) { - unsigned long nr_bufs = 0; - struct list_head *elem; - - list_for_each(elem, &(zone->free_area[order].free_list)) - ++nr_bufs; - seq_printf(m, "%6lu ", nr_bufs); - } + for (order = 0; order < MAX_ORDER; ++order) + seq_printf(m, "%6lu ", zone->free_area[order].nr_free); spin_unlock_irqrestore(&zone->lock, flags); seq_putc(m, '\n'); } diff -puN include/linux/mmzone.h~vm-rollup include/linux/mmzone.h --- linux-2.6/include/linux/mmzone.h~vm-rollup 2004-09-15 09:48:16.000000000 +1000 +++ linux-2.6-npiggin/include/linux/mmzone.h 2004-09-15 09:48:59.000000000 +1000 @@ -23,6 +23,7 @@ struct free_area { struct list_head free_list; unsigned long *map; + unsigned long nr_free; }; struct pglist_data; @@ -262,8 +263,9 @@ typedef struct pglist_data { range, including holes */ int node_id; struct pglist_data *pgdat_next; - wait_queue_head_t kswapd_wait; + wait_queue_head_t kswapd_wait; struct task_struct *kswapd; + int kswapd_max_order; } pg_data_t; #define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages) @@ -277,7 +279,9 @@ void __get_zone_counts(unsigned long *ac void get_zone_counts(unsigned long *active, unsigned long *inactive, unsigned long *free); void build_all_zonelists(void); -void wakeup_kswapd(struct zone *zone); +void wakeup_kswapd(struct zone *zone, int order); +int zone_watermark_ok(struct zone *z, int order, unsigned long mark, + int alloc_type, int can_try_harder, int gfp_high); /* * zone_idx() returns 0 for the ZONE_DMA zone, 1 for the ZONE_NORMAL zone, etc. diff -puN mm/vmscan.c~vm-rollup mm/vmscan.c --- linux-2.6/mm/vmscan.c~vm-rollup 2004-09-15 09:48:18.000000000 +1000 +++ linux-2.6-npiggin/mm/vmscan.c 2004-09-15 09:49:31.000000000 +1000 @@ -965,7 +965,7 @@ out: * the page allocator fallback scheme to ensure that aging of pages is balanced * across the zones. */ -static int balance_pgdat(pg_data_t *pgdat, int nr_pages) +static int balance_pgdat(pg_data_t *pgdat, int nr_pages, int order) { int to_free = nr_pages; int priority; @@ -1003,7 +1003,8 @@ static int balance_pgdat(pg_data_t *pgda priority != DEF_PRIORITY) continue; - if (zone->free_pages <= zone->pages_high) { + if (!zone_watermark_ok(zone, order, + zone->pages_high, 0, 0, 0)) { end_zone = i; goto scan; } @@ -1035,7 +1036,8 @@ scan: continue; if (nr_pages == 0) { /* Not software suspend */ - if (zone->free_pages <= zone->pages_high) + if (!zone_watermark_ok(zone, order, + zone->pages_high, end_zone, 0, 0)) all_zones_ok = 0; } zone->temp_priority = priority; @@ -1126,13 +1128,26 @@ static int kswapd(void *p) tsk->flags |= PF_MEMALLOC|PF_KSWAPD; for ( ; ; ) { + unsigned long order = 0, new_order; if (current->flags & PF_FREEZE) refrigerator(PF_FREEZE); + prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE); - schedule(); + new_order = pgdat->kswapd_max_order; + pgdat->kswapd_max_order = 0; + if (order < new_order) { + /* + * Don't sleep if someone wants a larger 'order' + * allocation + */ + order = new_order; + } else { + schedule(); + order = pgdat->kswapd_max_order; + } finish_wait(&pgdat->kswapd_wait, &wait); - balance_pgdat(pgdat, 0); + balance_pgdat(pgdat, 0, order); } return 0; } @@ -1140,10 +1155,13 @@ static int kswapd(void *p) /* * A zone is low on free memory, so wake its kswapd task to service it. 
*/ -void wakeup_kswapd(struct zone *zone) +void wakeup_kswapd(struct zone *zone, int order) { - if (zone->free_pages > zone->pages_low) + pg_data_t *pgdat = zone->zone_pgdat; + + if (pgdat->kswapd_max_order < order) return; + pgdat->kswapd_max_order = order; if (!waitqueue_active(&zone->zone_pgdat->kswapd_wait)) return; wake_up_interruptible(&zone->zone_pgdat->kswapd_wait); @@ -1166,7 +1184,7 @@ int shrink_all_memory(int nr_pages) current->reclaim_state = &reclaim_state; for_each_pgdat(pgdat) { int freed; - freed = balance_pgdat(pgdat, nr_to_free); + freed = balance_pgdat(pgdat, nr_to_free, 0); ret += freed; nr_to_free -= freed; if (nr_to_free <= 0) _ ^ permalink raw reply [flat|nested] 10+ messages in thread
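The core of the patch is twofold: each free_area now keeps an nr_free count,
and the old "free_pages < watermark" test is replaced by zone_watermark_ok(),
which also demands that enough blocks remain free once every order below the
requested one is discounted (kswapd is likewise woken with the failing order,
via kswapd_max_order, so it keeps reclaiming until such a block exists). The
toy user-space model below captures that idea only; it is deliberately
simplified and ignores the protection[], can_try_harder and gfp_high
adjustments in the real code.

/*
 * Toy model of the per-order watermark idea introduced by the patch above
 * (concept only, not a line-for-line copy of the kernel code).
 * nr_free[o] = number of free blocks of 2^o pages in a zone.
 */
#include <stdio.h>

#define MAX_ORDER 11

static int watermark_ok(const unsigned long *nr_free, int order,
			unsigned long mark)
{
	unsigned long free_pages = 0, min = mark;
	int o;

	for (o = 0; o < MAX_ORDER; o++)
		free_pages += nr_free[o] << o;	/* total free pages */

	if (free_pages <= min)
		return 0;
	for (o = 0; o < order; o++) {
		/* blocks of this order are useless for a larger request */
		free_pages -= nr_free[o] << o;
		min >>= 1;			/* demand less as order rises */
		if (free_pages <= min)
			return 0;
	}
	return 1;
}

int main(void)
{
	/* ~51200 free pages (200MB with 4K pages), but all as single pages */
	unsigned long fragmented[MAX_ORDER] = { 51200 };
	/* roughly the same total, but higher-order blocks still intact */
	unsigned long healthy[MAX_ORDER] = { 20000, 8000, 2000, 800, 100 };

	printf("fragmented, order 3: %d\n", watermark_ok(fragmented, 3, 1024));
	printf("healthy,    order 3: %d\n", watermark_ok(healthy, 3, 1024));
	return 0;
}

With the same total amount of free memory, the fragmented layout now fails
the order-3 check (so kswapd keeps working and the caller can fall back
early) while the healthy layout passes, which is exactly the situation in
the failure report above.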
* Re: 2.6.8.1 mempool subsystem sickness
  2004-09-14 23:51 ` Nick Piggin
@ 2004-09-15  0:51 ` Gene Heskett
  2004-09-15 17:27 ` Jeff V. Merkey
  1 sibling, 0 replies; 10+ messages in thread
From: Gene Heskett @ 2004-09-15 0:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Nick Piggin, jmerkey, Jeff V. Merkey, jmerkey

On Tuesday 14 September 2004 19:51, Nick Piggin wrote:
>jmerkey@galt.devicelogics.com wrote:
>> You bet. Send them to me. For some reason I am not able to post
>> to LKML again.
>>
>> Jeff
>
>OK, this is against 2.6.9-rc2. Let me know how you go. Thanks

Humm, it came thru the list just fine, Nick.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty: soap, ballot,
jury, and ammo. Please use in that order." -Ed Howdershelt (Author)
99.26% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message by Gene
Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 2.6.8.1 mempool subsystem sickness
  2004-09-14 23:51 ` Nick Piggin
  2004-09-15  0:51 ` Gene Heskett
@ 2004-09-15 17:27 ` Jeff V. Merkey
  2004-09-15 17:33 ` Jeff V. Merkey
  2004-09-16  1:46 ` Nick Piggin
  1 sibling, 2 replies; 10+ messages in thread
From: Jeff V. Merkey @ 2004-09-15 17:27 UTC (permalink / raw)
  To: Nick Piggin; +Cc: jmerkey, linux-kernel, jmerkey

[-- Attachment #1: Type: text/plain, Size: 767 bytes --]

Nick Piggin wrote:

> jmerkey@galt.devicelogics.com wrote:
>
>> You bet. Send them to me. For some reason I am not able to post to
>> LKML again.
>>
>> Jeff
>>
> OK, this is against 2.6.9-rc2. Let me know how you go. Thanks
>
>

Nick,

The problem is corrected with this patch. I am running with 3GB of
kernel memory and 1GB user space with the userspace splitting patch
with very heavy swapping and user space app activity and no failed
allocations. This patch should be rolled into 2.6.9-rc2 since it fixes
the problem. With standard 3GB User/1GB kernel address space, it also
fixes the problems with X server running out of memory and the apps
crashing.

Jeff

Here's the stats from the test of the patch against 2.6.8-rc2 with the
patch applied

[-- Attachment #2: proc.meminfo --]
[-- Type: text/plain, Size: 0 bytes --]

[-- Attachment #3: proc.vmstat --]
[-- Type: text/plain, Size: 0 bytes --]

[-- Attachment #4: proc.slabinfo --]
[-- Type: text/plain, Size: 0 bytes --]

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 2.6.8.1 mempool subsystem sickness 2004-09-15 17:27 ` Jeff V. Merkey @ 2004-09-15 17:33 ` Jeff V. Merkey 2004-09-16 1:46 ` Nick Piggin 1 sibling, 0 replies; 10+ messages in thread From: Jeff V. Merkey @ 2004-09-15 17:33 UTC (permalink / raw) To: Jeff V. Merkey; +Cc: Nick Piggin, jmerkey, linux-kernel, jmerkey [-- Attachment #1: Type: text/plain, Size: 871 bytes --] Jeff V. Merkey wrote: > Nick Piggin wrote: > >> jmerkey@galt.devicelogics.com wrote: >> >>> You bet. Send them to me. For some reason I am not able to post to >>> LKML again. >>> >>> Jeff >>> >> OK, this is against 2.6.9-rc2. Let me know how you go. Thanks >> >> >> > > Nick, > > The problem is corrected with this patch. I am running with 3GB of > kernel memory > and 1GB user space with the userspace splitting patch with very heavy > swapping > and user space app activity and no failed allocations. This patch > should be rolled > into 2.6.9-rc2 since it fixes the problem. With standard 3GB User/1GB > kernel > address space, it also fixes the problems with X server running out of > memory > and the apps crashing. > > Jeff > > Here's the stats from the test of the patch against 2.6.8-rc2 with the > patch applied > > Attachments included. Jeff [-- Attachment #2: proc.meminfo --] [-- Type: text/plain, Size: 572 bytes --] MemTotal: 2983616 kB MemFree: 576608 kB Buffers: 42116 kB Cached: 86000 kB SwapCached: 2364 kB Active: 133756 kB Inactive: 25340 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 2983616 kB LowFree: 576608 kB SwapTotal: 1052248 kB SwapFree: 1011856 kB Dirty: 4136 kB Writeback: 0 kB Mapped: 35872 kB Slab: 2239264 kB Committed_AS: 97816 kB PageTables: 1076 kB VmallocTotal: 122824 kB VmallocUsed: 2896 kB VmallocChunk: 119604 kB [-- Attachment #3: proc.vmstat --] [-- Type: text/plain, Size: 704 bytes --] nr_dirty 1070 nr_writeback 0 nr_unstable 0 nr_page_table_pages 257 nr_mapped 5452 nr_slab 559846 pgpgin 259093865 pgpgout 68682338 pswpin 59363 pswpout 292378 pgalloc_high 0 pgalloc_normal 30770083 pgalloc_dma 2951033 pgfree 33868082 pgactivate 1505831 pgdeactivate 1709234 pgfault 64727816 pgmajfault 15099 pgrefill_high 0 pgrefill_normal 1685663 pgrefill_dma 1153475 pgsteal_high 0 pgsteal_normal 1043923 pgsteal_dma 424170 pgscan_kswapd_high 0 pgscan_kswapd_normal 1209615 pgscan_kswapd_dma 1983944 pgscan_direct_high 0 pgscan_direct_normal 206712 pgscan_direct_dma 9603 pginodesteal 11 slabs_scanned 342016 kswapd_steal 1298184 kswapd_inodesteal 9310 pageoutrun 2291 allocstall 4830 pgrotated 446422 [-- Attachment #4: proc.slabinfo --] [-- Type: text/plain, Size: 13583 bytes --] slabinfo - version: 2.0 # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> bt_sock 3 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0 rpc_buffers 8 8 2048 2 1 : tunables 24 12 0 : slabdata 4 4 0 rpc_tasks 8 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0 rpc_inode_cache 6 7 512 7 1 : tunables 54 27 0 : slabdata 1 1 0 ip_fib_hash 11 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 scsi_cmd_cache 52 110 384 10 1 : tunables 54 27 0 : slabdata 8 11 0 sgpool-128 32 32 2048 2 1 : tunables 24 12 0 : slabdata 16 16 0 sgpool-64 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0 sgpool-32 32 32 512 8 1 : tunables 54 27 0 : slabdata 4 4 0 sgpool-16 32 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0 sgpool-8 97 217 128 31 1 : tunables 120 60 0 : slabdata 6 7 0 dm_tio 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0 dm_io 0 0 16 226 1 : tunables 
120 60 0 : slabdata 0 0 0 uhci_urb_priv 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0 dn_fib_info_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 dn_dst_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 dn_neigh_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 decnet_socket_cache 0 0 768 5 1 : tunables 54 27 0 : slabdata 0 0 0 clip_arp_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 xfrm6_tunnel_spi 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 fib6_nodes 13 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 ip6_dst_cache 51 90 256 15 1 : tunables 120 60 0 : slabdata 6 6 0 ndisc_cache 5 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0 raw6_sock 0 0 640 6 1 : tunables 54 27 0 : slabdata 0 0 0 udp6_sock 0 0 640 6 1 : tunables 54 27 0 : slabdata 0 0 0 tcp6_sock 5 7 1152 7 2 : tunables 24 12 0 : slabdata 1 1 0 unix_sock 50 50 384 10 1 : tunables 54 27 0 : slabdata 5 5 0 ip_mrt_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 tcp_tw_bucket 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 tcp_bind_bucket 8 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 tcp_open_request 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 inet_peer_cache 1 61 64 61 1 : tunables 120 60 0 : slabdata 1 1 0 secpath_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 xfrm_dst_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 ip_dst_cache 23 30 256 15 1 : tunables 120 60 0 : slabdata 2 2 0 arp_cache 1 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0 raw4_sock 0 0 512 7 1 : tunables 54 27 0 : slabdata 0 0 0 udp_sock 3 7 512 7 1 : tunables 54 27 0 : slabdata 1 1 0 tcp_sock 20 20 1024 4 1 : tunables 54 27 0 : slabdata 5 5 0 flow_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 isofs_inode_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0 fat_inode_cache 0 0 340 11 1 : tunables 54 27 0 : slabdata 0 0 0 ext2_inode_cache 0 0 400 10 1 : tunables 54 27 0 : slabdata 0 0 0 journal_handle 16 135 28 135 1 : tunables 120 60 0 : slabdata 1 1 0 journal_head 521 891 48 81 1 : tunables 120 60 0 : slabdata 11 11 0 revoke_table 4 290 12 290 1 : tunables 120 60 0 : slabdata 1 1 0 revoke_record 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0 ext3_inode_cache 7263 7263 440 9 1 : tunables 54 27 0 : slabdata 807 807 0 ext3_xattr 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0 reiser_inode_cache 0 0 368 11 1 : tunables 54 27 0 : slabdata 0 0 0 dquot 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 eventpoll_pwq 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0 eventpoll_epi 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 kioctx 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 kiocb 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 dnotify_cache 1 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0 fasync_cache 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0 shmem_inode_cache 4 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0 posix_timers_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 uid_cache 9 61 64 61 1 : tunables 120 60 0 : slabdata 1 1 0 cfq_pool 64 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 crq_pool 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0 deadline_drq 0 0 48 81 1 : tunables 120 60 0 : slabdata 0 0 0 as_arq 135 195 60 65 1 : tunables 120 60 0 : slabdata 3 3 0 blkdev_ioc 109 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0 blkdev_queue 21 24 452 8 1 : tunables 54 27 0 : slabdata 3 3 0 blkdev_requests 122 182 152 26 1 : tunables 120 60 0 : slabdata 7 7 0 biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0 biovec-128 256 260 1536 
5 2 : tunables 24 12 0 : slabdata 52 52 0 biovec-64 257 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0 biovec-16 131340 131340 256 15 1 : tunables 120 60 0 : slabdata 8756 8756 0 biovec-4 305 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0 biovec-1 368 678 16 226 1 : tunables 120 60 0 : slabdata 3 3 0 bio 131396 131516 64 61 1 : tunables 120 60 0 : slabdata 2156 2156 0 file_lock_cache 6 45 88 45 1 : tunables 120 60 0 : slabdata 1 1 0 sock_inode_cache 100 100 384 10 1 : tunables 54 27 0 : slabdata 10 10 0 skbuff_head_cache 2130 2130 256 15 1 : tunables 120 60 0 : slabdata 142 142 0 sock 30 30 384 10 1 : tunables 54 27 0 : slabdata 3 3 0 proc_inode_cache 435 481 308 13 1 : tunables 54 27 0 : slabdata 37 37 0 sigqueue 27 27 148 27 1 : tunables 120 60 0 : slabdata 1 1 0 radix_tree_node 7378 7378 276 14 1 : tunables 54 27 0 : slabdata 527 527 0 bdev_cache 8 14 512 7 1 : tunables 54 27 0 : slabdata 2 2 0 mnt_cache 21 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0 inode_cache 3406 3406 292 13 1 : tunables 54 27 0 : slabdata 262 262 0 dentry_cache 15148 15148 140 28 1 : tunables 120 60 0 : slabdata 541 541 0 filp 705 945 256 15 1 : tunables 120 60 0 : slabdata 63 63 0 names_cache 19 19 4096 1 1 : tunables 24 12 0 : slabdata 19 19 0 idr_layer_cache 100 116 136 29 1 : tunables 120 60 0 : slabdata 4 4 0 buffer_head 18041 67230 48 81 1 : tunables 120 60 0 : slabdata 830 830 0 mm_struct 102 102 640 6 1 : tunables 54 27 0 : slabdata 17 17 0 vm_area_struct 1831 2256 84 47 1 : tunables 120 60 0 : slabdata 48 48 0 fs_cache 119 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 files_cache 98 98 512 7 1 : tunables 54 27 0 : slabdata 14 14 0 signal_cache 155 155 128 31 1 : tunables 120 60 0 : slabdata 5 5 0 sighand_cache 89 115 1408 5 2 : tunables 24 12 0 : slabdata 23 23 0 task_struct 93 114 1360 3 1 : tunables 24 12 0 : slabdata 38 38 0 anon_vma 921 1221 8 407 1 : tunables 120 60 0 : slabdata 3 3 0 pgd 73 73 4096 1 1 : tunables 24 12 0 : slabdata 73 73 0 size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-131072 2 2 131072 1 32 : tunables 8 4 0 : slabdata 2 2 0 size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0 size-65536 32834 32834 65536 1 16 : tunables 8 4 0 : slabdata 32834 32834 0 size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-32768 1 1 32768 1 8 : tunables 8 4 0 : slabdata 1 1 0 size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0 size-16384 2 2 16384 1 4 : tunables 8 4 0 : slabdata 2 2 0 size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0 size-8192 124 128 8192 1 2 : tunables 8 4 0 : slabdata 124 128 0 size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0 size-4096 2095 2095 4096 1 1 : tunables 24 12 0 : slabdata 2095 2095 0 size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0 size-2048 32867 32868 2048 2 1 : tunables 24 12 0 : slabdata 16434 16434 0 size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0 size-1024 141 144 1024 4 1 : tunables 54 27 0 : slabdata 36 36 0 size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0 size-512 200 560 512 8 1 : tunables 54 27 0 : slabdata 70 70 0 size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 size-256 189 480 256 15 1 : tunables 120 60 0 : slabdata 32 32 0 size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 size-128 2181 2263 128 31 1 : tunables 120 60 0 : slabdata 73 73 0 size-64(DMA) 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 size-64 921 1281 64 61 1 : tunables 120 60 0 : slabdata 21 21 0 size-32(DMA) 0 0 32 
119 1 : tunables 120 60 0 : slabdata 0 0 0 size-32 52715 52955 32 119 1 : tunables 120 60 0 : slabdata 445 445 0 kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 2.6.8.1 mempool subsystem sickness
  2004-09-15 17:27 ` Jeff V. Merkey
  2004-09-15 17:33 ` Jeff V. Merkey
@ 2004-09-16  1:46 ` Nick Piggin
  2004-09-16  5:56 ` Jens Axboe
  1 sibling, 1 reply; 10+ messages in thread
From: Nick Piggin @ 2004-09-16 1:46 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: jmerkey, linux-kernel, jmerkey

Jeff V. Merkey wrote:
> Nick Piggin wrote:
>> OK, this is against 2.6.9-rc2. Let me know how you go. Thanks
>>
>>
> Nick,
>
> The problem is corrected with this patch. I am running with 3GB of
> kernel memory and 1GB user space with the userspace splitting patch
> with very heavy swapping and user space app activity and no failed
> allocations. This patch should be rolled into 2.6.9-rc2 since it fixes
> the problem. With standard 3GB User/1GB kernel address space, it also
> fixes the problems with X server running out of memory and the apps
> crashing.
>

Hi Jeff,

Thanks, that is very cool. The memory problems you're seeing aren't
actually a regression (it's always been like that), and I still haven't
got hold of some gigabit networking hardware to test it thoroughly, so
it may be difficult to get this into 2.6.9. Hopefully soon though.

I can provide you (or anyone) with up-to-date patches on request though,
so just let me know.

> Jeff
>
> Here's the stats from the test of the patch against 2.6.8-rc2 with the
> patch applied
>

Scanning stats look good at a quick glance. kswapd doesn't seem to be
going crazy.

However,
size-65536 32834 32834 65536 1 16

This slab entry is taking up about 2GB of unreclaimable memory (order-4,
no less). This must be a leak... does the number continue to rise as
your system runs?

^ permalink raw reply [flat|nested] 10+ messages in thread
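The arithmetic behind that figure: 32834 objects x 64 kB each is about
2,101,376 kB, and the cache geometry ("65536 1 16" -- one object per 16-page
slab) means every live object pins an order-4 block that can never be handed
back. Growth like that usually comes from a buffer that is kmalloc()'d per
unit of work but only freed on the success path. The sketch below is purely
illustrative of that pattern; none of the names are taken from the dsfs
module.

/*
 * Purely illustrative sketch of the kind of leak that produces a
 * monotonically growing size-65536 slab; struct regen_ctx and
 * build_and_send() are invented, nothing here is from dsfs itself.
 */
#include <linux/slab.h>
#include <linux/errno.h>

#define REGEN_BUF_SIZE	65536

struct regen_ctx;					/* hypothetical */
extern int build_and_send(struct regen_ctx *ctx, void *buf);	/* hypothetical */

static int regen_one_packet(struct regen_ctx *ctx)
{
	void *buf = kmalloc(REGEN_BUF_SIZE, GFP_KERNEL);
	int err;

	if (!buf)
		return -ENOMEM;

	err = build_and_send(ctx, buf);
	if (err)
		return err;	/* BUG: the 64KB object leaks on every error */

	kfree(buf);
	return 0;
}

Whether anything like this is what dsfs actually does is speculation; the way
to confirm the leak is exactly what is asked above -- watch whether the
size-65536 object count keeps climbing while the test runs.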
* Re: 2.6.8.1 mempool subsystem sickness
  2004-09-16  1:46 ` Nick Piggin
@ 2004-09-16  5:56 ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2004-09-16 5:56 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Jeff V. Merkey, jmerkey, linux-kernel, jmerkey

On Thu, Sep 16 2004, Nick Piggin wrote:
> Jeff V. Merkey wrote:
> >Jeff
> >
> >Here's the stats from the test of the patch against 2.6.8-rc2 with the
> >patch applied
> >
> >
>
> Scanning stats look good at a quick glance. kswapd doesn't seem to be
> going crazy.
>
> However,
> size-65536 32834 32834 65536 1 16
>
> This slab entry is taking up about 2GB of unreclaimable memory (order-4,
> no less). This must be a leak... does the number continue to rise as
> your system runs?

There's also a huge amount of 16-page bio + vecs in flight:

biovec-16 131340

That would point to a leak as well, most likely.

-- 
Jens Axboe

^ permalink raw reply [flat|nested] 10+ messages in thread
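bio and biovec-16 climbing in lockstep (about 131,400 and 131,340 objects,
roughly 40 MB between them) is the classic signature of bios that are
allocated and submitted but never released: every bio_alloc() takes a
reference that must be dropped with bio_put(), normally from the completion
callback. Below is a hedged sketch of the expected lifetime in a 2.6.8-era
driver; only the block-layer calls are real API, the surrounding names are
invented, and the three-argument end_io signature shown is the form used by
kernels of that vintage.

/*
 * Sketch of the expected bio lifetime circa 2.6.8; example_* names are
 * made up.  If the completion path never runs bio_put(), the bio and its
 * biovec stay pinned forever -- which is what the numbers above suggest.
 */
#include <linux/bio.h>
#include <linux/blkdev.h>

/* 2.6.8-era completion callback: returns nonzero while the bio is partial */
static int example_end_io(struct bio *bio, unsigned int bytes_done, int err)
{
	if (bio->bi_size)
		return 1;		/* not fully complete yet */

	/* ...hand the data to whoever is waiting... */
	bio_put(bio);			/* drops the reference from bio_alloc() */
	return 0;
}

static void example_submit(struct block_device *bdev, sector_t sector,
			   struct page *page)
{
	struct bio *bio = bio_alloc(GFP_NOIO, 1);

	if (!bio)
		return;
	bio->bi_bdev = bdev;
	bio->bi_sector = sector;
	bio->bi_end_io = example_end_io;
	bio_add_page(bio, page, PAGE_SIZE, 0);
	submit_bio(READ, bio);
}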
end of thread, other threads:[~2004-09-16 6:03 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-09-08 16:48 2.6.8.1 mempool subsystem sickness jmerkey
2004-09-08 23:05 ` Nick Piggin
[not found] <091420042058.15928.41475B8000002BA100003E382200763704970A059D0A0306@comcast.net>
2004-09-14 20:32 ` Jeff V. Merkey
2004-09-14 22:59 ` Nick Piggin
[not found] ` <20040914223122.GA3325@galt.devicelogics.com>
2004-09-14 23:51 ` Nick Piggin
2004-09-15 0:51 ` Gene Heskett
2004-09-15 17:27 ` Jeff V. Merkey
2004-09-15 17:33 ` Jeff V. Merkey
2004-09-16 1:46 ` Nick Piggin
2004-09-16 5:56 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox