* ~500 megs cached yet 2.6.5 goes into swap hell
@ 2004-04-28 21:27 Brett E.
  2004-04-29  0:01 ` Andrew Morton
  2004-04-29  0:04 ` Brett E.
  0 siblings, 2 replies; 128+ messages in thread
From: Brett E. @ 2004-04-28 21:27 UTC (permalink / raw)
  To: linux-kernel mailing list

[-- Attachment #1: Type: text/plain, Size: 362 bytes --]

Same thing happens on 2.4.18.

I attached sar, slabinfo and /proc/meminfo data on the 2.6.5 machine.  I
reproduce this behavior by simply untarring a 260meg file on a
production server, the machine becomes sluggish as it swaps to disk.

Is there a way to limit the cache so this machine, which has 1 gigabyte
of memory, doesn't dip into swap?

Thanks,

Brett

[-- Attachment #2: attach.1 --]
[-- Type: text/plain, Size: 15171 bytes --]

06:18:52 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
06:18:53 PM     55332   1238644     95.72     14660    497888    450740     79364     14.97      9692
06:18:54 PM     55268   1238708     95.73     14660    497888    450740     79364     14.97      9692
06:18:55 PM     40060   1253916     96.90     14860    512920    450740     79364     14.97      9692
06:18:57 PM      6120   1287856     99.53     15340    546644    450740     79364     14.97      9692
06:18:59 PM      6632   1287344     99.49     15864    550880    450740     79364     14.97      9692
06:19:00 PM      6440   1287536     99.50     16020    552628    450740     79364     14.97      9692
06:19:02 PM      7648   1286328     99.41     15980    548452    450740     79364     14.97      9692
06:19:03 PM      6504   1287472     99.50     16008    548832    450740     79364     14.97      9692
06:19:04 PM      7592   1286384     99.41     15980    530160    450740     79364     14.97      9692
06:19:05 PM      6192   1287784     99.52     15716    499008    450740     79364     14.97      9692
06:19:06 PM      6544   1287432     99.49     15732    494640    450740     79364     14.97      9692
06:19:07 PM      7104   1286872     99.45     15768    488756    450740     79364     14.97      9692
06:19:08 PM      7592   1286384     99.41     15844    488680    450740     79364     14.97      9692
06:19:10 PM      7416   1286560     99.43     15936    479136    450740     79364     14.97      9692
06:19:13 PM      7024   1286952     99.46     15912    467808    450744     79360     14.97      9688
06:19:14 PM      7096   1286880     99.45     15664    427736    450744     79360     14.97      9684
06:19:15 PM      7240   1286736     99.44     15604    415692    450744     79360     14.97      9684
06:19:16 PM      6712   1287264     99.48     15616    414524    450744     79360     14.97      9684
06:19:18 PM      6200   1287776     99.52     15652    409660    450744     79360     14.97      9684
06:19:19 PM     10600   1283376     99.18     15724    407004    450744     79360     14.97      9684

06:18:52 PM  pgpgin/s pgpgout/s   fault/s  majflt/s
06:18:53 PM      0.00    712.00   1236.00      0.00
06:18:54 PM     12.12      8.08   1067.68      0.00
06:18:55 PM   7497.03     11.88   2844.55      0.00
06:18:57 PM  10626.00    310.00   1422.50      0.00
06:18:59 PM  11758.00    196.00    346.50      0.00
06:19:00 PM   7828.00    608.00    136.00      0.00
06:19:02 PM    145.27   1136.32   1108.96      0.00
06:19:03 PM    905.05  13822.22    663.64      0.00
06:19:04 PM    689.11   2384.16   9437.62      0.00
06:19:05 PM    499.01   9572.28  13467.33      0.00
06:19:06 PM   3444.00   1340.00   1825.00      0.00
06:19:07 PM   7720.00   2032.00   3034.00      0.00
06:19:08 PM   5420.00   1304.00    688.00      0.00
06:19:10 PM   4045.77   4304.48   2188.56      0.00
06:19:13 PM   1079.07   5528.68   2046.90      0.00
06:19:14 PM    696.00    920.00  15650.00      0.00
06:19:15 PM   1478.79   1187.88   5046.46      0.00
06:19:16 PM   1000.00   2752.94    539.22      0.00

meminfo:
MemTotal:      1293976 kB
MemFree:          8320 kB
Buffers:         13396 kB
Cached:         436428 kB
SwapCached:       9516 kB
Active:         810472 kB
Inactive:       346816 kB
HighTotal:      393216 kB
HighFree:         1152 kB
LowTotal:       900760 kB
LowFree:          7168 kB
SwapTotal:      530104 kB
SwapFree:       450796 kB
Dirty:           33704 kB
Writeback:       10268 kB
Mapped:         710732 kB
Slab:           115240 kB
Committed_AS:   942592 kB
PageTables:       4612 kB
VmallocTotal:   114680 kB
VmallocUsed:       560 kB
VmallocChunk:   114120 kB

slabinfo - version: 2.0
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
rpc_buffers 8 8 2048 2 1 : tunables 24 12 8 : slabdata 4 4 0
rpc_tasks 8 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
rpc_inode_cache 12 14 512 7 1 : tunables 54 27 8 : slabdata 2 2 0
unix_sock 192 203 512 7 1 : tunables 54 27 8 : slabdata 29 29 0
ip_conntrack 9926 14860 384 10 1 : tunables 54 27 8 : slabdata 1486 1486 216
tcp_tw_bucket 2028 6450 128 30 1 : tunables 120 60 8 : slabdata 215 215 384
tcp_bind_bucket 207 800 16 200 1 : tunables 120 60 8 : slabdata 4 4 16
tcp_open_request 113 290 64 58 1 : tunables 120 60 8 : slabdata 5 5 3
inet_peer_cache 2 58 64 58 1 : tunables 120 60 8 : slabdata 1 1 0
ip_fib_hash 18 200 16 200 1 : tunables 120 60 8 : slabdata 1 1 0
ip_dst_cache 23046 23145 256 15 1 : tunables 120 60 8 : slabdata 1543 1543 0
arp_cache 11 30 256 15 1 : tunables 120 60 8 : slabdata 2 2 0
raw4_sock 0 0 512 7 1 : tunables 54 27 8 : slabdata 0 0 0
udp_sock 10 21 512 7 1 : tunables 54 27 8 : slabdata 3 3 0
tcp_sock 248 408 1024 4 1 : tunables 54 27 8 : slabdata 102 102 0
flow_cache 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
udf_inode_cache 0 0 512 7 1 : tunables 54 27 8 : slabdata 0 0 0
nfs_write_data 36 42 512 7 1 : tunables 54 27 8 : slabdata 6 6 0
nfs_read_data 32 35 512 7 1 : tunables 54 27 8 : slabdata 5 5 0
nfs_inode_cache 15 24 640 6 1 : tunables 54 27 8 : slabdata 4 4 0
nfs_page 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
isofs_inode_cache 0 0 384 10 1 : tunables 54 27 8 : slabdata 0 0 0
fat_inode_cache 0 0 512 7 1 : tunables 54 27 8 : slabdata 0 0 0
ext2_inode_cache 7294 7294 512 7 1 : tunables 54 27 8 : slabdata 1042 1042 0
journal_handle 0 0 28 123 1 : tunables 120 60 8 : slabdata 0 0 0
journal_head 0 0 48 77 1 : tunables 120 60 8 : slabdata 0 0 0
revoke_table 0 0 12 250 1 : tunables 120 60 8 : slabdata 0 0 0
revoke_record 0 0 16 200 1 : tunables 120 60 8 : slabdata 0 0 0
ext3_inode_cache 0 0 512 7 1 : tunables 54 27 8 : slabdata 0 0 0
ext3_xattr 0 0 48 77 1 : tunables 120 60 8 : slabdata 0 0 0
eventpoll_pwq 0 0 36 99 1 : tunables 120 60 8 : slabdata 0 0 0
eventpoll_epi 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
kioctx 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
kiocb 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
dnotify_cache 0 0 20 166 1 : tunables 120 60 8 : slabdata 0 0 0
file_lock_cache 9 40 96 40 1 : tunables 120 60 8 : slabdata 1 1 0
fasync_cache 0 0 16 200 1 : tunables 120 60 8 : slabdata 0 0 0
shmem_inode_cache 3 7 512 7 1 : tunables 54 27 8 : slabdata 1 1 0
posix_timers_cache 0 0 88 43 1 : tunables 120 60 8 : slabdata 0 0 0
uid_cache 5 112 32 112 1 : tunables 120 60 8 : slabdata 1 1 0
sgpool-128 32 32 2048 2 1 : tunables 24 12 8 : slabdata 16 16 0
sgpool-64 32 32 1024 4 1 : tunables 54 27 8 : slabdata 8 8 0
sgpool-32 32 32 512 8 1 : tunables 54 27 8 : slabdata 4 4 0
sgpool-16 32 45 256 15 1 : tunables 120 60 8 : slabdata 3 3 0
sgpool-8 32 60 128 30 1 : tunables 120 60 8 : slabdata 2 2 0
deadline_drq 0 0 52 71 1 : tunables 120 60 8 : slabdata 0 0 0
as_arq 296 348 64 58 1 : tunables 120 60 8 : slabdata 6 6 60
blkdev_requests 312 312 160 24 1 : tunables 120 60 8 : slabdata 13 13 60
biovec-BIO_MAX_PAGES 256 256 3072 2 2 : tunables 24 12 8 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 8 : slabdata 52 52 0
biovec-64 629 640 768 5 1 : tunables 54 27 8 : slabdata 128 128 38
biovec-16 315 315 256 15 1 : tunables 120 60 8 : slabdata 21 21 0
biovec-4 348 348 64 58 1 : tunables 120 60 8 : slabdata 6 6 0
biovec-1 520 600 16 200 1 : tunables 120 60 8 : slabdata 3 3 60
bio 870 870 64 58 1 : tunables 120 60 8 : slabdata 15 15 180
sock_inode_cache 573 910 512 7 1 : tunables 54 27 8 : slabdata 130 130 0
skbuff_head_cache 296 870 256 15 1 : tunables 120 60 8 : slabdata 58 58 30
sock 4 10 384 10 1 : tunables 54 27 8 : slabdata 1 1 0
proc_inode_cache 1417 1530 384 10 1 : tunables 54 27 8 : slabdata 153 153 0
sigqueue 130 130 144 26 1 : tunables 120 60 8 : slabdata 5 5 0
radix_tree_node 7117 8955 260 15 1 : tunables 54 27 8 : slabdata 597 597 189
bdev_cache 6 7 512 7 1 : tunables 54 27 8 : slabdata 1 1 0
mnt_cache 20 58 64 58 1 : tunables 120 60 8 : slabdata 1 1 0
inode_cache 566 580 384 10 1 : tunables 54 27 8 : slabdata 58 58 0
dentry_cache 167775 176055 256 15 1 : tunables 120 60 8 : slabdata 11737 11737 0
filp 2057 2790 256 15 1 : tunables 120 60 8 : slabdata 186 186 0
names_cache 25 25 4096 1 1 : tunables 24 12 8 : slabdata 25 25 0
idr_layer_cache 3 28 136 28 1 : tunables 120 60 8 : slabdata 1 1 0
buffer_head 35463 50481 52 71 1 : tunables 120 60 8 : slabdata 711 711 0
mm_struct 331 360 640 6 1 : tunables 54 27 8 : slabdata 60 60 0
vm_area_struct 10667 12586 64 58 1 : tunables 120 60 8 : slabdata 217 217 0
fs_cache 331 464 64 58 1 : tunables 120 60 8 : slabdata 8 8 0
files_cache 346 371 512 7 1 : tunables 54 27 8 : slabdata 53 53 0
signal_cache 447 696 64 58 1 : tunables 120 60 8 : slabdata 12 12 0
sighand_cache 345 380 1408 5 2 : tunables 24 12 8 : slabdata 76 76 0
task_struct 434 450 1456 5 2 : tunables 24 12 8 : slabdata 90 90 0
pte_chain 139628 145500 128 30 1 : tunables 120 60 8 : slabdata 4850 4850 0
pgd 330 330 4096 1 1 : tunables 24 12 8 : slabdata 330 330 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 1 1 16384 1 4 : tunables 8 4 0 : slabdata 1 1 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 446 446 8192 1 2 : tunables 8 4 0 : slabdata 446 446 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 8 : slabdata 0 0 0
size-4096 65 66 4096 1 1 : tunables 24 12 8 : slabdata 65 66 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0 0
size-2048 245 294 2048 2 1 : tunables 24 12 8 : slabdata 147 147 4
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
size-1024 109 128 1024 4 1 : tunables 54 27 8 : slabdata 32 32 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0
size-512 268 488 512 8 1 : tunables 54 27 8 : slabdata 61 61 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
size-256 424 465 256 15 1 : tunables 120 60 8 : slabdata 31 31 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
size-128 2387 3090 128 30 1 : tunables 120 60 8 : slabdata 103 103 0
size-64(DMA) 0 0 64 58 1 : tunables 120 60 8 : slabdata 0 0 0
size-64 334 406 64 58 1 : tunables 120 60 8 : slabdata 7 7 0
size-32(DMA) 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0
size-32 744 784 32 112 1 : tunables 120 60 8 : slabdata 7 7 0
kmem_cache 104 104 148 26 1 : tunables 120 60 8 : slabdata 4 4 0

^ permalink raw reply	[flat|nested] 128+ messages in thread
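The scenario Brett describes can be watched from the shell while the workload runs. A minimal sketch, assuming a Linux box with procfs; the tarball name is a placeholder for the ~260meg file in the report, and the fields grepped out of /proc/meminfo are the same counters that appear in the attached sar log:

```shell
#!/bin/sh
# Snapshot the page-cache and swap counters before and after an
# I/O-heavy workload, mirroring the kbcached/kbswpfree sar columns.
snapshot() {
    grep -E '^(Cached|SwapCached|SwapFree):' /proc/meminfo
}

before=$(snapshot)
echo "before:"
echo "$before"

# tar xf big-file.tar &                          # placeholder workload
# while kill -0 $! 2>/dev/null; do snapshot; sleep 2; done

after=$(snapshot)
echo "after:"
echo "$after"
```

With the workload uncommented, Cached: climbs as the untar proceeds while SwapFree: falling would confirm the machine really is pushing pages out to swap rather than just filling the cache.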
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-28 21:27 ~500 megs cached yet 2.6.5 goes into swap hell Brett E.
@ 2004-04-29  0:01 ` Andrew Morton
  2004-04-29  0:10   ` Jeff Garzik
  2004-04-29  0:44   ` Brett E.
  2004-04-29  0:04 ` Brett E.
  1 sibling, 2 replies; 128+ messages in thread
From: Andrew Morton @ 2004-04-29 0:01 UTC (permalink / raw)
  To: brettspamacct; +Cc: linux-kernel

"Brett E." <brettspamacct@fastclick.com> wrote:
>
> I attached sar, slabinfo and /proc/meminfo data on the 2.6.5 machine.  I
> reproduce this behavior by simply untarring a 260meg file on a
> production server, the machine becomes sluggish as it swaps to disk.

I see no swapout from the info which you sent.  A `vmstat 1' trace would
be more useful.

> Is there a way to limit the cache so this machine, which has 1 gigabyte of
> memory, doesn't dip into swap?

Decrease /proc/sys/vm/swappiness?

Swapout is good.  It frees up unused memory.  I run my desktop machines
at swappiness=100.

^ permalink raw reply	[flat|nested] 128+ messages in thread
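Andrew's suggestion amounts to a one-line sysctl change plus the `vmstat 1` trace he asks for. A sketch, assuming a kernel that exposes the knob (2.6 and later); the value 20 is purely illustrative:

```shell
#!/bin/sh
# Inspect, and optionally lower, vm.swappiness.  0 biases reclaim toward
# dropping page cache; 100 biases it toward swapping out anonymous memory.
f=/proc/sys/vm/swappiness
if [ -r "$f" ]; then
    cur=$(cat "$f")
    echo "current swappiness: $cur"
    # echo 20 > "$f"        # needs root; uncomment to apply (illustrative value)
else
    echo "no swappiness knob on this kernel"
fi

# `vmstat 1` is the trace requested above: non-zero si/so columns are
# actual swap-in/swap-out traffic, as opposed to mere cache growth.
command -v vmstat >/dev/null 2>&1 && vmstat 1 3
```

Note the change via `echo` does not survive a reboot; a `vm.swappiness = 20` line in /etc/sysctl.conf makes it persistent.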
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29  0:01 ` Andrew Morton
@ 2004-04-29  0:10 ` Jeff Garzik
  2004-04-29  0:21   ` Nick Piggin
  2004-04-29  0:49   ` Brett E.
  2004-04-29  0:44 ` Brett E.
  1 sibling, 2 replies; 128+ messages in thread
From: Jeff Garzik @ 2004-04-29 0:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: brettspamacct, linux-kernel

Andrew Morton wrote:
> Swapout is good.  It frees up unused memory.  I run my desktop machines at
> swappiness=100.

The definition of "unused" is quite subjective and app-dependent...

I've seen reports with increasing frequency about the swappiness of the
2.6.x kernels, from people who were already annoyed at the swappiness of
2.4.x kernels :)

Favorite pathological (and quite common) examples are the various 4am
cron jobs that scan your entire filesystem.  Running that process
overnight on a quiet machine practically guarantees a huge burst of
disk activity, with unwanted results:
1) Inode and page caches are blown away
2) A lot of your desktop apps are swapped out

Additionally, a (IMO valid) maxim of sysadmins has been "a properly
configured server doesn't swap".  There should be no reason why this
maxim becomes invalid over time.  When Linux starts to swap out apps the
sysadmin knows will be useful in an hour, or six hours, or a day just
because it needs a bit more file cache, I get worried.

There IMO should be some way to balance the amount of anon-vma's such
that the sysadmin can say "stop taking 70% of my box's memory for
disposable cache, use it instead for apps you would otherwise swap out,
you memory-hungry kernel you."

	Jeff

^ permalink raw reply	[flat|nested] 128+ messages in thread
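The cache-blow-away effect Jeff describes is easy to see directly: scanning a tree pulls its file data through the page cache, inflating the Cached: counter. A small sketch (the tree under /usr/share and the 200-file cap are arbitrary choices to keep the scan quick):

```shell
#!/bin/sh
# Show the page cache growing as a filesystem scan (a miniature of the
# "4am cron job" case) reads file data it will never need again.
cached() { awk '/^Cached:/ {print $2}' /proc/meminfo; }

before=$(cached)
# Read the first 200 regular files found, discarding the contents; errors
# (unreadable files) are ignored.
find /usr/share -type f 2>/dev/null | head -n 200 | xargs cat >/dev/null 2>&1
after=$(cached)

echo "Cached: ${before} kB -> ${after} kB"
```

On a machine with free memory the second number is typically larger; under memory pressure, this same growth is what forces reclaim to choose between dropping cache and swapping out applications.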
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29  0:10 ` Jeff Garzik
@ 2004-04-29  0:21 ` Nick Piggin
  2004-04-29  0:50   ` Wakko Warner
                      ` (2 more replies)
  2004-04-29  0:49 ` Brett E.
  1 sibling, 3 replies; 128+ messages in thread
From: Nick Piggin @ 2004-04-29 0:21 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andrew Morton, brettspamacct, linux-kernel

Jeff Garzik wrote:
> Andrew Morton wrote:
>
>> Swapout is good.  It frees up unused memory.  I run my desktop
>> machines at swappiness=100.
>
> The definition of "unused" is quite subjective and app-dependent...
>
> I've seen reports with increasing frequency about the swappiness of the
> 2.6.x kernels, from people who were already annoyed at the swappiness of
> 2.4.x kernels :)
>
> Favorite pathological (and quite common) examples are the various 4am
> cron jobs that scan your entire filesystem.  Running that process
> overnight on a quiet machine practically guarantees a huge burst of
> disk activity, with unwanted results:
> 1) Inode and page caches are blown away
> 2) A lot of your desktop apps are swapped out
>
> Additionally, a (IMO valid) maxim of sysadmins has been "a properly
> configured server doesn't swap".  There should be no reason why this
> maxim becomes invalid over time.  When Linux starts to swap out apps the
> sysadmin knows will be useful in an hour, or six hours, or a day just
> because it needs a bit more file cache, I get worried.
>

I don't know.  What if you have some huge application that only
runs once per day for 10 minutes?  Do you want it to be consuming
100MB of your memory for the other 23 hours and 50 minutes for
no good reason?

Anyway, I have a small set of VM patches which attempt to improve
this sort of behaviour if anyone is brave enough to try them.
Against -mm kernels only I'm afraid (the objrmap work causes some
porting difficulty).

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29  0:21 ` Nick Piggin
@ 2004-04-29  0:50 ` Wakko Warner
  2004-04-29  0:53   ` Jeff Garzik
                      ` (2 more replies)
  2004-04-29  0:58 ` Marc Singer
  2004-04-29 20:01 ` Horst von Brand
  2 siblings, 3 replies; 128+ messages in thread
From: Wakko Warner @ 2004-04-29 0:50 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

> I don't know.  What if you have some huge application that only
> runs once per day for 10 minutes?  Do you want it to be consuming
> 100MB of your memory for the other 23 hours and 50 minutes for
> no good reason?

I keep soffice open all the time.  The box in question has 512mb of ram.
This is one app that, even though I use it infrequently, I would prefer
never be swapped out.  Mainly, when I want to use it, I *WANT* it now
(ie not waiting for it to come back from swap).

This is just my opinion.  I personally feel that cache should use available
memory, not already used memory (swapping apps out for more cache).

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29  0:50 ` Wakko Warner
@ 2004-04-29  0:53 ` Jeff Garzik
  2004-04-29  0:54 ` Nick Piggin
  2004-04-29 21:45 ` Denis Vlasenko
  2 siblings, 0 replies; 128+ messages in thread
From: Jeff Garzik @ 2004-04-29 0:53 UTC (permalink / raw)
  To: Wakko Warner; +Cc: Nick Piggin, linux-kernel

Wakko Warner wrote:
> This is just my opinion.  I personally feel that cache should use available
> memory, not already used memory (swapping apps out for more cache).

Strongly agreed, though there are pathological cases that prevent this
from being something that's easy to implement on a global basis.

	Jeff

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29  0:50 ` Wakko Warner
  2004-04-29  0:53 ` Jeff Garzik
@ 2004-04-29  0:54 ` Nick Piggin
  2004-04-29  1:51   ` Tim Connors
  2004-04-29 21:45 ` Denis Vlasenko
  2 siblings, 1 reply; 128+ messages in thread
From: Nick Piggin @ 2004-04-29 0:54 UTC (permalink / raw)
  To: Wakko Warner; +Cc: linux-kernel

Wakko Warner wrote:
>> I don't know.  What if you have some huge application that only
>> runs once per day for 10 minutes?  Do you want it to be consuming
>> 100MB of your memory for the other 23 hours and 50 minutes for
>> no good reason?
>
> I keep soffice open all the time.  The box in question has 512mb of ram.
> This is one app that, even though I use it infrequently, I would prefer
> never be swapped out.  Mainly, when I want to use it, I *WANT* it now
> (ie not waiting for it to come back from swap).
>
> This is just my opinion.  I personally feel that cache should use available
> memory, not already used memory (swapping apps out for more cache).
>

On the other hand, suppose that with soffice resident the entire
time, you don't have enough memory to cache an entire kernel tree
(or video you are editing, or whatever).

Now your find | xargs grep keeps taking 30s every time you run it,
or your video is un-editable...

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29  0:54 ` Nick Piggin
@ 2004-04-29  1:51 ` Tim Connors
  0 siblings, 0 replies; 128+ messages in thread
From: Tim Connors @ 2004-04-29 1:51 UTC (permalink / raw)
  To: linux-kernel

Nick Piggin <nickpiggin@yahoo.com.au> said on Thu, 29 Apr 2004 10:54:36 +1000:
> Wakko Warner wrote:
>>> I don't know.  What if you have some huge application that only
>>> runs once per day for 10 minutes?  Do you want it to be consuming
>>> 100MB of your memory for the other 23 hours and 50 minutes for
>>> no good reason?
>>
>> I keep soffice open all the time.  The box in question has 512mb of ram.
>> This is one app that, even though I use it infrequently, I would prefer
>> never be swapped out.  Mainly, when I want to use it, I *WANT* it now
>> (ie not waiting for it to come back from swap).
>>
>> This is just my opinion.  I personally feel that cache should use available
>> memory, not already used memory (swapping apps out for more cache).
>>
>
> On the other hand, suppose that with soffice resident the entire
> time, you don't have enough memory to cache an entire kernel tree
> (or video you are editing, or whatever).

For the kernel example, I only ever compile once before rebooting[1] :)

This I think is the kind of thing that a kernel will never automatically
detect.  This *must* be in the hands of the administrator, who will know
what they are doing (hopefully).

[1] I have never had enough memory on machines that I use to compile
kernels, to cache an entire tree anyway -- I'd much rather mozilla use
it than a cache which will never be reused

-- 
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
[transporting bed]... across several suburbs and a large salt water
harbour.  Well, they thoughtfully bridged the harbour in the 1930s, so
the problem was actually transporting it across several suburbs and a
long single span bridge.    -- Hipatia

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29  0:50 ` Wakko Warner
  2004-04-29  0:53 ` Jeff Garzik
  2004-04-29  0:54 ` Nick Piggin
@ 2004-04-29 21:45 ` Denis Vlasenko
  2 siblings, 0 replies; 128+ messages in thread
From: Denis Vlasenko @ 2004-04-29 21:45 UTC (permalink / raw)
  To: Wakko Warner, Nick Piggin; +Cc: linux-kernel

On Thursday 29 April 2004 03:50, Wakko Warner wrote:
> > I don't know.  What if you have some huge application that only
> > runs once per day for 10 minutes?  Do you want it to be consuming
> > 100MB of your memory for the other 23 hours and 50 minutes for
> > no good reason?
>
> I keep soffice open all the time.  The box in question has 512mb of ram.
> This is one app that, even though I use it infrequently, I would prefer
> never be swapped out.  Mainly, when I want to use it, I *WANT* it now
> (ie not waiting for it to come back from swap)

I'm afraid a part of the problem is that there are apps which are way
too bloated.  Fighting bloat is thankless and hard, so almost everybody
simply throws RAM at the problem.

Well.  Having thrown lotsa RAM at the problem, it may feel 'better'
until you realize you need not only RAM but *also* disk bandwidth
to move bloat from disk to RAM and back.

Come on, let's admit it.  The proper fix to the 'I want OpenOffice to be
responsive' problem is to make it several times smaller.  Everything
else is more or less a workaround.

It's a pity size optimizations are not too popular even on lkml.

> This is just my opinion.  I personally feel that cache should use
> available memory, not already used memory (swapping apps out for more
> cache).
-- 
vda

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29  0:21 ` Nick Piggin
  2004-04-29  0:50 ` Wakko Warner
@ 2004-04-29  0:58 ` Marc Singer
  2004-04-29  3:48   ` Nick Piggin
  2004-04-29 20:01 ` Horst von Brand
  2 siblings, 1 reply; 128+ messages in thread
From: Marc Singer @ 2004-04-29 0:58 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel

On Thu, Apr 29, 2004 at 10:21:24AM +1000, Nick Piggin wrote:
> Anyway, I have a small set of VM patches which attempt to improve
> this sort of behaviour if anyone is brave enough to try them.
> Against -mm kernels only I'm afraid (the objrmap work causes some
> porting difficulty).

Is this the same patch you wanted me to try?

Remember, the embedded system where NFS IO was pushing my application
out of memory.  Setting swappiness to zero was a temporary fix.

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29  0:58 ` Marc Singer
@ 2004-04-29  3:48 ` Nick Piggin
  2004-04-29  4:20   ` Marc Singer
  0 siblings, 1 reply; 128+ messages in thread
From: Nick Piggin @ 2004-04-29 3:48 UTC (permalink / raw)
  To: Marc Singer
  Cc: Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel, Russell King

Marc Singer wrote:
> On Thu, Apr 29, 2004 at 10:21:24AM +1000, Nick Piggin wrote:
>
>> Anyway, I have a small set of VM patches which attempt to improve
>> this sort of behaviour if anyone is brave enough to try them.
>> Against -mm kernels only I'm afraid (the objrmap work causes some
>> porting difficulty).
>
> Is this the same patch you wanted me to try?
>
> Remember, the embedded system where NFS IO was pushing my
> application out of memory.  Setting swappiness to zero was a
> temporary fix.
>

Yes this is the same patch I wanted you to try.  Yes I
remember your problem!

Didn't anyone come up with a patch for you to test the
stale PTE theory?  If so, what were the results?

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29  3:48 ` Nick Piggin
@ 2004-04-29  4:20 ` Marc Singer
  2004-04-29  4:26   ` Nick Piggin
                      ` (2 more replies)
  0 siblings, 3 replies; 128+ messages in thread
From: Marc Singer @ 2004-04-29 4:20 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel, Russell King

On Thu, Apr 29, 2004 at 01:48:02PM +1000, Nick Piggin wrote:
> Marc Singer wrote:
> > On Thu, Apr 29, 2004 at 10:21:24AM +1000, Nick Piggin wrote:
> >
> >> Anyway, I have a small set of VM patches which attempt to improve
> >> this sort of behaviour if anyone is brave enough to try them.
> >> Against -mm kernels only I'm afraid (the objrmap work causes some
> >> porting difficulty).
> >
> > Is this the same patch you wanted me to try?
> >
> > Remember, the embedded system where NFS IO was pushing my
> > application out of memory.  Setting swappiness to zero was a
> > temporary fix.
>
> Yes this is the same patch I wanted you to try.  Yes I
> remember your problem!
>
> Didn't anyone come up with a patch for you to test the
> stale PTE theory?  If so, what were the results?

Russell King is working on a lot of things for the MMU code in ARM.
I'm waiting to see where he ends up.  I believe he's planning on
removing the lazy PTE release logic.

I hacked at it for some time.  And I'm convinced that I correctly
forced the TLBs to be flushed.  Still, I was never able to get the
system to behave.

Now, I just read a comment you or WLI made about the page cache
use-once logic.  I wonder if that's the real culprit?  As I wrote to
Andrew Morton, the kernel seems to be assigning an awful lot of value
to page cache pages that are used once (or twice?).  I know that it
would be expensive to perform an HTG aging algorithm where the head of
the LRU list is really LRU.  Does your patch pursue this line of
thought?

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29  4:20 ` Marc Singer
@ 2004-04-29  4:26 ` Nick Piggin
  2004-04-29 14:49   ` Marc Singer
  0 siblings, 1 reply; 128+ messages in thread
From: Nick Piggin @ 2004-04-29 4:26 UTC (permalink / raw)
  To: Marc Singer
  Cc: Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel, Russell King

Marc Singer wrote:
> On Thu, Apr 29, 2004 at 01:48:02PM +1000, Nick Piggin wrote:
> >> Didn't anyone come up with a patch for you to test the
> >> stale PTE theory?  If so, what were the results?
>
> Russell King is working on a lot of things for the MMU code in ARM.
> I'm waiting to see where he ends up.  I believe he's planning on
> removing the lazy PTE release logic.
>
> I hacked at it for some time.  And I'm convinced that I correctly
> forced the TLBs to be flushed.  Still, I was never able to get the
> system to behave.
>
> Now, I just read a comment you or WLI made about the page cache
> use-once logic.  I wonder if that's the real culprit?  As I wrote to
> Andrew Morton, the kernel seems to be assigning an awful lot of value
> to page cache pages that are used once (or twice?).  I know that it
> would be expensive to perform an HTG aging algorithm where the head of
> the LRU list is really LRU.  Does your patch pursue this line of
> thought?
>

Yes it includes something which should help that.  Along with
the "split active lists" that I mentioned might help your
problem when WLI first came up with the change to the
swappiness calculation for your problem.

It would be great if you had time to give my patch a run.
It hasn't been widely stress tested yet though, so no
production systems, of course!

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29  4:26 ` Nick Piggin
@ 2004-04-29 14:49 ` Marc Singer
  2004-04-30  4:08   ` Nick Piggin
  0 siblings, 1 reply; 128+ messages in thread
From: Marc Singer @ 2004-04-29 14:49 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel, Russell King

On Thu, Apr 29, 2004 at 02:26:17PM +1000, Nick Piggin wrote:
> Yes it includes something which should help that.  Along with
> the "split active lists" that I mentioned might help your
> problem when WLI first came up with the change to the
> swappiness calculation for your problem.
>
> It would be great if you had time to give my patch a run.
> It hasn't been widely stress tested yet though, so no
> production systems, of course!

As I said, I'm game to have a go.  The trouble was that it doesn't
apply.  My development kernel has an RMK patch applied that seems to
conflict with the MM patch on which you depend.

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29 14:49 ` Marc Singer
@ 2004-04-30  4:08 ` Nick Piggin
  2004-04-30 22:31   ` Marc Singer
  0 siblings, 1 reply; 128+ messages in thread
From: Nick Piggin @ 2004-04-30 4:08 UTC (permalink / raw)
  To: Marc Singer
  Cc: Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel, Russell King

Marc Singer wrote:
> On Thu, Apr 29, 2004 at 02:26:17PM +1000, Nick Piggin wrote:
>
>> It would be great if you had time to give my patch a run.
>> It hasn't been widely stress tested yet though, so no
>> production systems, of course!
>
> As I said, I'm game to have a go.  The trouble was that it doesn't
> apply.  My development kernel has an RMK patch applied that seems to
> conflict with the MM patch on which you depend.
>

You would probably be better off trying a simpler change
first actually:

in mm/vmscan.c, shrink_list(), change:

	if (res == WRITEPAGE_ACTIVATE) {
		ClearPageReclaim(page);
		goto activate_locked;
	}

to

	if (res == WRITEPAGE_ACTIVATE) {
		ClearPageReclaim(page);
		goto keep_locked;
	}

I think it is not the correct solution, but should narrow
down your problem.  Let us know how it goes.

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-30  4:08 ` Nick Piggin
@ 2004-04-30 22:31 ` Marc Singer
  0 siblings, 0 replies; 128+ messages in thread
From: Marc Singer @ 2004-04-30 22:31 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel, Russell King

On Fri, Apr 30, 2004 at 02:08:03PM +1000, Nick Piggin wrote:
> You would probably be better off trying a simpler change
> first actually:
>
> in mm/vmscan.c, shrink_list(), change:
>
> 	if (res == WRITEPAGE_ACTIVATE) {
> 		ClearPageReclaim(page);
> 		goto activate_locked;
> 	}
>
> to
>
> 	if (res == WRITEPAGE_ACTIVATE) {
> 		ClearPageReclaim(page);
> 		goto keep_locked;
> 	}
>
> I think it is not the correct solution, but should narrow
> down your problem.  Let us know how it goes.

OK, thanks.  It might be a few days before I can get to this.

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell
  2004-04-29  4:20 ` Marc Singer
  2004-04-29  4:26 ` Nick Piggin
@ 2004-04-29  6:38 ` William Lee Irwin III
  2004-04-29  7:36 ` Russell King
  2 siblings, 0 replies; 128+ messages in thread
From: William Lee Irwin III @ 2004-04-29 6:38 UTC (permalink / raw)
  To: Marc Singer
  Cc: Nick Piggin, Jeff Garzik, Andrew Morton, brettspamacct,
	linux-kernel, Russell King

On Wed, Apr 28, 2004 at 09:20:47PM -0700, Marc Singer wrote:
> Now, I just read a comment you or WLI made about the page cache
> use-once logic.  I wonder if that's the real culprit?  As I wrote to
> Andrew Morton, the kernel seems to be assigning an awful lot of value
> to page cache pages that are used once (or twice?).  I know that it
> would be expensive to perform an HTG aging algorithm where the head of
> the LRU list is really LRU.  Does your patch pursue this line of
> thought?

I don't recall ever having seen an actual pure LRU patch.  The physical
scanning infrastructure should be enough to implement most global
replacement algorithms with.  It's always good to compare alternatives.
Also, we should have an implementation of random replacement just as a
control case to verify we do better than random.

-- wli

^ permalink raw reply	[flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 4:20 ` Marc Singer 2004-04-29 4:26 ` Nick Piggin 2004-04-29 6:38 ` William Lee Irwin III @ 2004-04-29 7:36 ` Russell King 2004-04-29 10:44 ` Nick Piggin 2 siblings, 1 reply; 128+ messages in thread From: Russell King @ 2004-04-29 7:36 UTC (permalink / raw) To: Marc Singer Cc: Nick Piggin, Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel On Wed, Apr 28, 2004 at 09:20:47PM -0700, Marc Singer wrote: > Russell King is working on a lot of things for the MMU code in ARM. > I'm waiting to see where he ends up. I believe he's planning on > removing the lazy PTE release logic. Essentially it came to a grinding halt due to the sheer size of the task of sorting out the crappy includes, which is far too large for a stable kernel. I may go back to the original problem and sort it a different way, but for the time being, I'm occupied in other areas. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 7:36 ` Russell King @ 2004-04-29 10:44 ` Nick Piggin 2004-04-29 11:04 ` Russell King 0 siblings, 1 reply; 128+ messages in thread From: Nick Piggin @ 2004-04-29 10:44 UTC (permalink / raw) To: Russell King Cc: Marc Singer, Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel Russell King wrote: > On Wed, Apr 28, 2004 at 09:20:47PM -0700, Marc Singer wrote: > >>Russell King is working on a lot of things for the MMU code in ARM. >>I'm waiting to see where he ends up. I believe he's planning on >>removing the lazy PTE release logic. > > > Essentially it came to a grinding halt due to the shere size of the > task of sorting out the crappy includes, which is far to large for a > stable kernel. > > I may go back to the original problem and sort it a different way, > but for the time being, I'm occupied in other areas. > Anyway, Marc said he tried flushing the tlb and that didn't solve his problem. The problem might be the one identified in the thread: 2.6.6-rc{1,2} bad VM/NFS interaction in case of dirty page writeback ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 10:44 ` Nick Piggin @ 2004-04-29 11:04 ` Russell King 2004-04-29 14:52 ` Marc Singer 0 siblings, 1 reply; 128+ messages in thread From: Russell King @ 2004-04-29 11:04 UTC (permalink / raw) To: Nick Piggin Cc: Marc Singer, Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel On Thu, Apr 29, 2004 at 08:44:25PM +1000, Nick Piggin wrote: > Anyway, Marc said he tried flushing the tlb and that didn't > solve his problem. Nevertheless, when you have a TLB with ASIDs, there will be even less pressure to flush these entries from the TLB, so in effect we might as well save the expense of implementing the page aging in the first place. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 11:04 ` Russell King @ 2004-04-29 14:52 ` Marc Singer 0 siblings, 0 replies; 128+ messages in thread From: Marc Singer @ 2004-04-29 14:52 UTC (permalink / raw) To: Nick Piggin, Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel On Thu, Apr 29, 2004 at 12:04:19PM +0100, Russell King wrote: > On Thu, Apr 29, 2004 at 08:44:25PM +1000, Nick Piggin wrote: > > Anyway, Marc said he tried flushing the tlb and that didn't > > solve his problem. > > Nevertheless, when you have a TLB with ASIDs, there will be even less > pressure to flush these entries from the TLB, so in effect we might > as well save the expense of implementing the page aging in the first > place. Uh, oh. My FLA translator just broke. Whatsa ASID? ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 0:21 ` Nick Piggin 2004-04-29 0:50 ` Wakko Warner 2004-04-29 0:58 ` Marc Singer @ 2004-04-29 20:01 ` Horst von Brand 2004-04-29 20:18 ` Martin J. Bligh ` (4 more replies) 2 siblings, 5 replies; 128+ messages in thread From: Horst von Brand @ 2004-04-29 20:01 UTC (permalink / raw) To: Nick Piggin; +Cc: Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel Nick Piggin <nickpiggin@yahoo.com.au> said: [...] > I don't know. What if you have some huge application that only > runs once per day for 10 minutes? Do you want it to be consuming > 100MB of your memory for the other 23 hours and 50 minutes for > no good reason? How on earth is the kernel supposed to know that for this one particular job you don't care if it takes 3 hours instead of 10 minutes, just because you don't want to spare enough preciousss RAM? -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 20:01 ` Horst von Brand @ 2004-04-29 20:18 ` Martin J. Bligh 2004-04-29 20:33 ` David B. Stevens ` (3 subsequent siblings) 4 siblings, 0 replies; 128+ messages in thread From: Martin J. Bligh @ 2004-04-29 20:18 UTC (permalink / raw) To: Horst von Brand, Nick Piggin Cc: Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel --On Thursday, April 29, 2004 16:01:11 -0400 Horst von Brand <vonbrand@inf.utfsm.cl> wrote: > Nick Piggin <nickpiggin@yahoo.com.au> said: > > [...] > >> I don't know. What if you have some huge application that only >> runs once per day for 10 minutes? Do you want it to be consuming >> 100MB of your memory for the other 23 hours and 50 minutes for >> no good reason? > > How on earth is the kernel supposed to know that for this one particular > job you don't care if it takes 3 hours instead of 10 minutes, just because > you don't want to spare enough preciousss RAM? Nice value is the obvious interface for such information. M. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 20:01 ` Horst von Brand 2004-04-29 20:18 ` Martin J. Bligh @ 2004-04-29 20:33 ` David B. Stevens 2004-04-29 22:42 ` Steve Youngs 2004-04-29 20:36 ` Paul Jackson ` (2 subsequent siblings) 4 siblings, 1 reply; 128+ messages in thread From: David B. Stevens @ 2004-04-29 20:33 UTC (permalink / raw) To: Horst von Brand Cc: Nick Piggin, Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel Horst von Brand wrote: > Nick Piggin <nickpiggin@yahoo.com.au> said: > > [...] > > >>I don't know. What if you have some huge application that only >>runs once per day for 10 minutes? Do you want it to be consuming >>100MB of your memory for the other 23 hours and 50 minutes for >>no good reason? > > > How on earth is the kernel supposed to know that for this one particular > job you don't care if it takes 3 hours instead of 10 minutes, just because > you don't want to spare enough preciousss RAM? Maybe the kernel should be told by the apps exactly what they require in the way of memory and maybe how to slice up what the app gets for memory from the kernel. This would not be the first time that applications had to specify such information. That was what REGION= and other such parameters were all about in other operating systems. Then the kernel would have free use of what was left until the next app started etc .... Cheers, Dave ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 20:33 ` David B. Stevens @ 2004-04-29 22:42 ` Steve Youngs 0 siblings, 0 replies; 128+ messages in thread From: Steve Youngs @ 2004-04-29 22:42 UTC (permalink / raw) To: Linux Kernel Cc: Horst von Brand, Nick Piggin, Jeff Garzik, Andrew Morton, brettspamacct, David B. Stevens * David B Stevens <dsteven3@maine.rr.com> writes: > Maybe the kernel should be told by the apps exactly what they > require in the way of memory So what happens when Mr BloatyApp says: "Yo, Mr Kernel, gimme all ya got baby!" -- |---<Steve Youngs>---------------<GnuPG KeyID: A94B3003>---| | Ashes to ashes, dust to dust. | | The proof of the pudding, is under the crust. | |----------------------------------<steve@youngs.au.com>---| ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 20:01 ` Horst von Brand 2004-04-29 20:18 ` Martin J. Bligh 2004-04-29 20:33 ` David B. Stevens @ 2004-04-29 20:36 ` Paul Jackson 2004-04-29 21:19 ` Andrew Morton 2004-04-29 21:38 ` Timothy Miller 2004-04-30 5:15 ` Nick Piggin 2004-04-30 6:20 ` Tim Connors 4 siblings, 2 replies; 128+ messages in thread From: Paul Jackson @ 2004-04-29 20:36 UTC (permalink / raw) To: Horst von Brand; +Cc: nickpiggin, jgarzik, akpm, brettspamacct, linux-kernel > How on earth is the kernel supposed to know that for this one particular > job you don't care if it takes 3 hours instead of 10 minutes, I'd pay ten bucks (yeah, I'm a cheapskate) for an option that I could twiddle that would mark my nightly updatedb and backup jobs as ones to use reduced memory footprint (both for file caching and backing user virtual address space), even if it took much longer. So, rather than protest in mock outrage that it's impossible for the kernel to know this, instead answer the question as stated in all seriousness ... well ... how _could_ the kernel know, and what _could_ the kernel do if it knew. What mechanism(s) would be needed so that the kernel could restrict a job's memory usage? Heh - indeed perhaps the answer is closer than I realize. For SGI's big NUMA boxes, managing memory placement is sufficiently critical that we are inventing or encouraging ways (such as Andi Kleen's numa stuff) to control memory placement per node per job. Perhaps this needs to be extended to portions of a node (this job can only use 1 Gb of the memory on that 2 Gb node) and to other memory uses (file cache, not just user space memory). -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.650.933.1373 ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 20:36 ` Paul Jackson @ 2004-04-29 21:19 ` Andrew Morton 2004-04-29 21:34 ` Paul Jackson 2004-05-06 13:08 ` Pavel Machek 2004-04-29 21:38 ` Timothy Miller 1 sibling, 2 replies; 128+ messages in thread From: Andrew Morton @ 2004-04-29 21:19 UTC (permalink / raw) To: Paul Jackson; +Cc: vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel Paul Jackson <pj@sgi.com> wrote: > > > How on earth is the kernel supposed to know that for this one particular > > job you don't care if it takes 3 hours instead of 10 minutes, > > I'd pay ten bucks (yeah, I'm a cheapskate) for an option that I could > twiddle that would mark my nightly updatedb and backup jobs as ones to > use reduced memory footprint (both for file caching and backing user > virtual address space), even if it took much longer. > > So, rather than protest in mock outrage that it's impossible for the > kernel to know this, instead answer the question as stated in all > seriousness ... well ... how _could_ the kernel know, and what _could_ > the kernel do if it knew. What mechanism(s) would be needed so that > the kernel could restrict a jobs memory usage? Two things: a) a knob to say "only reclaim pagecache". We have that now. b) a knob to say "reclaim vfs caches harder". That's simply a matter of boosting the return value from shrink_dcache_memory() and perhaps shrink_icache_memory(). It's not quite what you're after, but it's close. ^ permalink raw reply [flat|nested] 128+ messages in thread
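[Editorial note: the two system-wide knobs Andrew describes ended up exposed as sysctls under /proc/sys/vm — the "only reclaim pagecache" bias is commonly associated with vm.swappiness, and "reclaim vfs caches harder" with vm.vfs_cache_pressure; treat that mapping as an assumption about later 2.6 kernels, not something stated in the thread. A minimal read-only sketch, in Python for brevity:]

```python
# Peek at the 2.6 VM tunables Andrew alludes to, if this kernel exposes them.
# The swappiness / vfs_cache_pressure names are an ASSUMPTION about how the
# knobs were eventually exported; neither is guaranteed to exist everywhere.
import os


def read_vm_knobs(names=("swappiness", "vfs_cache_pressure")):
    """Return {knob: int value} for whichever /proc/sys/vm entries exist."""
    knobs = {}
    for name in names:
        path = os.path.join("/proc/sys/vm", name)
        try:
            with open(path) as f:
                knobs[name] = int(f.read().strip())
        except (OSError, ValueError):
            pass  # kernel too old, or knob not exposed on this system
    return knobs


if __name__ == "__main__":
    for knob, value in sorted(read_vm_knobs().items()):
        print(f"{knob} = {value}")
```

Writing new values (e.g. `echo 0 > /proc/sys/vm/swappiness`) requires root; the sketch only reads, since the thread's point is that the scope is system-wide either way.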
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 21:19 ` Andrew Morton @ 2004-04-29 21:34 ` Paul Jackson 2004-04-29 21:57 ` Andrew Morton 2004-05-06 13:08 ` Pavel Machek 1 sibling, 1 reply; 128+ messages in thread From: Paul Jackson @ 2004-04-29 21:34 UTC (permalink / raw) To: Andrew Morton; +Cc: vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel Andrew wrote: > Two things: > a) a knob to say "only reclaim pagecache". We have that now. > b) a knob to say "reclaim vfs caches harder" ... Are these knobs system wide in affect, or per job? I am presuming system wide. When I'm working late, I want my updatedb/backup jobs to scrunch themselves into a corner, even as my builds and gui desktop continue to fly and suck up RAM. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.650.933.1373 ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 21:34 ` Paul Jackson @ 2004-04-29 21:57 ` Andrew Morton 2004-04-29 22:18 ` Paul Jackson 2004-04-30 0:04 ` Andy Isaacson 0 siblings, 2 replies; 128+ messages in thread From: Andrew Morton @ 2004-04-29 21:57 UTC (permalink / raw) To: Paul Jackson; +Cc: vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel Paul Jackson <pj@sgi.com> wrote: > > Andrew wrote: > > Two things: > > a) a knob to say "only reclaim pagecache". We have that now. > > b) a knob to say "reclaim vfs caches harder" ... > > Are these knobs system wide in affect, or per job? > I am presuming system wide. yup, system-wide. > When I'm working late, I want my updatedb/backup jobs > to scrunch themselves into a corner, even as my builds > and gui desktop continue to fly and suck up RAM. Sure. That's not purely a cacheing thing though. Even if the background activity was clamped to just a few megs of cache you'll find that the seek activity is a killer, and needs a limitation mechanism. Although the anticipatory scheduler helps here a lot. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 21:57 ` Andrew Morton @ 2004-04-29 22:18 ` Paul Jackson 2004-04-30 0:04 ` Andy Isaacson 1 sibling, 0 replies; 128+ messages in thread From: Paul Jackson @ 2004-04-29 22:18 UTC (permalink / raw) To: Andrew Morton, Shailabh Nagar Cc: vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel Andrew wrote: > Even if the background activity was clamped to just a few megs > of cache you'll find that the seek activity is a killer, and > needs a limitation mechanism. True - the seek activity is another critical resource that would need to be throttled to keep updatedb/backup from interfering with my late night labours. Let's see, that's: 1) cpu scheduling ticks 2) memory for virtual address backing store 3) memory for file related caching 4) disk arm motion Hmmm ... actually not so much a numa-placement extension, but rather a CKRM opportunity. CKRM focuses on measuring and restraining how much of specified critical resources a task is using; numa placement on which cpus or memory nodes are allowed to be used at all. See further the CKRM thread of Shailabh Nagar, also running on lkml today. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.650.933.1373 ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 21:57 ` Andrew Morton 2004-04-29 22:18 ` Paul Jackson @ 2004-04-30 0:04 ` Andy Isaacson 2004-04-30 0:32 ` Andrew Morton 1 sibling, 1 reply; 128+ messages in thread From: Andy Isaacson @ 2004-04-30 0:04 UTC (permalink / raw) To: Andrew Morton Cc: Paul Jackson, vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel On Thu, Apr 29, 2004 at 02:57:25PM -0700, Andrew Morton wrote: > > When I'm working late, I want my updatedb/backup jobs > > to scrunch themselves into a corner, even as my builds > > and gui desktop continue to fly and suck up RAM. > > Sure. That's not purely a cacheing thing though. Even if the background > activity was clamped to just a few megs of cache you'll find that the > seek activity is a killer, and needs a limitation mechanism. Although the > anticipatory scheduler helps here a lot. I grant that in the updatedb case (or the backup case), the seeks are going to suck and they're inherently on the same spindle as the user's data, so there's no fixing it (short of a real "IO nice"). But in a related case, I have a background daemon that does a lot of IO (mostly sequential, one page at a time read/modify/write of a multi-GB file) to a filesystem on a separate spindle from my main filesystems. I'd like to use a similar mechanism to say "don't let this program eat my pagecache" that will let the daemon crunch away without severely impacting my desktop work. -andy ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 0:04 ` Andy Isaacson @ 2004-04-30 0:32 ` Andrew Morton 2004-04-30 0:54 ` Paul Jackson 2004-04-30 7:52 ` Jeff Garzik 0 siblings, 2 replies; 128+ messages in thread From: Andrew Morton @ 2004-04-30 0:32 UTC (permalink / raw) To: Andy Isaacson Cc: pj, vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel Andy Isaacson <adi@hexapodia.org> wrote: > > But in a related case, I have a background daemon that does a lot of IO > (mostly sequential, one page at a time read/modify/write of a multi-GB > file) to a filesystem on a separate spindle from my main filesystems. > I'd like to use a similar mechanism to say "don't let this program eat > my pagecache" that will let the daemon crunch away without severely > impacting my desktop work. fadvise(POSIX_FADV_DONTNEED) is ideal for this. Run it once per megabyte or so. ^ permalink raw reply [flat|nested] 128+ messages in thread
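[Editorial note: Andrew's "run it once per megabyte" advice can be sketched in a few lines of userspace code. Python's `os.posix_fadvise` is a thin wrapper over the same syscall, which keeps the example short; the function name, chunk size, and return values are illustrative choices, not anything from the thread.]

```python
import os

CHUNK = 1 << 20  # hint granularity: one megabyte, per Andrew's suggestion


def stream_without_caching(path):
    """Read `path` sequentially, dropping each chunk from pagecache after use.

    Returns (bytes_read, hints_issued). Mirrors what a background daemon
    would do so its bulk I/O doesn't evict everyone else's working set.
    """
    bytes_read = hints = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, CHUNK)
            if not data:
                break
            # Tell the VM these pages won't be needed again.
            os.posix_fadvise(fd, bytes_read, len(data),
                             os.POSIX_FADV_DONTNEED)
            bytes_read += len(data)
            hints += 1
    finally:
        os.close(fd)
    return bytes_read, hints
```

DONTNEED only drops clean pages; for a read/modify/write workload like Andy's daemon, dirty pages would first have to be written back (e.g. via `fsync` or `sync_file_range`-style flushing) before the hint can take effect.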
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 0:32 ` Andrew Morton @ 2004-04-30 0:54 ` Paul Jackson 2004-04-30 5:38 ` Andy Isaacson 2004-04-30 7:52 ` Jeff Garzik 1 sibling, 1 reply; 128+ messages in thread From: Paul Jackson @ 2004-04-30 0:54 UTC (permalink / raw) To: Andrew Morton Cc: adi, vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel Andrew wrote: > fadvise(POSIX_FADV_DONTNEED) is ideal for this. Perhaps ... perhaps not. Just as the knobs "only reclaim pagecache" and "reclaim vfs caches harder" had too big a scope (system-wide), using fadvise might have too small a scope (currently cached pages of current task only). If his background daemon is some shell script, say, that uses 'cat' to generate the i/o to the other spindle, then he probably wants to be marking that daemon job "don't let this entire job eat my pagecache", not rebuilding a hacked up cat command with added POSIX_FADV_DONTNEED calls every megabyte. CKRM to the rescue ... ?? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.650.933.1373 ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 0:54 ` Paul Jackson @ 2004-04-30 5:38 ` Andy Isaacson 2004-04-30 6:00 ` Nick Piggin 0 siblings, 1 reply; 128+ messages in thread From: Andy Isaacson @ 2004-04-30 5:38 UTC (permalink / raw) To: Paul Jackson Cc: Andrew Morton, vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel On Thu, Apr 29, 2004 at 05:54:42PM -0700, Paul Jackson wrote: > Andrew wrote: > > fadvise(POSIX_FADV_DONTNEED) is ideal for this. > > Perhaps ... perhaps not. > > Just as the knobs "only reclaim pagecache" and "reclaim vfs caches > harder" had too big a scope (system-wide), using fadvise might have too > small a scope (currently cached pages of current task only). > > If his background daemon is some shell script, say, that uses 'cat' to > generate the i/o to the other spindle, then he probably wants to be > marking that daemon job "don't let this entire job eat my pagecache", > not rebuilding a hacked up cat command with added POSIX_FADV_DONTNEED > calls every megabyte. Well, in this case it's bespoke C code so adding the fadvise isn't terribly difficult. (The structure of the code doesn't lend itself to "do this every 10 MB" but I'm sure I can hack something up.) It would be nicer if the kernel would do the right thing without needing to have its hand held, but the fadvise will solve my immediate need. (Assuming it works on 2.4.) -andy ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 5:38 ` Andy Isaacson @ 2004-04-30 6:00 ` Nick Piggin 0 siblings, 0 replies; 128+ messages in thread From: Nick Piggin @ 2004-04-30 6:00 UTC (permalink / raw) To: Andy Isaacson Cc: Paul Jackson, Andrew Morton, vonbrand, jgarzik, brettspamacct, linux-kernel Andy Isaacson wrote: > On Thu, Apr 29, 2004 at 05:54:42PM -0700, Paul Jackson wrote: > >>Andrew wrote: >> >>>fadvise(POSIX_FADV_DONTNEED) is ideal for this. >> >>Perhaps ... perhaps not. >> >>Just as the knobs "only reclaim pagecache" and "reclaim vfs caches >>harder" had too big a scope (system-wide), using fadvise might have too >>small a scope (currently cached pages of current task only). >> >>If his background daemon is some shell script, say, that uses 'cat' to >>generate the i/o to the other spindle, then he probably wants to be >>marking that daemon job "don't let this entire job eat my pagecache", >>not rebuilding a hacked up cat command with added POSIX_FADV_DONTNEED >>calls every megabyte. > > > Well, in this case it's bespoke C code so adding the fadvise isn't > terribly difficult. (The structure of the code doesn't lend itself to > "do this every 10 MB" but I'm sure I can hack something up.) > > It would be nicer if the kernel would do the right thing without needing > to have its hand held, but the fadvise will solve my immediate need. > (Assuming it works on 2.4.) Right for one thing will always be wrong for another. If you want some specific behaviour then you might have to hold hands. That is just the way it goes. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 0:32 ` Andrew Morton 2004-04-30 0:54 ` Paul Jackson @ 2004-04-30 7:52 ` Jeff Garzik 2004-04-30 8:02 ` Andrew Morton 1 sibling, 1 reply; 128+ messages in thread From: Jeff Garzik @ 2004-04-30 7:52 UTC (permalink / raw) To: Andrew Morton Cc: Andy Isaacson, pj, vonbrand, nickpiggin, brettspamacct, linux-kernel Andrew Morton wrote: > Andy Isaacson <adi@hexapodia.org> wrote: > >> But in a related case, I have a background daemon that does a lot of IO >> (mostly sequential, one page at a time read/modify/write of a multi-GB >> file) to a filesystem on a separate spindle from my main filesystems. >> I'd like to use a similar mechanism to say "don't let this program eat >> my pagecache" that will let the daemon crunch away without severely >> impacting my desktop work. > > > fadvise(POSIX_FADV_DONTNEED) is ideal for this. Run it once per megabyte > or so. Sweet. I'm so happy you added posix_fadvise (way back when), and even happier to hear this. Does our fadvise support len==0 ("I mean the whole file")? That's defined in POSIX, and would allow a compliant app to simply POSIX_FADV_DONTNEED once at the beginning. Jeff ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 7:52 ` Jeff Garzik @ 2004-04-30 8:02 ` Andrew Morton 2004-04-30 8:09 ` Jeff Garzik 0 siblings, 1 reply; 128+ messages in thread From: Andrew Morton @ 2004-04-30 8:02 UTC (permalink / raw) To: Jeff Garzik; +Cc: adi, pj, vonbrand, nickpiggin, brettspamacct, linux-kernel Jeff Garzik <jgarzik@pobox.com> wrote: > > > fadvise(POSIX_FADV_DONTNEED) is ideal for this. Run it once per megabyte > > or so. > > > Sweet. I'm so happy you added posix_fadvise (way back when), and even > happier to hear this. There are a number of other goodies we could add to it, as linux extensions. > Does our fadvise support len==0 ("I mean the whole file")? That's > defined in POSIX, and would allow a compliant app to simply > POSIX_FADV_DONTNEED once at the beginning. Well I'll be darned. --- 25/mm/fadvise.c~fadvise-len-fix 2004-04-30 00:58:00.437598504 -0700 +++ 25-akpm/mm/fadvise.c 2004-04-30 00:59:03.237051536 -0700 @@ -38,6 +38,9 @@ asmlinkage long sys_fadvise64_64(int fd, goto out; } + if (len == 0) /* 0 == "all data following offset" */ + len = -1; + bdi = mapping->backing_dev_info; switch (advice) { _ ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 8:02 ` Andrew Morton @ 2004-04-30 8:09 ` Jeff Garzik 0 siblings, 0 replies; 128+ messages in thread From: Jeff Garzik @ 2004-04-30 8:09 UTC (permalink / raw) To: Andrew Morton; +Cc: adi, pj, vonbrand, nickpiggin, brettspamacct, linux-kernel Andrew Morton wrote: > Jeff Garzik <jgarzik@pobox.com> wrote: >> Does our fadvise support len==0 ("I mean the whole file")? That's >> defined in POSIX, and would allow a compliant app to simply >> POSIX_FADV_DONTNEED once at the beginning. > > > Well I'll be darned. FWIW the specific language is "If len is zero, all data following offset is specified." (for others, you probably already have this somewhere) http://www.opengroup.org/onlinepubs/007904975/functions/posix_fadvise.html top level SuSv3: http://www.opengroup.org/onlinepubs/007904975/toc.htm Jeff ^ permalink raw reply [flat|nested] 128+ messages in thread
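[Editorial note: with Andrew's one-line fix in place, a POSIX-compliant caller gets the behavior Jeff asked about — `len == 0` means "all data following offset", so a single call at offset 0 covers the whole file. A hedged sketch, again via Python's wrapper for the same syscall; the helper name is invented for illustration:]

```python
import os


def drop_whole_file_from_cache(path):
    """Hint the kernel to drop every cached page of `path`.

    Per POSIX, len == 0 means "all data following offset", so one call
    at offset 0 spans the entire file -- no need to loop over ranges.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True
```

Note this is only a hint: the call succeeds even where the kernel chooses to keep the pages, so callers should not depend on the pages actually being gone.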
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 21:19 ` Andrew Morton 2004-04-29 21:34 ` Paul Jackson @ 2004-05-06 13:08 ` Pavel Machek 2004-05-07 15:53 ` Hugh Dickins 1 sibling, 1 reply; 128+ messages in thread From: Pavel Machek @ 2004-05-06 13:08 UTC (permalink / raw) To: Andrew Morton Cc: Paul Jackson, vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel Hi! > > > How on earth is the kernel supposed to know that for this one particular > > > job you don't care if it takes 3 hours instead of 10 minutes, > > > > I'd pay ten bucks (yeah, I'm a cheapskate) for an option that I could > > twiddle that would mark my nightly updatedb and backup jobs as ones to > > use reduced memory footprint (both for file caching and backing user > > virtual address space), even if it took much longer. > > > > So, rather than protest in mock outrage that it's impossible for the > > kernel to know this, instead answer the question as stated in all > > seriousness ... well ... how _could_ the kernel know, and what _could_ > > the kernel do if it knew. What mechanism(s) would be needed so that > > the kernel could restrict a jobs memory usage? > > Two things: > > a) a knob to say "only reclaim pagecache". We have that now. > > b) a knob to say "reclaim vfs caches harder". That's simply a matter of boosting > the return value from shrink_dcache_memory() and perhaps shrink_icache_memory(). > > It's not quite what you're after, but it's close. Perhaps what we really want is "swap_back_in" script? That way you could do "updatedb; swap_back_in" in cron and be happy. Pavel -- When do you have heart between your knees? ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-05-06 13:08 ` Pavel Machek @ 2004-05-07 15:53 ` Hugh Dickins 2004-05-07 16:57 ` Pavel Machek 0 siblings, 1 reply; 128+ messages in thread From: Hugh Dickins @ 2004-05-07 15:53 UTC (permalink / raw) To: Pavel Machek Cc: Andrew Morton, Paul Jackson, vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel On Thu, 6 May 2004, Pavel Machek wrote: > > Perhaps what we really want is "swap_back_in" script? That way you > could do "updatedb; swap_back_in" in cron and be happy. swapoff -a; swapon -a Hugh ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-05-07 15:53 ` Hugh Dickins @ 2004-05-07 16:57 ` Pavel Machek 2004-05-07 17:30 ` Timothy Miller 2004-05-12 17:52 ` Rob Landley 0 siblings, 2 replies; 128+ messages in thread From: Pavel Machek @ 2004-05-07 16:57 UTC (permalink / raw) To: Hugh Dickins Cc: Andrew Morton, Paul Jackson, vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel Hi! > > Perhaps what we really want is "swap_back_in" script? That way you > > could do "updatedb; swap_back_in" in cron and be happy. > > swapoff -a; swapon -a Good point... it will not bring back executable pages, though. Pavel -- Horseback riding is like software... ...vgf orggre jura vgf serr. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-05-07 16:57 ` Pavel Machek @ 2004-05-07 17:30 ` Timothy Miller 2004-05-07 17:43 ` Hugh Dickins 2004-05-07 17:48 ` Mark Frazer 2004-05-12 17:52 ` Rob Landley 1 sibling, 2 replies; 128+ messages in thread From: Timothy Miller @ 2004-05-07 17:30 UTC (permalink / raw) To: Pavel Machek Cc: Hugh Dickins, Andrew Morton, Paul Jackson, vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel Pavel Machek wrote: > Hi! > > >>>Perhaps what we really want is "swap_back_in" script? That way you >>>could do "updatedb; swap_back_in" in cron and be happy. >> >>swapoff -a; swapon -a > > > Good point... it will not bring back executable pages, through. > > Pavel Wouldn't this also be a problem if you are using more memory than you have physical RAM? ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-05-07 17:30 ` Timothy Miller @ 2004-05-07 17:43 ` Hugh Dickins 2004-05-07 17:48 ` Mark Frazer 1 sibling, 0 replies; 128+ messages in thread From: Hugh Dickins @ 2004-05-07 17:43 UTC (permalink / raw) To: Timothy Miller Cc: Pavel Machek, Andrew Morton, Paul Jackson, vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel On Fri, 7 May 2004, Timothy Miller wrote: > > > >>>Perhaps what we really want is "swap_back_in" script? That way you > >>>could do "updatedb; swap_back_in" in cron and be happy. > >> > >>swapoff -a; swapon -a > > Wouldn't this also be a problem if you are using more memory than you > have physical RAM? On 2.4 it certainly would be a problem (hang with others OOM-killed). On 2.6 it shouldn't be a problem: the swapoff may fail upfront if there's way too little memory, or it may get itself OOM-killed if it runs out on the way, but it ought not to upset other tasks. But of course, Pavel is right that it does nothing for file backed. Hugh ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-05-07 17:30 ` Timothy Miller 2004-05-07 17:43 ` Hugh Dickins @ 2004-05-07 17:48 ` Mark Frazer 1 sibling, 0 replies; 128+ messages in thread From: Mark Frazer @ 2004-05-07 17:48 UTC (permalink / raw) To: Timothy Miller Cc: Pavel Machek, Hugh Dickins, Andrew Morton, Paul Jackson, vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel Timothy Miller <miller@techsource.com> [04/05/07 13:26]: > >>>Perhaps what we really want is "swap_back_in" script? That way you > >>>could do "updatedb; swap_back_in" in cron and be happy. > >> > >>swapoff -a; swapon -a > > > > > >Good point... it will not bring back executable pages, through. > > > > Pavel > > Wouldn't this also be a problem if you are using more memory than you > have physical RAM? #!/bin/bash swapused=$(( $(sed -n -e 's/ \+/-/g' -e '/^Swap:/p' /proc/meminfo | cut -d'-' -f2,4) )) bufsused=$(( $(sed -n -e 's/ \+/+/g' -e '/^Mem:/p' /proc/meminfo | cut -d'+' -f6,7) )) if [ $bufsused -gt $(( 11 * swapused / 10 )) ] then swapoff -a; swapon -a fi or something like that -- How can I live my life if I can't tell good from evil? - Fry ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-05-07 16:57 ` Pavel Machek 2004-05-07 17:30 ` Timothy Miller @ 2004-05-12 17:52 ` Rob Landley 2004-05-17 20:16 ` Hugh Dickins 1 sibling, 1 reply; 128+ messages in thread From: Rob Landley @ 2004-05-12 17:52 UTC (permalink / raw) To: Pavel Machek, Hugh Dickins Cc: Andrew Morton, Paul Jackson, vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel On Friday 07 May 2004 11:57, Pavel Machek wrote: > Hi! > > > > Perhaps what we really want is "swap_back_in" script? That way you > > > could do "updatedb; swap_back_in" in cron and be happy. > > > > swapoff -a; swapon -a > > Good point... it will not bring back executable pages, through. > > Pavel What would the above do if there wasn't enough memory to swap everything back in? (Presumably, the swapoff would fail?) Rob -- www.linucon.org: Linux Expo and Science Fiction Convention October 8-10, 2004 in Austin Texas. (I'm the con chair.) ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-05-12 17:52 ` Rob Landley @ 2004-05-17 20:16 ` Hugh Dickins 0 siblings, 0 replies; 128+ messages in thread From: Hugh Dickins @ 2004-05-17 20:16 UTC (permalink / raw) To: Rob Landley Cc: Pavel Machek, Andrew Morton, Paul Jackson, vonbrand, nickpiggin, jgarzik, brettspamacct, linux-kernel On Wed, 12 May 2004, Rob Landley wrote: > On Friday 07 May 2004 11:57, Pavel Machek wrote: > > Hi! > > > > > > Perhaps what we really want is "swap_back_in" script? That way you > > > > could do "updatedb; swap_back_in" in cron and be happy. > > > > > > swapoff -a; swapon -a > > > > Good point... it will not bring back executable pages, through. > > > > Pavel > > What would the above do if there wasn't enough memory to swap everything back > in? (Presumably, the swapoff would fail?) Repeating my earlier reply to a similar question... On 2.4 it certainly would be a problem (hang with others OOM-killed). On 2.6 it shouldn't be a problem: the swapoff may fail upfront if there's way too little memory, or it may get itself OOM-killed if it runs out on the way, but it ought not to upset other tasks. But of course, Pavel is right that it does nothing for file backed. Hugh ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 20:36 ` Paul Jackson 2004-04-29 21:19 ` Andrew Morton @ 2004-04-29 21:38 ` Timothy Miller 2004-04-29 21:47 ` Paul Jackson 1 sibling, 1 reply; 128+ messages in thread From: Timothy Miller @ 2004-04-29 21:38 UTC (permalink / raw) To: Paul Jackson Cc: Horst von Brand, nickpiggin, jgarzik, akpm, brettspamacct, linux-kernel Paul Jackson wrote: > Heh - indeed perhaps the answer is closer than I realize. For SGI's big > NUMA boxes, managing memory placement is sufficiently critical that we > are inventing or encouraging ways (such as Andi Kleen's numa stuff) to > control memory placement per node per job. Perhaps this needs to be > extended to portions of a node (this job can only use 1 Gb of the memory > on that 2 Gb node) and to other memory uses (file cache, not just user > space memory). > Is updatedb run with a nice level greater than zero? Perhaps nice level could influence how much a process is allowed to affect page cache. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 21:38 ` Timothy Miller @ 2004-04-29 21:47 ` Paul Jackson 2004-04-29 22:18 ` Timothy Miller 0 siblings, 1 reply; 128+ messages in thread From: Paul Jackson @ 2004-04-29 21:47 UTC (permalink / raw) To: Timothy Miller Cc: vonbrand, nickpiggin, jgarzik, akpm, brettspamacct, linux-kernel Timothy wrote: > Perhaps nice level could influence how much a process is allowed to > affect page cache. I'm from the school that says 'nice' applies to scheduling priority, not memory usage. I'd expect a different knob, a per-task inherited value as is 'nice', to control memory usage. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.650.933.1373 ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 21:47 ` Paul Jackson @ 2004-04-29 22:18 ` Timothy Miller 2004-04-29 22:46 ` Paul Jackson 2004-04-30 3:37 ` Tim Connors 0 siblings, 2 replies; 128+ messages in thread From: Timothy Miller @ 2004-04-29 22:18 UTC (permalink / raw) To: Paul Jackson Cc: vonbrand, nickpiggin, jgarzik, akpm, brettspamacct, linux-kernel Paul Jackson wrote: > Timothy wrote: > >>Perhaps nice level could influence how much a process is allowed to >>affect page cache. > > > I'm from the school that says 'nice' applies to scheduling priority, > not memory usage. > > I'd expect a different knob, a per-task inherited value as is 'nice', > to control memory usage. > Linux kernel developers seem to be of the mind that you cannot trust what applications tell you about themselves, so it's better to use heuristics to GUESS how to schedule something, rather than to add YET ANOTHER property to it. Nick, Con, Ingo, and others have done an impressive job of taking the guess/heuristic approach to scheduling. I don't see why that can't be taken further. Also, there seems to be strong resistance to adding a property to something which is not easily accessible through existing UNIX tools. "nice" and "renice" commands have been around forever. Adding another control requires new commands, new libc functions, changes to "top", etc. Besides, when would you want to have a sched-nice of -20 and an io-nice of 20, or a sched-nice of 20 and an io-nice of -20? Things like that would make no sense. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 22:18 ` Timothy Miller @ 2004-04-29 22:46 ` Paul Jackson 2004-04-29 23:08 ` Timothy Miller 0 siblings, 1 reply; 128+ messages in thread From: Paul Jackson @ 2004-04-29 22:46 UTC (permalink / raw) To: Timothy Miller Cc: vonbrand, nickpiggin, jgarzik, akpm, brettspamacct, linux-kernel Timothy wrote: > Linux kernel developers seem to be of the mind that you cannot trust > what applications tell you about themselves, so it's better to use > heuristics to GUESS how to schedule something, rather than to add YET > ANOTHER property to it. Both are needed. The thing has to work pretty well, for most people, most of the time, without human intervention. And there need to be knobs to optimize performance. Even with no conscious end-user administration, a knob on the cron job that runs updatedb, set up by the distribution packager, could have widespread impact on the responsiveness of a system, when the user sits down with the first cup of coffee to scan the morning headlines and incoming email er eh spam. As to whether it's two nice calls, or one with dual effect, let's not confuse the kernel API with that seen by the user. The kernel should provide a minimum spanning set of orthogonal mechanisms, and not be second guessing whether the user is out of their ever loving mind to be asking for a hot cpu, cold io, job. In other words, I wouldn't agree with your take that it's a matter of not trusting the application, better to GUESS. Rather I would say that there is a preference, and a good one at that, to not use an excessive number of knobs as a cop-out to avoid working hard to get the widest practical range of cases to behave reasonably, without intervention, and a preference to keep what knobs that are there short, sweet and minimally interacting. -- I won't rest till it's the best ... 
Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.650.933.1373 ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 22:46 ` Paul Jackson @ 2004-04-29 23:08 ` Timothy Miller 2004-04-30 12:31 ` Bart Samwel 0 siblings, 1 reply; 128+ messages in thread From: Timothy Miller @ 2004-04-29 23:08 UTC (permalink / raw) To: Paul Jackson Cc: vonbrand, nickpiggin, jgarzik, akpm, brettspamacct, linux-kernel Paul Jackson wrote: > > In other words, I wouldn't agree with your take that it's a matter of > not trusting the application, better to GUESS. Okay. > Rather I would say that > there is a preference, and a good one at that, to not use an excessive > number of knobs as a cop-out to avoid working hard to get the widest > practical range of cases to behave reasonably, without intervention, and > a preference to keep what knobs that are there short, sweet and > minimally interacting. > Agreed. And this is why I suggested not adding another knob but rather going with the existing nice value. Mind you, this shouldn't necessarily be done without some kind of experimentation. Put two knobs in the kernel and try varying them to each other to see what sorts of jobs, if any, would benefit in a disparity between cpu-nice and io-nice. If there IS a significant difference, then add the extra knob. If there isn't, then don't. Another possibility would be to have one knob that controls cpu-nice, and another knob that controls io-nice minus cpu-nice, so if you REALLY want to make them different, you can, but typically, they are set to be the same. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 23:08 ` Timothy Miller @ 2004-04-30 12:31 ` Bart Samwel 2004-04-30 15:35 ` Clay Haapala 2004-04-30 22:11 ` Paul Jackson 0 siblings, 2 replies; 128+ messages in thread From: Bart Samwel @ 2004-04-30 12:31 UTC (permalink / raw) To: Timothy Miller Cc: Paul Jackson, vonbrand, nickpiggin, jgarzik, Andrew Morton, brettspamacct, linux-kernel On Fri, 2004-04-30 at 01:08, Timothy Miller wrote: > Agreed. And this is why I suggested not adding another knob but rather > going with the existing nice value. > > Mind you, this shouldn't necessarily be done without some kind of > experimentation. Put two knobs in the kernel and try varying them to > each other to see what sorts of jobs, if any, would benefit in a > disparity between cpu-nice and io-nice. If there IS a significant > difference, then add the extra knob. If there isn't, then don't. Thought experiment: what would happen when you set the hypothetical cpu-nice and io-nice knobs very differently? * cpu-nice 20, io-nice -20: Read I/O will finish immediately, but then the process will have to wait for ages to get a CPU slice to process the data, so why would you want to read it so quickly? The process can do as much write I/O as it wants, but why is it not okay to take ages to write the data if it's okay to take ages to produce it? * cpu-nice -20, io-nice 20: Read I/O will take ages, but once the data gets there, the processor is immediately taken to process the data as fast as possible. If it was okay to take ages to read the data, why would you want to process it as soon as you can? It makes some sense for write I/O though: produce data as fast as the other processes will allow you to write it. But if you're going to hog the CPU completely, why give other processes the chance to do a lot of I/O while they don't get the CPU time to submit any I/O? Going for a smaller difference makes more sense. 
As far as I can tell, giving the knobs very different values doesn't make much sense. The same arguments go for medium-sized differences. And if we're going to give the knobs only *slightly* different values, we might as well make them the same. If we really need cpu-nice = 0 and io-nice = 3 somewhere, then I think that's a sign of a kernel problem, where the kernel's various nice-knobs aren't calibrated correctly to result in the same amount of "niceness" when they're set to the same value. And cpu-nice = io-nice = 3 would probably have about the same effect. BTW, if there *are* going to be more knobs, I suggest adding "memory-nice" as well. :) If you set memory-nice to 20, then the process will not kick out much memory from other processes (it will require more I/O -- but that can be throttled using io-nice). If you set memory-nice to -20, then the process will kick out the memory of all other processes if it needs to. --Bart ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 12:31 ` Bart Samwel @ 2004-04-30 15:35 ` Clay Haapala 2004-04-30 15:44 ` Bart Samwel 2004-04-30 22:11 ` Paul Jackson 1 sibling, 1 reply; 128+ messages in thread From: Clay Haapala @ 2004-04-30 15:35 UTC (permalink / raw) To: bart Cc: Timothy Miller, Paul Jackson, vonbrand, nickpiggin, jgarzik, Andrew Morton, brettspamacct, linux-kernel On Fri, 30 Apr 2004, Bart Samwel uttered the following: > > Thought experiment: what would happen when you set the hypothetical > cpu-nice and io-nice knobs very differently? > Dunno why, but this talk of knobs makes me think of the "effects-mix" knob on my bass amp that controls how much effects-loop signal is mixed with the "dry" guitar signal. Getting back to kernel talk, we have a "swappiness" knob, right? Should there be, or is there already, a way to dynamically vary the effect of swappiness [within a range], based on some monitored system characteristics such as keyboard/mouse (HID) input or some other identifiable profile? Perhaps this is similar to nice/fairness logic in the process schedulers. Using HID as a profile, if I'm up late working on a paper in OO and using a browser like Mozilla when updatedb fires up, the fact that there is recent keyboard/mouse input has been seen would modify swappiness down. However, if I've fallen asleep in my chair for an hour when updatedb fires up, no recent input events will have been detected, and updatedb gets the high range of swappiness effect. If I happen to wake up in the middle of it, I just have to accept it'll take time to wake my apps up, but at least they will get progressively more responsive as I use 'em. I use the term "profile" because I wouldn't want to have just HID events be the trigger. If a machine's main use is database or web-serving, perhaps the appropriate events to monitor would be, say, traffic on specified TCP ports or network interfaces. 
The amount of extra work should be no more than what goes on with entropy generation, I would think. -- Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056 6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD "Oh, *that* Physics Prize. Well, I just substituted 'stupidity' for 'dark matter' in the equations, and it all came together." ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 15:35 ` Clay Haapala @ 2004-04-30 15:44 ` Bart Samwel 0 siblings, 0 replies; 128+ messages in thread From: Bart Samwel @ 2004-04-30 15:44 UTC (permalink / raw) To: Clay Haapala Cc: Timothy Miller, Paul Jackson, vonbrand, nickpiggin, jgarzik, Andrew Morton, brettspamacct, linux-kernel On Fri, 2004-04-30 at 17:35, Clay Haapala wrote: > On Fri, 30 Apr 2004, Bart Samwel uttered the following: > > > > Thought experiment: what would happen when you set the hypothetical > > cpu-nice and io-nice knobs very differently? > > > Dunno why, but this talk of knobs makes me think of the "effects-mix" > knob on my bass amp that controls how much effects-loop signal is > mixed with the "dry" guitar signal. > > Getting back to kernel talk, we have a "swappiness" knob, right? > Should there be, or is there already, a way to dynamically vary the > effect of swappiness [within a range], based on some monitored system > characteristics such as keyboard/mouse (HID) input or some other > identifiable profile? Perhaps this is similar to nice/fairness logic > in the process schedulers. This kind of thing is exactly what has been avoided by using interactivity boosts, and taking that into account in an "io-nice" value as well should solve that. Other profiles might be interesting though. Interactive tasks have a tendency to be interactive for a short while, and then be noninteractive for a long time. I'm thinking that it might be worthwhile to do something with that, i.e. to keep a bonus for "past interactivity" on some pages based on the fact that they were originally loaded by a still-existing process that was once/is marked as interactive. --Bart ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 12:31 ` Bart Samwel 2004-04-30 15:35 ` Clay Haapala @ 2004-04-30 22:11 ` Paul Jackson 1 sibling, 0 replies; 128+ messages in thread From: Paul Jackson @ 2004-04-30 22:11 UTC (permalink / raw) To: bart Cc: miller, vonbrand, nickpiggin, jgarzik, akpm, brettspamacct, linux-kernel > Thought experiment: what would happen when you set the hypothetical > cpu-nice and io-nice knobs very differently? If there was one, single implementation hook in the kernel where making some decision depend on a user setting cleanly adapted both i/o and cpu priority, then yes, your thought experiment would recommend that this one hook was sufficient, and lead to a single user knob to control it. But in this case, there are obviously two implementation hooks - the classic one in the scheduler that affects cpu usage, and another off in some i/o code that affects i/o usage. So then the question comes - do we have one knob over this that is ganged to both hooks, or do we have two knobs, one per hook. Ganging these two hooks together, to control them in synchrony to a single user setting, is a policy choice. It's saying that we don't think you will ever want to run them out of sync, so as a matter of policy, we are ganging them together. I prefer to avoid nonessential policy in the kernel. Best to simply expose each independent kernel facility, 1-to-1, to the user. Let them decide when and if if these two settings should be ganged. I find gratuitous (not needed for system reliability) policy in the kernel to be a greater negative than another system call. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.650.933.1373 ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 22:18 ` Timothy Miller 2004-04-29 22:46 ` Paul Jackson @ 2004-04-30 3:37 ` Tim Connors 1 sibling, 0 replies; 128+ messages in thread From: Tim Connors @ 2004-04-30 3:37 UTC (permalink / raw) Cc: Paul Jackson, vonbrand, nickpiggin, jgarzik, akpm, brettspamacct, linux-kernel Timothy Miller <miller@techsource.com> said on Thu, 29 Apr 2004 18:18:06 -0400: > Paul Jackson wrote: > > Timothy wrote: > > > >>Perhaps nice level could influence how much a process is allowed to > >>affect page cache. > > > > > > I'm from the school that says 'nice' applies to scheduling priority, > > not memory usage. > > > > I'd expect a different knob, a per-task inherited value as is 'nice', > > to control memory usage. > > Linux kernel developers seem to be of the mind that you cannot trust > what applications tell you about themselves, so it's better to use > heuristics to GUESS how to schedule something, rather than to add YET > ANOTHER property to it. Why is that? On the desktop system/workstation, which is what we are talking about here -- we want the desktop system in particular to be responsive -- the user wouldn't try to do anything malicious, so why not trust the applications? openoffice and mozilla and my visualisation software are going to know what they want out of the kernel (possibly with safeguards such that they only tell the kernel what they want if the kernel happens to be in some tested range, perhaps), the kernel sure as hell won't know what my custom built application wants via heuristics, because I am doing something that no-one else is, and so my exact workloads haven't been experienced or designed for. On a server, you can have a /proc file to tell the kernel to ignore everything an application tells you, or ignore/believe applications with uid in ranges xx--yy. -- TimC -- http://astronomy.swin.edu.au/staff/tconnors/ Beware of Programmers who carry screwdrivers. 
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 20:01 ` Horst von Brand ` (2 preceding siblings ...) 2004-04-29 20:36 ` Paul Jackson @ 2004-04-30 5:15 ` Nick Piggin 2004-04-30 6:20 ` Tim Connors 4 siblings, 0 replies; 128+ messages in thread From: Nick Piggin @ 2004-04-30 5:15 UTC (permalink / raw) To: Horst von Brand; +Cc: Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel Horst von Brand wrote: > Nick Piggin <nickpiggin@yahoo.com.au> said: > > [...] > > >>I don't know. What if you have some huge application that only >>runs once per day for 10 minutes? Do you want it to be consuming >>100MB of your memory for the other 23 hours and 50 minutes for >>no good reason? > > > How on earth is the kernel supposed to know that for this one particular > job you don't care if it takes 3 hours instead of 10 minutes, just because > you don't want to spare enough preciousss RAM? It doesn't know that. But if you restrict this guy's working set to a tiny amount and just allow it to thrash away, then if nothing else, all that wasted disk IO will slow all your other stuff down too. However that is something we can allow you to tune, via RSS limits. I am maintaining Rik's patch for that and will send it on when rmap optimisation work is more finalised. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 20:01 ` Horst von Brand ` (3 preceding siblings ...) 2004-04-30 5:15 ` Nick Piggin @ 2004-04-30 6:20 ` Tim Connors 2004-04-30 6:34 ` Nick Piggin 4 siblings, 1 reply; 128+ messages in thread From: Tim Connors @ 2004-04-30 6:20 UTC (permalink / raw) To: Horst von Brand Cc: Nick Piggin, Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel Horst von Brand <vonbrand@inf.utfsm.cl> said on Thu, 29 Apr 2004 16:01:11 -0400: > Nick Piggin <nickpiggin@yahoo.com.au> said: > > [...] > > > I don't know. What if you have some huge application that only > > runs once per day for 10 minutes? Do you want it to be consuming > > 100MB of your memory for the other 23 hours and 50 minutes for > > no good reason? > > How on earth is the kernel supposed to know that for this one particular > job you don't care if it takes 3 hours instead of 10 minutes, just because > you don't want to spare enough preciousss RAM? Note that we are not talking about having insufficient memory. In my case (2.4 kernel - ie, 2.6 with swapiness 0%) there is more than enough memory to contain all my working set - it's only because cache is too eager to claim memory that is otherwise in use that non-optimalities occur. -- TimC -- http://astronomy.swin.edu.au/staff/tconnors/ If I'd known computer science was going to be like this, I'd never have given up being a rock 'n' roll star. -- G. Hirst ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 6:20 ` Tim Connors @ 2004-04-30 6:34 ` Nick Piggin 2004-04-30 7:05 ` Tim Connors 0 siblings, 1 reply; 128+ messages in thread From: Nick Piggin @ 2004-04-30 6:34 UTC (permalink / raw) To: Tim Connors Cc: Horst von Brand, Jeff Garzik, Andrew Morton, brettspamacct, linux-kernel Tim Connors wrote: > Horst von Brand <vonbrand@inf.utfsm.cl> said on Thu, 29 Apr 2004 16:01:11 -0400: > >>Nick Piggin <nickpiggin@yahoo.com.au> said: >> >>[...] >> >> >>>I don't know. What if you have some huge application that only >>>runs once per day for 10 minutes? Do you want it to be consuming >>>100MB of your memory for the other 23 hours and 50 minutes for >>>no good reason? >> >>How on earth is the kernel supposed to know that for this one particular >>job you don't care if it takes 3 hours instead of 10 minutes, just because >>you don't want to spare enough preciousss RAM? > > > Note that we are not talking about having insufficient memory. In my > case (2.4 kernel - ie, 2.6 with swapiness 0%) there is more than > enough memory to contain all my working set - it's only because cache > is too eager to claim memory that is otherwise in use that > non-optimalities occur. > Well depends on what you mean by working set. In our memory manager, there is a point where often used "file cache" (ie. unmapped cache) is considered preferable to unused or little used "application memory" (mapped memory). There will be a point where even the most swap phobic desktop users will want to start swapping. I missed the description of your exact problem... was it in this thread somewhere? Testing 2.6 would be appreciated if possible too. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 6:34 ` Nick Piggin @ 2004-04-30 7:05 ` Tim Connors 2004-04-30 7:15 ` Nick Piggin 2004-04-30 9:18 ` Re[2]: " vda 0 siblings, 2 replies; 128+ messages in thread From: Tim Connors @ 2004-04-30 7:05 UTC (permalink / raw) To: Nick Piggin Cc: Horst von Brand, Jeff Garzik, Andrew Morton, brettspamacct, Linux Kernel Mailing List On Fri, 30 Apr 2004, Nick Piggin wrote: > Tim Connors wrote: > > Horst von Brand <vonbrand@inf.utfsm.cl> said on Thu, 29 Apr 2004 16:01:11 -0400: > > > >>Nick Piggin <nickpiggin@yahoo.com.au> said: > >> > >>[...] > >> > >> > >>>I don't know. What if you have some huge application that only > >>>runs once per day for 10 minutes? Do you want it to be consuming > >>>100MB of your memory for the other 23 hours and 50 minutes for > >>>no good reason? > >> > >>How on earth is the kernel supposed to know that for this one particular > >>job you don't care if it takes 3 hours instead of 10 minutes, just because > >>you don't want to spare enough preciousss RAM? > > > > > > Note that we are not talking about having insufficient memory. In my > > case (2.4 kernel - ie, 2.6 with swapiness 0%) there is more than > > enough memory to contain all my working set - it's only because cache > > is too eager to claim memory that is otherwise in use that > > non-optimalities occur. > > > > Well depends on what you mean by working set. > > In our memory manager, there is a point where often used > "file cache" (ie. unmapped cache) is considered preferable > to unused or little used "application memory" (mapped > memory). Sure - and indeed I have current swap usage (now that I am not doing anything) of 300MB - that's good because I am not using whatever's in there. > I missed the description of your exact problem... was it in > this thread somewhere? Testing 2.6 would be appreciated if > possible too. 
http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.3/1033.html http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.3/1394.html In short: I have 512MB RAM. The files I am reading are read over NFS, created remotely. I read them once, and then discard them (either delete them, or keep them around, but don't read them again -- obviously, in the latter case, they have an opportunity to pollute the cache for a long time). If I do read them twice, it's only been a few times that I have noticed any speedup the second time around, even for the smaller files (if they're small enough not to cause a problem with swapping vital bits of software out, then they are small enough that extra reads are hardly noticable anyway - given the damn fast raid disks behind NFS we have) For one of the file types, these can be several hundred megs, and are read by an astronomical package - I have no idea how they are read, but because it is fits, maybe it has to go back and read the header several times, but I doubt the image data is read more than once. Come display time, memory usage in this example is roughly the size of the FITS file - several hundred megs. I don't recall how big X is, in such a situation, but the sum of them both, plus recently used apps like mozilla, is below RAM size. Watching top, I can see, during the read, mozilla rsize memory usage go down rapidly - about as rapidly as cache usage going up. Parts of X and the window manager also get swapped out, so when I move to another virtual page, I get to watch fvwm redraw the screen - this is not too painful though, because only a few megs need be swapped back in (although the HD is seeking all over the place as things thrash about, so it does still take non-negligible amount of time). Mozilla takes about 30 seconds to swap back in (~50-100MB - again, lots of thrashing from the HD). 
I don't recall once mozilla is swapped back in, whether cache usage has dropped again, or whether the visualisation software loses its pages instead. I'll try to test 2.6 here (half the battle is convincing the sysadmins this is a worthwhile pursuit) - I use that at home quite successfully, but I don't do big files or visualisation there. -- TimC -- http://astronomy.swin.edu.au/staff/tconnors/ 'It's amazing I won. I was running against peace, prosperity and incumbency.' -- George W. Bush. June 14, 2001, to Swedish PM Goran Persson, unaware that a live television camera was still rolling. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 7:05 ` Tim Connors @ 2004-04-30 7:15 ` Nick Piggin 2004-04-30 9:18 ` Re[2]: " vda 1 sibling, 0 replies; 128+ messages in thread From: Nick Piggin @ 2004-04-30 7:15 UTC (permalink / raw) To: Tim Connors Cc: Horst von Brand, Jeff Garzik, Andrew Morton, brettspamacct, Linux Kernel Mailing List Tim Connors wrote: > On Fri, 30 Apr 2004, Nick Piggin wrote: > >>In our memory manager, there is a point where often used >>"file cache" (ie. unmapped cache) is considered preferable >>to unused or little used "application memory" (mapped >>memory). > > > Sure - and indeed I have current swap usage (now that I am not doing > anything) of 300MB - that's good because I am not using whatever's in > there. > > >>I missed the description of your exact problem... was it in >>this thread somewhere? Testing 2.6 would be appreciated if >>possible too. > > > http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.3/1033.html > http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.3/1394.html > > In short: I have 512MB RAM. The files I am reading are read over NFS, Ah, thanks for the description. 2.6 has a problem with NFS filesystems that would cause symptoms like yours. I'm not sure whether 2.4 has something similar or not. You can probably expect a fix for 2.6.6 but I'm not sure if there is a patch that has been agreed upon yet. In short, there probably isn't much point testing 2.6 right now. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re[2]: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 7:05 ` Tim Connors 2004-04-30 7:15 ` Nick Piggin @ 2004-04-30 9:18 ` vda 2004-04-30 9:33 ` Arjan van de Ven 1 sibling, 1 reply; 128+ messages in thread From: vda @ 2004-04-30 9:18 UTC (permalink / raw) To: Tim Connors Cc: Nick Piggin, Horst von Brand, Jeff Garzik, Andrew Morton, brettspamacct, Linux Kernel Mailing List Hello Tim, Friday, April 30, 2004, 10:05:19 AM, you wrote: TC> Parts of X and the window manager also get swapped out, so when I move to TC> another virtual page, I get to watch fvwm redraw the screen - this is not TC> too painful though, because only a few megs need be swapped back in TC> (although the HD is seeking all over the place as things thrash about, so TC> it does still take non-negligible amount of time). Mozilla takes about 30 TC> seconds to swap back in (~50-100MB - again, lots of thrashing from the I don't want to say that you're seeing optimal behavior, just a different angle of view: why in hell should a browser have such a ridiculously large RSS? Why does it try to keep so much stuff in RAM? Multimedia content (jpegs etc) is typically cached in filesystem, so Mozilla polluted pagecache with it when it saved JPEGs to the cache *and* then it keeps 'em in RAM too, which doubles RAM usage. Most probably more, there are severe internal fragmentation problems after you use such a large application for several hours straight. Why not reread JPEG whenever you need it? If you are using Mozilla right now, it will be in pagecache. When you are away, cache will be discarded, no need to page out Mozilla pages with JPEG content - because there aren't Mozilla pages with JPEG content! RSS is smaller, less internal fragmentation, everyone's happy. (I don't specifically target Mozilla, it has shown some improvement recently. Replace with your favorite monstrosity) Kernel folks probably can improve kernel behavior. 
Next version of $BloatyApp will happily "use" gained performance and improved RAM management as an excuse for even less optimal code. It's a vicious circle. -- vda ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: Re[2]: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 9:18 ` Re[2]: " vda @ 2004-04-30 9:33 ` Arjan van de Ven 2004-04-30 11:33 ` Denis Vlasenko 2004-04-30 16:19 ` Timothy Miller 0 siblings, 2 replies; 128+ messages in thread From: Arjan van de Ven @ 2004-04-30 9:33 UTC (permalink / raw) To: vda Cc: Tim Connors, Nick Piggin, Horst von Brand, Jeff Garzik, Andrew Morton, brettspamacct, Linux Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 289 bytes --] > Multimedia content (jpegs etc) is typically cached in > filesystem, so Mozilla polluted pagecache with it when > it saved JPEGs to the cache *and* then it keeps 'em in RAM > too, which doubles RAM usage. well if mozilla just mmap's the jpegs there is no double caching ..... [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 9:33 ` Arjan van de Ven @ 2004-04-30 11:33 ` Denis Vlasenko 2004-04-30 16:19 ` Timothy Miller 1 sibling, 0 replies; 128+ messages in thread From: Denis Vlasenko @ 2004-04-30 11:33 UTC (permalink / raw) To: arjanv Cc: Tim Connors, Nick Piggin, Horst von Brand, Jeff Garzik, Andrew Morton, brettspamacct, Linux Kernel Mailing List On Friday 30 April 2004 12:33, Arjan van de Ven wrote: > > Multimedia content (jpegs etc) is typically cached in > > filesystem, so Mozilla polluted pagecache with it when > > it saved JPEGs to the cache *and* then it keeps 'em in RAM > > too, which doubles RAM usage. > > well if mozilla just mmap's the jpegs there is no double caching ..... I may be wrong, but Mozilla keeps unpacked bitmaps in malloc() space. The point is, $BloatyApp will keep bloating up while you are working on improving the kernel. I guess it's very clear which process is easier. You cannot win that race. This is OpenOffice on an idle 128MB RAM, 1000MHz Duron machine with KDE, Mozilla and KMail running: # time swriter;time swriter real 0m33.906s user 0m10.163s sys 0m0.705s real 0m24.025s user 0m10.069s sys 0m0.546s I closed the window as soon as it appeared. Freshly started swriter in top: PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 2081 root 15 0 93980 41M 80300 S 1,3 34,0 0:09 0 soffice.bin 93 megs. 10 seconds of 1GHz CPU time taken... -- vda ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 9:33 ` Arjan van de Ven 2004-04-30 11:33 ` Denis Vlasenko @ 2004-04-30 16:19 ` Timothy Miller 1 sibling, 0 replies; 128+ messages in thread From: Timothy Miller @ 2004-04-30 16:19 UTC (permalink / raw) To: arjanv Cc: vda, Tim Connors, Nick Piggin, Horst von Brand, Jeff Garzik, Andrew Morton, brettspamacct, Linux Kernel Mailing List Arjan van de Ven wrote: >>Multimedia content (jpegs etc) is typically cached in >>filesystem, so Mozilla polluted pagecache with it when >>it saved JPEGs to the cache *and* then it keeps 'em in RAM >>too, which doubles RAM usage. > > > well if mozilla just mmap's the jpegs there is no double caching ..... > What is cached in memory? The original JPEG or the decoded raw image? ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 0:10 ` Jeff Garzik 2004-04-29 0:21 ` Nick Piggin @ 2004-04-29 0:49 ` Brett E. 2004-04-29 1:00 ` Andrew Morton ` (2 more replies) 1 sibling, 3 replies; 128+ messages in thread From: Brett E. @ 2004-04-29 0:49 UTC (permalink / raw) To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel Jeff Garzik wrote: > Andrew Morton wrote: > >> Swapout is good. It frees up unused memory. I run my desktop >> machines at >> swappiness=100. > > > > The definition of "unused" is quite subjective and app-dependent... > > I've seen reports with increasing frequency about the swappiness of the > 2.6.x kernels, from people who were already annoyed at the swappiness of > 2.4.x kernels :) > > Favorite pathological (and quite common) examples are the various 4am > cron jobs that scan your entire filesystem. Running that process > overnight on a quiet machine practically guarantees a huge burst of > disk activity, with unwanted results: > 1) Inode and page caches are blown away > 2) A lot of your desktop apps are swapped out > > Additionally, a (IMO valid) maxim of sysadmins has been "a properly > configured server doesn't swap". There should be no reason why this > maxim becomes invalid over time. When Linux starts to swap out apps the > sysadmin knows will be useful in an hour, or six hours, or a day just > because it needs a bit more file cache, I get worried. > > There IMO should be some way to balance the amount of anon-vma's such > that the sysadmin can say "stop taking 70% of my box's memory for > disposable cache, use it instead for apps you would otherwise swap out, > you memory-hungry kernel you." > > Jeff Or how about "Use ALL the cache you want Mr. Kernel. But when I want more physical memory pages, just reap cache pages and only swap out when the cache is down to a certain size (configurable, say 100 megs or something)." ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 0:49 ` Brett E. @ 2004-04-29 1:00 ` Andrew Morton 2004-04-29 1:24 ` Jeff Garzik ` (2 more replies) 2004-04-29 1:41 ` Tim Connors 2004-04-29 9:43 ` Helge Hafting 2 siblings, 3 replies; 128+ messages in thread From: Andrew Morton @ 2004-04-29 1:00 UTC (permalink / raw) To: brettspamacct; +Cc: jgarzik, linux-kernel "Brett E." <brettspamacct@fastclick.com> wrote: > > Or how about "Use ALL the cache you want Mr. Kernel. But when I want > more physical memory pages, just reap cache pages and only swap out when > the cache is down to a certain size(configurable, say 100megs or > something)." Have you tried decreasing /proc/sys/vm/swappiness? That's what it is for. My point is that decreasing the tendency of the kernel to swap stuff out is wrong. You really don't want hundreds of megabytes of BloatyApp's untouched memory floating about in the machine. Get it out on the disk, use the memory for something useful. ^ permalink raw reply [flat|nested] 128+ messages in thread
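For reference, vm.swappiness is an integer from 0 to 100, exposed as /proc/sys/vm/swappiness and via sysctl; lower values bias reclaim toward dropping page cache, higher values toward swapping out anonymous pages. A small sketch (the validation helper is mine, not a kernel or sysctl API):

```python
import os

def swappiness_cmd(value):
    """Build the sysctl command line for a given swappiness value.

    Illustrative helper only; the kernel itself just accepts an
    integer in the range 0-100.
    """
    if not 0 <= value <= 100:
        raise ValueError("swappiness must be between 0 and 100")
    return "sysctl vm.swappiness=%d" % value

# Lower values make the VM prefer reclaiming page cache over swapping
# out application memory; 0 is roughly what Brett is asking for.
print(swappiness_cmd(10))  # sysctl vm.swappiness=10

# Read the current setting where procfs is available (Linux only).
path = "/proc/sys/vm/swappiness"
if os.path.exists(path):
    with open(path) as f:
        print("current swappiness:", f.read().strip())
```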
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:00 ` Andrew Morton @ 2004-04-29 1:24 ` Jeff Garzik 2004-04-29 1:40 ` Andrew Morton 2004-04-29 1:30 ` Paul Mackerras 2004-04-29 1:46 ` Rik van Riel 2 siblings, 1 reply; 128+ messages in thread From: Jeff Garzik @ 2004-04-29 1:24 UTC (permalink / raw) To: Andrew Morton; +Cc: brettspamacct, linux-kernel Andrew Morton wrote: > "Brett E." <brettspamacct@fastclick.com> wrote: > >> Or how about "Use ALL the cache you want Mr. Kernel. But when I want >> more physical memory pages, just reap cache pages and only swap out when >> the cache is down to a certain size(configurable, say 100megs or >> something)." > > > Have you tried decreasing /proc/sys/vm/swappiness? That's what it is for. > > My point is that decreasing the tendency of the kernel to swap stuff out is > wrong. You really don't want hundreds of megabytes of BloatyApp's > untouched memory floating about in the machine. Get it out on the disk, > use the memory for something useful. Well, if it's truly untouched, then it never needs to be allocated a page or swapped out at all... just accounted for (overcommit on/off, etc. here) But I assume you are not talking about that, but instead talking about _rarely_ used pages, that were filled with some amount of data at some point in time. These are at the heart of the thread (or my point, at least) -- BloatyApp may be Oracle with a huge cache of its own, for which swapping out may be a huge mistake. Or Mozilla. After some amount of disk IO on my 512MB machine, Mozilla would be swapped out... when I had only been typing an email minutes before. BloatyApp? yes. Should it have been swapped out? Absolutely not. The 'SIZE' in top was only 160M and there were no other major apps running. 
Applications are increasingly playing second fiddle to cache ;-( Regardless of /proc/sys/vm/swappiness, I think it's a valid concern of sysadmins who request "hard cache limit", because they are seeing pathological behavior such that apps get swapped out when cache is over 50% of all available memory. Jeff ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:24 ` Jeff Garzik @ 2004-04-29 1:40 ` Andrew Morton 2004-04-29 1:47 ` Rik van Riel ` (2 more replies) 0 siblings, 3 replies; 128+ messages in thread From: Andrew Morton @ 2004-04-29 1:40 UTC (permalink / raw) To: Jeff Garzik; +Cc: brettspamacct, linux-kernel Jeff Garzik <jgarzik@pobox.com> wrote: > > Andrew Morton wrote: > > "Brett E." <brettspamacct@fastclick.com> wrote: > > > >> Or how about "Use ALL the cache you want Mr. Kernel. But when I want > >> more physical memory pages, just reap cache pages and only swap out when > >> the cache is down to a certain size(configurable, say 100megs or > >> something)." > > > > > > Have you tried decreasing /proc/sys/vm/swappiness? That's what it is for. > > > > My point is that decreasing the tendency of the kernel to swap stuff out is > > wrong. You really don't want hundreds of megabytes of BloatyApp's > > untouched memory floating about in the machine. Get it out on the disk, > > use the memory for something useful. > > Well, if it's truly untouched, then it never needs to be allocated a > page or swapped out at all... just accounted for (overcommit on/off, > etc. here) > > But I assume you are not talking about that, but instead talking about > _rarely_ used pages, that were filled with some amount of data at some > point in time. Of course. My fairly modest desktop here stabilises at about 300 megs swapped out, with negligible swapin. That's all just crap which apps aren't using any more. Getting that memory out on disk, relatively freely is an important optimisation. > These are at the heart of the thread (or my point, at > least) -- BloatyApp may be Oracle with a huge cache of its own, for > which swapping out may be a huge mistake. Or Mozilla. After some > amount of disk IO on my 512MB machine, Mozilla would be swapped out... > when I had only been typing an email minutes before. 
OK, so it takes four seconds to swap mozilla back in, and you noticed it. Did you notice that those three kernel builds you just did ran in twenty seconds less time because they had more cache available? Nope. > Regardless of /proc/sys/vm/swappiness, I think it's a valid concern of > sysadmins who request "hard cache limit", because they are seeing > pathological behavior such that apps get swapped out when cache is over > 50% of all available memory. We should be sceptical of this. If they can provide *numbers* then fine. Otherwise, the subjective "oh gee, that took a long time" seat-of-the-pants stuff does not impress. If they want to feel better about it then sure, set swappiness to zero and live with less cache for the things which need it... Let me point out that the kernel right now, with default swappiness very much tends to reclaim cache rather than swapping stuff out. The top-of-thread report was incorrect, due to a misreading of kernel instrumentation. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:40 ` Andrew Morton @ 2004-04-29 1:47 ` Rik van Riel 2004-04-29 18:14 ` Adam Kropelin 2004-04-29 2:19 ` Tim Connors 2004-04-29 16:24 ` Martin J. Bligh 2 siblings, 1 reply; 128+ messages in thread From: Rik van Riel @ 2004-04-29 1:47 UTC (permalink / raw) To: Andrew Morton; +Cc: Jeff Garzik, brettspamacct, linux-kernel On Wed, 28 Apr 2004, Andrew Morton wrote: > OK, so it takes four seconds to swap mozilla back in, and you noticed it. > > Did you notice that those three kernel builds you just did ran in twenty > seconds less time because they had more cache available? Nope. That's exactly why desktops should be optimised to give the best performance where the user notices it most... -- "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:47 ` Rik van Riel @ 2004-04-29 18:14 ` Adam Kropelin 2004-04-30 3:17 ` Tim Connors 0 siblings, 1 reply; 128+ messages in thread From: Adam Kropelin @ 2004-04-29 18:14 UTC (permalink / raw) To: Rik van Riel; +Cc: Andrew Morton, Jeff Garzik, brettspamacct, linux-kernel On Wed, Apr 28, 2004 at 09:47:45PM -0400, Rik van Riel wrote: > On Wed, 28 Apr 2004, Andrew Morton wrote: > > > OK, so it takes four seconds to swap mozilla back in, and you noticed it. > > > > Did you notice that those three kernel builds you just did ran in twenty > > seconds less time because they had more cache available? Nope. > > That's exactly why desktops should be optimised to give > the best performance where the user notices it most... Agreed. Looking at it from the standpoint of relative change, the time to bring the mozilla window to the foreground is increased by orders of magnitude while the kernel builds improve by a (relatively) small percent. Humans easily notice change in orders of magnitude and such changes can feel painful. Benchmarks notice $SMALLNUM percent long before a human will, especially if s/he has left the room because the job was going to take 10 minutes anyway. The 30 seconds saved off the compile run just isn't worth it sometimes if its side-effect is to disrupt the user's workflow. The 'swappiness' tunable may well give enough control over the situation to suit all sorts of users. If nothing else, this thread has raised awareness that such a tunable exists and can be played with to influence the kernel's decision-making. Distros, too, should give consideration to appropriate default settings to serve their intended users. --Adam ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 18:14 ` Adam Kropelin @ 2004-04-30 3:17 ` Tim Connors 0 siblings, 0 replies; 128+ messages in thread From: Tim Connors @ 2004-04-30 3:17 UTC (permalink / raw) Cc: Rik van Riel, Andrew Morton, Jeff Garzik, brettspamacct, linux-kernel Adam Kropelin <akropel1@rochester.rr.com> said on Thu, 29 Apr 2004 14:14:13 -0400: > On Wed, Apr 28, 2004 at 09:47:45PM -0400, Rik van Riel wrote: > > On Wed, 28 Apr 2004, Andrew Morton wrote: > > > > > OK, so it takes four seconds to swap mozilla back in, and you noticed it. > > > > > > Did you notice that those three kernel builds you just did ran in twenty > > > seconds less time because they had more cache available? Nope. > > > > That's exactly why desktops should be optimised to give > > the best performance where the user notices it most... ... > The 'swappiness' tunable may well give enough control over the situation > to suit all sorts of users. If nothing else, this thread has raised > awareness that such a tunable exists and can be played with to influence > the kernel's decision-making. Distros, too, should give consideration to > appropriate default settings to serve their intended users. Actually, I decided to investigate how 2.4 compares (we're still stuck on 2.4). According to this: http://www.uwsg.iu.edu/hypermail/linux/kernel/0210.1/0011.html 2.6 with swappiness of 0% is the same as 2.4.19 - I assume 2.4.19's VM is the same as 2.4.26 (given feature freeze). I have always been completely unimpressed with the 2.4 VM - before and after the big change in ~2.4.10. It has *always* preferred to use cache in preference to a recently used application. So will this still apply to 2.6 with swappiness of 0%? 
I might try to get my sysadmin to put 2.6 on, because 2.4 is quite unusable for some of the work I do (if I need Mozilla at the same time as my visualisation software - which allocates a good 3/4 of RAM after reading a file of about that size, still leaving enough for Mozilla and X combined - Mozilla and parts of X still get swapped out, and the cache is wasted, since I only ever read the file once and it is written on another host) -- TimC -- http://astronomy.swin.edu.au/staff/tconnors/ "32-bit patch for a 16-bit GUI shell running on top of an 8-bit operating system written for a 4-bit processor by a 2-bit company who cannot stand 1 bit of competition." ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:40 ` Andrew Morton 2004-04-29 1:47 ` Rik van Riel @ 2004-04-29 2:19 ` Tim Connors 2004-04-29 16:24 ` Martin J. Bligh 2 siblings, 0 replies; 128+ messages in thread From: Tim Connors @ 2004-04-29 2:19 UTC (permalink / raw) To: Jeff Garzik; +Cc: brettspamacct, linux-kernel Andrew Morton <akpm@osdl.org> said on Wed, 28 Apr 2004 18:40:08 -0700: > Jeff Garzik <jgarzik@pobox.com> wrote: > > These are at the heart of the thread (or my point, at > > least) -- BloatyApp may be Oracle with a huge cache of its own, for > > which swapping out may be a huge mistake. Or Mozilla. After some > > amount of disk IO on my 512MB machine, Mozilla would be swapped out... > > when I had only been typing an email minutes before. > > OK, so it takes four seconds to swap mozilla back in, and you noticed it. Actually, about 20-30 seconds on all of my boxes (no, I have no idea why it is so slow even on the P4 I have here - swapping has always seemed overly slow on this machine, and yes, DMA is turned on) with a ~100MB mozilla image (plus the parts of X that get swapped out and need to be swapped in before the user sees any effect - X takes up about ~100MB res memory typically here, since I tend to have so many apps with cached pixmaps open and in current use). > Did you notice that those three kernel builds you just did ran in twenty > seconds less time because they had more cache available? Nope. Nope, because I never run 3 builds before rebooting - I do however run a lot of software that only ever reads a file once (the file was written on another host on the cluster, so the caching done at write time is of no benefit to us here). This is something that should be up to the admin, because the kernel *cannot* know what I want. And I don't think /proc/.../swappiness is enough to define what we want. 
> > Regardless of /proc/sys/vm/swappiness, I think it's a valid concern of > > sysadmins who request "hard cache limit", because they are seeing > > pathological behavior such that apps get swapped out when cache is over > > 50% of all available memory. > > We should be sceptical of this. If they can provide *numbers* then fine. > Otherwise, the subjective "oh gee, that took a long time" seat-of-the-pants > stuff does not impress. If they want to feel better about it then sure, > set swappiness to zero and live with less cache for the things which need > it... OK - I'll try to get around to giving you a vmstat 1 and maybe top output, and timing things next time I run one of these big visualisation jobs (it'd be very nice if this was all backported to 2.4, since this is what we are mostly using here -- I think I can find a 2.6 machine though)... -- TimC -- http://astronomy.swin.edu.au/staff/tconnors/ My code is giving me mixed signals. SIGSEGV then SIGILL then SIGBUS. -- me ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:40 ` Andrew Morton 2004-04-29 1:47 ` Rik van Riel 2004-04-29 2:19 ` Tim Connors @ 2004-04-29 16:24 ` Martin J. Bligh 2004-04-29 16:36 ` Chris Friesen 2 siblings, 1 reply; 128+ messages in thread From: Martin J. Bligh @ 2004-04-29 16:24 UTC (permalink / raw) To: Andrew Morton, Jeff Garzik; +Cc: brettspamacct, linux-kernel >> These are at the heart of the thread (or my point, at >> least) -- BloatyApp may be Oracle with a huge cache of its own, for >> which swapping out may be a huge mistake. Or Mozilla. After some >> amount of disk IO on my 512MB machine, Mozilla would be swapped out... >> when I had only been typing an email minutes before. > > OK, so it takes four seconds to swap mozilla back in, and you noticed it. > > Did you notice that those three kernel builds you just did ran in twenty > seconds less time because they had more cache available? Nope. The latency for interactive stuff is definitely more noticeable though, and thus arguably more important. Perhaps we should be tying the scheduler in more tightly with the VM - we've already decided there which apps are "interactive" and thus need low latency ... shouldn't we be giving a boost to their RAM pages as well, and favour keeping those paged in over other pages (whether other apps, or cache) logically? It's all latency still ... M. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 16:24 ` Martin J. Bligh @ 2004-04-29 16:36 ` Chris Friesen 2004-04-29 16:56 ` Martin J. Bligh 0 siblings, 1 reply; 128+ messages in thread From: Chris Friesen @ 2004-04-29 16:36 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Andrew Morton, Jeff Garzik, brettspamacct, linux-kernel Martin J. Bligh wrote: > The latency for interactive stuff is definitely more noticeable though, and > thus arguably more important. Perhaps we should be tying the scheduler in > more tightly with the VM - we've already decided there which apps are > "interactive" and thus need low latency ... shouldn't we be giving a boost > to their RAM pages as well, and favour keeping those paged in over other > pages (whether other apps, or cache) logically? It's all latency still ... I like this idea. Maybe make it more general though--tasks with high scheduler priority also get more of a memory priority boost. This will factor in the static priority as well as the interactivity bonus. Chris ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 16:36 ` Chris Friesen @ 2004-04-29 16:56 ` Martin J. Bligh 0 siblings, 0 replies; 128+ messages in thread From: Martin J. Bligh @ 2004-04-29 16:56 UTC (permalink / raw) To: Chris Friesen; +Cc: Andrew Morton, Jeff Garzik, brettspamacct, linux-kernel >> The latency for interactive stuff is definitely more noticeable though, and >> thus arguably more important. Perhaps we should be tying the scheduler in >> more tightly with the VM - we've already decided there which apps are >> "interactive" and thus need low latency ... shouldn't we be giving a boost >> to their RAM pages as well, and favour keeping those paged in over other >> pages (whether other apps, or cache) logically? It's all latency still ... > > I like this idea. Maybe make it more general though--tasks with high scheduler priority also get more of a memory priority boost. This will factor in the static priority as well as the interactivity bonus. Yeah, see also my other mail in that thread - if we moved to file-object (address_space) and task anon (mm) based tracking, it should be much easier. Also fits in nicely with Hugh's anon_mm code. M. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:00 ` Andrew Morton 2004-04-29 1:24 ` Jeff Garzik @ 2004-04-29 1:30 ` Paul Mackerras 2004-04-29 1:31 ` Paul Mackerras 2004-04-29 1:53 ` Andrew Morton 2004-04-29 1:46 ` Rik van Riel 2 siblings, 2 replies; 128+ messages in thread From: Paul Mackerras @ 2004-04-29 1:30 UTC (permalink / raw) To: Andrew Morton; +Cc: brettspamacct, jgarzik, linux-kernel Andrew Morton writes: > My point is that decreasing the tendency of the kernel to swap stuff out is > wrong. You really don't want hundreds of megabytes of BloatyApp's > untouched memory floating about in the machine. Get it out on the disk, > use the memory for something useful. What I have noticed with 2.6.6-rc1 on my dual G5 is that if I rsync a gigabyte or so of data over to another machine, it then takes several seconds to change focus from one window to another. I can see it slowly redraw the window title bars. It looks like the window manager is getting swapped/paged out. This machine has 2.5GB of ram, so I really don't see why it would need to swap at all. There should be plenty of page cache pages that are clean and not in use by any process that could be discarded. It seems like as soon as there is any memory shortage at all it picks on the window manager and chucks out all its pages. :( Paul. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:30 ` Paul Mackerras @ 2004-04-29 1:31 ` Paul Mackerras 2004-04-29 1:53 ` Andrew Morton 1 sibling, 0 replies; 128+ messages in thread From: Paul Mackerras @ 2004-04-29 1:31 UTC (permalink / raw) To: Andrew Morton; +Cc: brettspamacct, jgarzik, linux-kernel I wrote: > What I have noticed with 2.6.6-rc1 on my dual G5 is that if I rsync a > gigabyte or so of data over to another machine, it then takes several > seconds to change focus from one window to another. I can see it > slowly redraw the window title bars. It looks like the window manager > is getting swapped/paged out. I meant to add that this is with swappiness = 60. Paul. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:30 ` Paul Mackerras 2004-04-29 1:31 ` Paul Mackerras @ 2004-04-29 1:53 ` Andrew Morton 2004-04-29 2:40 ` Andrew Morton 2004-04-29 3:57 ` Nick Piggin 1 sibling, 2 replies; 128+ messages in thread From: Andrew Morton @ 2004-04-29 1:53 UTC (permalink / raw) To: Paul Mackerras; +Cc: brettspamacct, jgarzik, linux-kernel Paul Mackerras <paulus@samba.org> wrote: > > Andrew Morton writes: > > > My point is that decreasing the tendency of the kernel to swap stuff out is > > wrong. You really don't want hundreds of megabytes of BloatyApp's > > untouched memory floating about in the machine. Get it out on the disk, > > use the memory for something useful. > > What I have noticed with 2.6.6-rc1 on my dual G5 is that if I rsync a > gigabyte or so of data over to another machine, it then takes several > seconds to change focus from one window to another. I can see it > slowly redraw the window title bars. It looks like the window manager > is getting swapped/paged out. > > This machine has 2.5GB of ram, so I really don't see why it would need > to swap at all. There should be plenty of page cache pages that are > clean and not in use by any process that could be discarded. It seems > like as soon as there is any memory shortage at all it picks on the > window manager and chucks out all its pages. :( > I suspect rsync is taking two passes across the source files for its checksumming thing. If so, this will defeat the pagecache use-once logic. The kernel sees the second touch of the pages and assumes that there will be a third touch. I use scp ;) ^ permalink raw reply [flat|nested] 128+ messages in thread
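The use-once heuristic described here can be sketched as a toy two-list model (a deliberate simplification of the 2.6 active/inactive LRU lists, not the kernel's code):

```python
def simulate(touches):
    """Toy model of the 2.6 use-once logic: a page enters the inactive
    list on its first touch and is promoted to the active list only on
    a second touch (roughly, the PG_referenced bit)."""
    inactive, active = [], []
    for page in touches:
        if page in active:
            continue  # already active, nothing to do
        if page in inactive:
            inactive.remove(page)  # second touch: promote
            active.append(page)
        else:
            inactive.append(page)  # first touch: use-once candidate
    return inactive, active

# A streaming copy (scp) touches each page once: everything stays on
# the inactive list and is cheap to reclaim.
assert simulate(["a", "b", "c"]) == (["a", "b", "c"], [])

# An rsync checksum pass touches pages a second time: they get promoted
# and start competing with application memory.
assert simulate(["a", "b", "a", "b"]) == ([], ["a", "b"])
```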
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:53 ` Andrew Morton @ 2004-04-29 2:40 ` Andrew Morton 2004-04-29 2:58 ` Paul Mackerras 2004-04-29 16:50 ` Martin J. Bligh 2004-04-29 3:57 ` Nick Piggin 1 sibling, 2 replies; 128+ messages in thread From: Andrew Morton @ 2004-04-29 2:40 UTC (permalink / raw) To: paulus, brettspamacct, jgarzik, linux-kernel Andrew Morton <akpm@osdl.org> wrote: > > Paul Mackerras <paulus@samba.org> wrote: > > > ... > > What I have noticed with 2.6.6-rc1 on my dual G5 is that if I rsync a > > gigabyte or so of data over to another machine, it then takes several > > seconds to change focus from one window to another. I can see it > > slowly redraw the window title bars. It looks like the window manager > > is getting swapped/paged out. > > > > This machine has 2.5GB of ram, so I really don't see why it would need > > to swap at all. There should be plenty of page cache pages that are > > clean and not in use by any process that could be discarded. It seems > > like as soon as there is any memory shortage at all it picks on the > > window manager and chucks out all its pages. :( > > > > I suspect rsync is taking two passes across the source files for its > checksumming thing. If so, this will defeat the pagecache use-once logic. > The kernel sees the second touch of the pages and assumes that there will > be a third touch. OK, a bit of fiddling does indicate that if a file is present on both client and server, and is modified on the client, the rsync client will indeed touch the pagecache pages twice. Does this describe the files which you're copying at all? One thing you could do is to run `watch -n1 cat /proc/meminfo'. Cause lots of memory to be freed up then do the copy. Monitor the size of the active and inactive lists. If the active list is growing then we know that rsync is touching pages twice. That would be an unfortunate special-case. ^ permalink raw reply [flat|nested] 128+ messages in thread
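The check suggested above is easy to script. This sketch parses the relevant fields out of /proc/meminfo-format text; the sample values are taken from the meminfo attachment at the top of the thread:

```python
def parse_meminfo(text, fields=("Active", "Inactive")):
    """Extract selected fields (in kB) from /proc/meminfo-style text."""
    out = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in fields:
            out[key] = int(rest.split()[0])
    return out

# Sample lines from the meminfo attachment earlier in the thread.
sample = """\
MemTotal: 1293976 kB
Active: 810472 kB
Inactive: 346816 kB
"""
print(parse_meminfo(sample))  # {'Active': 810472, 'Inactive': 346816}

# In a real run you would poll open("/proc/meminfo").read() during the
# rsync and watch whether Active keeps growing (pages touched twice).
```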
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 2:40 ` Andrew Morton @ 2004-04-29 2:58 ` Paul Mackerras 2004-04-29 3:09 ` Andrew Morton ` (2 more replies) 2004-04-29 16:50 ` Martin J. Bligh 1 sibling, 3 replies; 128+ messages in thread From: Paul Mackerras @ 2004-04-29 2:58 UTC (permalink / raw) To: Andrew Morton; +Cc: brettspamacct, jgarzik, linux-kernel Andrew Morton writes: > OK, a bit of fiddling does indicate that if a file is present on both > client and server, and is modified on the client, the rsync client will > indeed touch the pagecache pages twice. Does this describe the files which > you're copying at all? The client/server thing is a bit misleading, what matters is the direction of the transfer. In the case I saw this morning, the G5 was the sender. In any case I was using the -W switch, which tells it not to use the rsync algorithm but just transfer the whole file. So I believe that rsync on the G5 side was just reading the file through once. I have also noticed similar behaviour after doing a bk pull on a kernel tree. The really strange thing is that the behaviour seems to get worse the more RAM you have. I haven't noticed any problem at all on my laptop with 768MB, only on the G5, which has 2.5GB. (The laptop is still on 2.6.2-rc3 though, so I will try a newer kernel on it.) Regards, Paul. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 2:58 ` Paul Mackerras @ 2004-04-29 3:09 ` Andrew Morton 2004-04-29 3:14 ` William Lee Irwin III 2004-04-29 6:12 ` Benjamin Herrenschmidt 2 siblings, 0 replies; 128+ messages in thread From: Andrew Morton @ 2004-04-29 3:09 UTC (permalink / raw) To: Paul Mackerras; +Cc: brettspamacct, jgarzik, linux-kernel Paul Mackerras <paulus@samba.org> wrote: > > Andrew Morton writes: > > > OK, a bit of fiddling does indicate that if a file is present on both > > client and server, and is modified on the client, the rsync client will > > indeed touch the pagecache pages twice. Does this describe the files which > > you're copying at all? > > The client/server thing is a bit misleading, what matters is the > direction of the transfer. In the case I saw this morning, the G5 was > the sender. In any case I was using the -W switch, which tells it not > to use the rsync algorithm but just transfer the whole file. So I > believe that rsync on the G5 side was just reading the file through > once. > > I have also noticed similar behaviour after doing a bk pull on a > kernel tree. > > The really strange thing is that the behaviour seems to get worse the > more RAM you have. I haven't noticed any problem at all on my laptop > with 768MB, only on the G5, which has 2.5GB. (The laptop is still on > 2.6.2-rc3 though, so I will try a newer kernel on it.) > Is the laptop x86 or ppc? IIRC there were problems with the pte-referenced handling on ppc? Or was it ppc64? It shouldn't make any difference in this case I guess. To investigate this sort of thing you're better off using just a local `dd' to ascertain the pattern which is causing the problem. Keep things simple. What happens if you do a 4G writeout with dd? Is there any swapout? There shouldn't be much at all. If the big dd indeed does not cause swapout, then what is different about rsync? ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 2:58 ` Paul Mackerras 2004-04-29 3:09 ` Andrew Morton @ 2004-04-29 3:14 ` William Lee Irwin III 2004-04-29 6:12 ` Benjamin Herrenschmidt 2 siblings, 0 replies; 128+ messages in thread From: William Lee Irwin III @ 2004-04-29 3:14 UTC (permalink / raw) To: Paul Mackerras; +Cc: Andrew Morton, brettspamacct, jgarzik, linux-kernel On Thu, Apr 29, 2004 at 12:58:13PM +1000, Paul Mackerras wrote: > The client/server thing is a bit misleading, what matters is the > direction of the transfer. In the case I saw this morning, the G5 was > the sender. In any case I was using the -W switch, which tells it not > to use the rsync algorithm but just transfer the whole file. So I > believe that rsync on the G5 side was just reading the file through > once. > I have also noticed similar behaviour after doing a bk pull on a > kernel tree. > The really strange thing is that the behaviour seems to get worse the > more RAM you have. I haven't noticed any problem at all on my laptop > with 768MB, only on the G5, which has 2.5GB. (The laptop is still on > 2.6.2-rc3 though, so I will try a newer kernel on it.) Looks like you've got a system with an issue. Any chance you could send logs from an instrumented test run? Thanks. -- wli ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 2:58 ` Paul Mackerras 2004-04-29 3:09 ` Andrew Morton 2004-04-29 3:14 ` William Lee Irwin III @ 2004-04-29 6:12 ` Benjamin Herrenschmidt 2004-04-29 6:22 ` Andrew Morton 2004-04-29 6:31 ` William Lee Irwin III 2 siblings, 2 replies; 128+ messages in thread From: Benjamin Herrenschmidt @ 2004-04-29 6:12 UTC (permalink / raw) To: Paul Mackerras Cc: Andrew Morton, brettspamacct, Jeff Garzik, Linux Kernel list > The really strange thing is that the behaviour seems to get worse the > more RAM you have. I haven't noticed any problem at all on my laptop > with 768MB, only on the G5, which has 2.5GB. (The laptop is still on > 2.6.2-rc3 though, so I will try a newer kernel on it.) Your G5 also has a 2Gb IO hole in the middle of zone DMA, it's possible that the accounting doesn't work properly. Ben. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 6:12 ` Benjamin Herrenschmidt @ 2004-04-29 6:22 ` Andrew Morton 2004-04-29 6:25 ` Benjamin Herrenschmidt 2004-04-29 6:31 ` William Lee Irwin III 1 sibling, 1 reply; 128+ messages in thread From: Andrew Morton @ 2004-04-29 6:22 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: paulus, brettspamacct, jgarzik, linux-kernel Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > > > The really strange thing is that the behaviour seems to get worse the > > more RAM you have. I haven't noticed any problem at all on my laptop > > with 768MB, only on the G5, which has 2.5GB. (The laptop is still on > > 2.6.2-rc3 though, so I will try a newer kernel on it.) > > Your G5 also has a 2Gb IO hole in the middle of zone DMA, it's possible > that the accounting doesn't work properly. heh. It should have zone->spanned_pages - zone->present_pages = 2G. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 6:22 ` Andrew Morton @ 2004-04-29 6:25 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 128+ messages in thread From: Benjamin Herrenschmidt @ 2004-04-29 6:25 UTC (permalink / raw) To: Andrew Morton Cc: Paul Mackerras, brettspamacct, Jeff Garzik, Linux Kernel list On Thu, 2004-04-29 at 16:22, Andrew Morton wrote: > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > > > > > > The really strange thing is that the behaviour seems to get worse the > > > more RAM you have. I haven't noticed any problem at all on my laptop > > > with 768MB, only on the G5, which has 2.5GB. (The laptop is still on > > > 2.6.2-rc3 though, so I will try a newer kernel on it.) > > > > Your G5 also has a 2Gb IO hole in the middle of zone DMA, it's possible > > that the accounting doesn't work properly. > > heh. It should have zone->spanned_pages - zone->present_pages = 2G. That should be fine, I'll check later, I can't reboot mine right now. I'm initializing the zone with free_area_init_node() and I _am_ passing the hole size. Paul, also check if you have NUMA enabled in .config, it changes the way zones are initialized, I may have gotten that case wrong. Ben. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 6:12 ` Benjamin Herrenschmidt 2004-04-29 6:22 ` Andrew Morton @ 2004-04-29 6:31 ` William Lee Irwin III 1 sibling, 0 replies; 128+ messages in thread From: William Lee Irwin III @ 2004-04-29 6:31 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Paul Mackerras, Andrew Morton, brettspamacct, Jeff Garzik, Linux Kernel list At some point in the past, I wrote: >> The really strange thing is that the behaviour seems to get worse the >> more RAM you have. I haven't noticed any problem at all on my laptop >> with 768MB, only on the G5, which has 2.5GB. (The laptop is still on >> 2.6.2-rc3 though, so I will try a newer kernel on it.) On Thu, Apr 29, 2004 at 04:12:38PM +1000, Benjamin Herrenschmidt wrote: > Your G5 also has a 2Gb IO hole in the middle of zone DMA, it's possible > that the accounting doesn't work properly. Hmm, ->present_pages vs. ->spanned_pages distinction(s) should cover this, or should have at one point. How are those being set at the moment? -- wli ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 2:40 ` Andrew Morton 2004-04-29 2:58 ` Paul Mackerras @ 2004-04-29 16:50 ` Martin J. Bligh 1 sibling, 0 replies; 128+ messages in thread From: Martin J. Bligh @ 2004-04-29 16:50 UTC (permalink / raw) To: Andrew Morton, paulus, brettspamacct, jgarzik, linux-kernel >> I suspect rsync is taking two passes across the source files for its >> checksumming thing. If so, this will defeat the pagecache use-once logic. >> The kernel sees the second touch of the pages and assumes that there will >> be a third touch. > > OK, a bit of fiddling does indicate that if a file is present on both > client and server, and is modified on the client, the rsync client will > indeed touch the pagecache pages twice. Does this describe the files which > you're copying at all? > > One thing you could do is to run `watch -n1 cat /proc/meminfo'. Cause lots > of memory to be freed up then do the copy. Monitor the size of the active > and inactive lists. If the active list is growing then we know that rsync > is touching pages twice. > > That would be an unfortunate special-case. Personally, I think that the use-twice logic is a bit of a hack that mostly works. If we moved to a method where we kept an eye on which pages are associated with which address_space (for mapped pages) or which process (for anonymous pages) we'd have a much better shot at stopping any one process / file from monopolizing the whole of system memory. We'd also be able to favour memory for files that are still open over ones that have been closed, and recognize linear access scan patterns per file, and reclaim more aggressively from the overscanned areas, and favour higher prio tasks over lower prio ones (including, but not limited to, interactive). Global LRU (even with the tweaks it has in Linux) doesn't seem optimal. M. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:53 ` Andrew Morton 2004-04-29 2:40 ` Andrew Morton @ 2004-04-29 3:57 ` Nick Piggin 2004-04-29 14:29 ` Rik van Riel 1 sibling, 1 reply; 128+ messages in thread From: Nick Piggin @ 2004-04-29 3:57 UTC (permalink / raw) To: Andrew Morton; +Cc: Paul Mackerras, brettspamacct, jgarzik, linux-kernel Andrew Morton wrote: > Paul Mackerras <paulus@samba.org> wrote: > >>Andrew Morton writes: >> >> >>>My point is that decreasing the tendency of the kernel to swap stuff out is >>>wrong. You really don't want hundreds of megabytes of BloatyApp's >>>untouched memory floating about in the machine. Get it out on the disk, >>>use the memory for something useful. >> >>What I have noticed with 2.6.6-rc1 on my dual G5 is that if I rsync a >>gigabyte or so of data over to another machine, it then takes several >>seconds to change focus from one window to another. I can see it >>slowly redraw the window title bars. It looks like the window manager >>is getting swapped/paged out. >> >>This machine has 2.5GB of ram, so I really don't see why it would need >>to swap at all. There should be plenty of page cache pages that are >>clean and not in use by any process that could be discarded. It seems >>like as soon as there is any memory shortage at all it picks on the >>window manager and chucks out all its pages. :( >> > > > I suspect rsync is taking two passes across the source files for its > checksumming thing. If so, this will defeat the pagecache use-once logic. > The kernel sees the second touch of the pages and assumes that there will > be a third touch. > I'm not very impressed with the pagecache use-once logic, and I have a patch to remove it completely and treat non-mapped touches (IMO) more sanely. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 3:57 ` Nick Piggin @ 2004-04-29 14:29 ` Rik van Riel 2004-04-30 3:00 ` Nick Piggin 0 siblings, 1 reply; 128+ messages in thread From: Rik van Riel @ 2004-04-29 14:29 UTC (permalink / raw) To: Nick Piggin Cc: Andrew Morton, Paul Mackerras, brettspamacct, jgarzik, linux-kernel On Thu, 29 Apr 2004, Nick Piggin wrote: > I'm not very impressed with the pagecache use-once logic, and I > have a patch to remove it completely and treat non-mapped touches > (IMO) more sanely. The basic idea of use-once isn't bad (search for LIRS and ARC page replacement), however the Linux implementation doesn't have any of the checks and balances that the researched replacement algorithms have... However, adding the checks and balancing required for LIRS, ARC and CAR(S) isn't easy since it requires keeping track of a number of recently evicted pages. That could be quite a bit of infrastructure, though it might be well worth it. -- "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 14:29 ` Rik van Riel @ 2004-04-30 3:00 ` Nick Piggin 2004-04-30 12:50 ` Rik van Riel 0 siblings, 1 reply; 128+ messages in thread From: Nick Piggin @ 2004-04-30 3:00 UTC (permalink / raw) To: Rik van Riel Cc: Andrew Morton, Paul Mackerras, brettspamacct, jgarzik, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1060 bytes --] Rik van Riel wrote: > On Thu, 29 Apr 2004, Nick Piggin wrote: > > >>I'm not very impressed with the pagecache use-once logic, and I >>have a patch to remove it completely and treat non-mapped touches >>(IMO) more sanely. > > > The basic idea of use-once isn't bad (search for LIRS and > ARC page replacement), however the Linux implementation > doesn't have any of the checks and balances that the > researched replacement algorithms have... > > However, adding the checks and balancing required for LIRS, > ARC and CAR(S) isn't easy since it requires keeping track of > a number of recently evicted pages. That could be quite a > bit of infrastructure, though it might be well worth it. > No, use once logic is good in theory I think. Unfortunately our implementation is quite fragile IMO (although it seems to have been "good enough"). This is what I'm currently doing (on top of a couple of other patches, but you get the idea). I should be able to transform it into a proper use-once logic if I pick up Nikita's inactive list second chance bit. [-- Attachment #2: vm-dropbehind.patch --] [-- Type: text/x-patch, Size: 3622 bytes --] Changes mark_page_accessed to only set the PageAccessed bit, and not move pages around the LRUs. 
This means we don't have to take the lru_lock, and it also makes page ageing and scanning consistient and all handled in mm/vmscan.c include/linux/buffer_head.h | 0 linux-2.6-npiggin/include/linux/swap.h | 5 ++- linux-2.6-npiggin/mm/filemap.c | 6 ---- linux-2.6-npiggin/mm/swap.c | 45 --------------------------------- 4 files changed, 4 insertions(+), 52 deletions(-) diff -puN mm/filemap.c~vm-dropbehind mm/filemap.c --- linux-2.6/mm/filemap.c~vm-dropbehind 2004-04-29 17:31:38.000000000 +1000 +++ linux-2.6-npiggin/mm/filemap.c 2004-04-29 17:31:38.000000000 +1000 @@ -663,11 +663,7 @@ page_ok: if (mapping_writably_mapped(mapping)) flush_dcache_page(page); - /* - * Mark the page accessed if we read the beginning. - */ - if (!offset) - mark_page_accessed(page); + mark_page_accessed(page); /* * Ok, we have the page, and it's up-to-date, so diff -puN mm/swap.c~vm-dropbehind mm/swap.c --- linux-2.6/mm/swap.c~vm-dropbehind 2004-04-29 17:31:38.000000000 +1000 +++ linux-2.6-npiggin/mm/swap.c 2004-04-29 17:31:38.000000000 +1000 @@ -100,51 +100,6 @@ int rotate_reclaimable_page(struct page return 0; } -/* - * FIXME: speed this up? - */ -void fastcall activate_page(struct page *page) -{ - struct zone *zone = page_zone(page); - - spin_lock_irq(&zone->lru_lock); - if (PageLRU(page) - && !PageActiveMapped(page) && !PageActiveUnmapped(page)) { - - del_page_from_inactive_list(zone, page); - - if (page_mapped(page)) { - SetPageActiveMapped(page); - add_page_to_active_mapped_list(zone, page); - } else { - SetPageActiveUnmapped(page); - add_page_to_active_unmapped_list(zone, page); - } - inc_page_state(pgactivate); - } - spin_unlock_irq(&zone->lru_lock); -} - -/* - * Mark a page as having seen activity. 
- * - * inactive,unreferenced -> inactive,referenced - * inactive,referenced -> active,unreferenced - * active,unreferenced -> active,referenced - */ -void fastcall mark_page_accessed(struct page *page) -{ - if (!PageActiveMapped(page) && !PageActiveUnmapped(page) - && PageReferenced(page) && PageLRU(page)) { - activate_page(page); - ClearPageReferenced(page); - } else if (!PageReferenced(page)) { - SetPageReferenced(page); - } -} - -EXPORT_SYMBOL(mark_page_accessed); - /** * lru_cache_add: add a page to the page lists * @page: the page to add diff -puN include/linux/swap.h~vm-dropbehind include/linux/swap.h --- linux-2.6/include/linux/swap.h~vm-dropbehind 2004-04-29 17:31:38.000000000 +1000 +++ linux-2.6-npiggin/include/linux/swap.h 2004-04-30 12:55:02.000000000 +1000 @@ -165,12 +165,13 @@ extern unsigned int nr_free_pagecache_pa /* linux/mm/swap.c */ extern void FASTCALL(lru_cache_add(struct page *)); extern void FASTCALL(lru_cache_add_active(struct page *)); -extern void FASTCALL(activate_page(struct page *)); -extern void FASTCALL(mark_page_accessed(struct page *)); extern void lru_add_drain(void); extern int rotate_reclaimable_page(struct page *page); extern void swap_setup(void); +/* Mark a page as having seen activity. */ +#define mark_page_accessed(page) SetPageReferenced(page) + /* linux/mm/vmscan.c */ extern int try_to_free_pages(struct zone **, unsigned int, unsigned int); extern int shrink_all_memory(int); diff -puN include/linux/mm_inline.h~vm-dropbehind include/linux/mm_inline.h diff -puN mm/memory.c~vm-dropbehind mm/memory.c diff -puN mm/shmem.c~vm-dropbehind mm/shmem.c diff -puN include/linux/buffer_head.h~vm-dropbehind include/linux/buffer_head.h _ ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 3:00 ` Nick Piggin @ 2004-04-30 12:50 ` Rik van Riel 2004-04-30 13:07 ` Nick Piggin 2004-04-30 13:18 ` Nikita Danilov 0 siblings, 2 replies; 128+ messages in thread From: Rik van Riel @ 2004-04-30 12:50 UTC (permalink / raw) To: Nick Piggin Cc: Andrew Morton, Paul Mackerras, brettspamacct, jgarzik, linux-kernel [-- Attachment #1: Type: TEXT/PLAIN, Size: 1066 bytes --] On Fri, 30 Apr 2004, Nick Piggin wrote: > Rik van Riel wrote: > > The basic idea of use-once isn't bad (search for LIRS and > > ARC page replacement), however the Linux implementation > > doesn't have any of the checks and balances that the > > researched replacement algorithms have... > No, use once logic is good in theory I think. Unfortunately > our implementation is quite fragile IMO (although it seems > to have been "good enough"). Hey, that's what I said ;)))) > This is what I'm currently doing (on top of a couple of other > patches, but you get the idea). I should be able to transform > it into a proper use-once logic if I pick up Nikita's inactive > list second chance bit. Ummm nope, there just isn't enough info to keep things as balanced as ARC/LIRS/CAR(T) can do. No good way to auto-tune the sizes of the active and inactive lists. -- "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 12:50 ` Rik van Riel @ 2004-04-30 13:07 ` Nick Piggin 2004-04-30 13:18 ` Nikita Danilov 1 sibling, 0 replies; 128+ messages in thread From: Nick Piggin @ 2004-04-30 13:07 UTC (permalink / raw) To: Rik van Riel Cc: Andrew Morton, Paul Mackerras, brettspamacct, jgarzik, linux-kernel Rik van Riel wrote: > On Fri, 30 Apr 2004, Nick Piggin wrote: > >> Rik van Riel wrote: > > >> > The basic idea of use-once isn't bad (search for LIRS and >> > ARC page replacement), however the Linux implementation >> > doesn't have any of the checks and balances that the >> > researched replacement algorithms have... > > >> No, use once logic is good in theory I think. Unfortunately >> our implementation is quite fragile IMO (although it seems >> to have been "good enough"). > > > Hey, that's what I said ;)))) > Yes. I just thought you might have misunderstood me to think use once is no good at all. >> This is what I'm currently doing (on top of a couple of other >> patches, but you get the idea). I should be able to transform >> it into a proper use-once logic if I pick up Nikita's inactive >> list second chance bit. > > > Ummm nope, there just isn't enough info to keep things > as balanced as ARC/LIRS/CAR(T) can do. No good way to > auto-tune the sizes of the active and inactive lists. > I think perhaps it might be possible. I don't want to discourage you from looking into more interesting replacement schemes though. I don't doubt that our basic replacement can often be suboptimal ;) ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 12:50 ` Rik van Riel 2004-04-30 13:07 ` Nick Piggin @ 2004-04-30 13:18 ` Nikita Danilov 2004-04-30 13:39 ` Nick Piggin 1 sibling, 1 reply; 128+ messages in thread From: Nikita Danilov @ 2004-04-30 13:18 UTC (permalink / raw) To: Rik van Riel Cc: Nick Piggin, Andrew Morton, Paul Mackerras, brettspamacct, jgarzik, linux-kernel Rik van Riel writes: > On Fri, 30 Apr 2004, Nick Piggin wrote: > > Rik van Riel wrote: > > > > The basic idea of use-once isn't bad (search for LIRS and > > > ARC page replacement), however the Linux implementation > > > doesn't have any of the checks and balances that the > > > researched replacement algorithms have... > > > No, use once logic is good in theory I think. Unfortunately > > our implementation is quite fragile IMO (although it seems > > to have been "good enough"). > > Hey, that's what I said ;)))) > > > This is what I'm currently doing (on top of a couple of other > > patches, but you get the idea). I should be able to transform > > it into a proper use-once logic if I pick up Nikita's inactive > > list second chance bit. > > Ummm nope, there just isn't enough info to keep things > as balanced as ARC/LIRS/CAR(T) can do. No good way to > auto-tune the sizes of the active and inactive lists. While keeping "history" for non-resident pages is very good from many points of view (provides infrastructure for local replacement and working set tuning, for example) and in the long term, current scanner can still be improved somewhat. Here are results that I obtained some time ago. Test is to concurrently clone (bk) and build (make -jN) kernel source in M directories. 
For N = M = 11, TIMEFORMAT='%3R %3S %3U'

                                           REAL      SYS      USER
 "stock"                                3818.320  568.999  4358.460
 transfer-dirty-on-refill               3368.690  569.066  4377.845
 check-PageSwapCache-after-add-to-swap  3237.632  576.208  4381.248
 dont-unmap-on-pageout                  3207.522  566.539  4374.504
 async-writepage                        3115.338  562.702  4325.212

(check-PageSwapCache-after-add-to-swap was added to mainline since then.) These patches weren't updated for some time. Last version is at ftp://ftp.namesys.com/pub/misc-patches/unsupported/extra/2004.03.25-2.6.5-rc2 [from Nick Piggin's patch] > > Changes mark_page_accessed to only set the PageAccessed bit, and > not move pages around the LRUs. This means we don't have to take > the lru_lock, and it also makes page ageing and scanning consistient > and all handled in mm/vmscan.c By the way, the batch-mark_page_accessed patch at the URL above also tries to reduce lock contention in mark_page_accessed(), but through the more standard approach of batching target pages in a per-cpu pvec. Nikita. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-30 13:18 ` Nikita Danilov @ 2004-04-30 13:39 ` Nick Piggin 0 siblings, 0 replies; 128+ messages in thread From: Nick Piggin @ 2004-04-30 13:39 UTC (permalink / raw) To: Nikita Danilov Cc: Rik van Riel, Andrew Morton, Paul Mackerras, brettspamacct, jgarzik, linux-kernel Nikita Danilov wrote: > Here are results that I obtained some time ago. Test is to concurrently > clone (bk) and build (make -jN) kernel source in M directories. > > For N = M = 11, TIMEFORMAT='%3R %3S %3U' > > REAL SYS USER > "stock" 3818.320 568.999 4358.460 > transfer-dirty-on-refill 3368.690 569.066 4377.845 > check-PageSwapCache-after-add-to-swap 3237.632 576.208 4381.248 > dont-unmap-on-pageout 3207.522 566.539 4374.504 > async-writepage 3115.338 562.702 4325.212 > I like your transfer-dirty-on-refill change. It is definitely worthwhile to mark a page as dirty when it drops off the active list in order to hopefully get it written before it reaches the tail of the inactive list. > (check-PageSwapCache-after-add-to-swap was added to mainline since them.) > > These patches weren't updated for some time. Last version is at > ftp://ftp.namesys.com/pub/misc-patches/unsupported/extra/2004.03.25-2.6.5-rc2 > > [from Nick Piggin's patch] > > > > Changes mark_page_accessed to only set the PageAccessed bit, and > > not move pages around the LRUs. This means we don't have to take > > the lru_lock, and it also makes page ageing and scanning consistient > > and all handled in mm/vmscan.c > > By the way, batch-mark_page_accessed patch at the URL above also tries > to reduce lock contention in mark_page_accessed(), but through more > standard approach of batching target pages in per-cpu pvec. > This is a good patch too if mark_page_accessed is required to take the lock (which it currently is, of course). ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:00 ` Andrew Morton 2004-04-29 1:24 ` Jeff Garzik 2004-04-29 1:30 ` Paul Mackerras @ 2004-04-29 1:46 ` Rik van Riel 2004-04-29 1:57 ` Andrew Morton 2 siblings, 1 reply; 128+ messages in thread From: Rik van Riel @ 2004-04-29 1:46 UTC (permalink / raw) To: Andrew Morton; +Cc: brettspamacct, jgarzik, linux-kernel On Wed, 28 Apr 2004, Andrew Morton wrote: > You really don't want hundreds of megabytes of BloatyApp's untouched > memory floating about in the machine. But people do. The point here is LATENCY, when a user comes back from lunch and continues typing in OpenOffice, his system should behave just like he left it. Making the user have very bad interactivity for the first minute or so is a Bad Thing, even if the computer did run more efficiently while the user wasn't around to notice... IMHO, the VM on a desktop system really should be optimised to have the best interactive behaviour, meaning decent latency when switching applications. -- "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:46 ` Rik van Riel @ 2004-04-29 1:57 ` Andrew Morton 2004-04-29 2:29 ` Marc Singer 2004-04-29 2:41 ` Rik van Riel 0 siblings, 2 replies; 128+ messages in thread From: Andrew Morton @ 2004-04-29 1:57 UTC (permalink / raw) To: Rik van Riel; +Cc: brettspamacct, jgarzik, linux-kernel Rik van Riel <riel@redhat.com> wrote: > > IMHO, the VM on a desktop system really should be optimised to > have the best interactive behaviour, meaning decent latency > when switching applications. I'm gonna stick my fingers in my ears and sing "la la la" until people tell me "I set swappiness to zero and it didn't do what I wanted it to do". ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:57 ` Andrew Morton @ 2004-04-29 2:29 ` Marc Singer 2004-04-29 2:35 ` Andrew Morton 2004-04-29 2:41 ` Rik van Riel 1 sibling, 1 reply; 128+ messages in thread From: Marc Singer @ 2004-04-29 2:29 UTC (permalink / raw) To: Andrew Morton; +Cc: Rik van Riel, brettspamacct, jgarzik, linux-kernel On Wed, Apr 28, 2004 at 06:57:20PM -0700, Andrew Morton wrote: > Rik van Riel <riel@redhat.com> wrote: > > > > IMHO, the VM on a desktop system really should be optimised to > > have the best interactive behaviour, meaning decent latency > > when switching applications. > > I'm gonna stick my fingers in my ears and sing "la la la" until people tell > me "I set swappiness to zero and it didn't do what I wanted it to do". It does, but it's a bit too coarse of a solution. It just means that the page cache always loses. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 2:29 ` Marc Singer @ 2004-04-29 2:35 ` Andrew Morton 2004-04-29 3:10 ` Marc Singer 0 siblings, 1 reply; 128+ messages in thread From: Andrew Morton @ 2004-04-29 2:35 UTC (permalink / raw) To: Marc Singer; +Cc: riel, brettspamacct, jgarzik, linux-kernel Marc Singer <elf@buici.com> wrote: > > On Wed, Apr 28, 2004 at 06:57:20PM -0700, Andrew Morton wrote: > > Rik van Riel <riel@redhat.com> wrote: > > > > > > IMHO, the VM on a desktop system really should be optimised to > > > have the best interactive behaviour, meaning decent latency > > > when switching applications. > > > > I'm gonna stick my fingers in my ears and sing "la la la" until people tell > > me "I set swappiness to zero and it didn't do what I wanted it to do". > > It does, but it's a bit too coarse of a solution. It just means that > the page cache always loses. That's what people have been asking for. What are you suggesting should happen instead? ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 2:35 ` Andrew Morton @ 2004-04-29 3:10 ` Marc Singer 2004-04-29 3:19 ` Andrew Morton 2004-04-29 8:02 ` Wichert Akkerman 0 siblings, 2 replies; 128+ messages in thread From: Marc Singer @ 2004-04-29 3:10 UTC (permalink / raw) To: Andrew Morton; +Cc: riel, brettspamacct, jgarzik, linux-kernel On Wed, Apr 28, 2004 at 07:35:41PM -0700, Andrew Morton wrote: > Marc Singer <elf@buici.com> wrote: > > > > On Wed, Apr 28, 2004 at 06:57:20PM -0700, Andrew Morton wrote: > > > Rik van Riel <riel@redhat.com> wrote: > > > > > > > > IMHO, the VM on a desktop system really should be optimised to > > > > have the best interactive behaviour, meaning decent latency > > > > when switching applications. > > > > > > I'm gonna stick my fingers in my ears and sing "la la la" until people tell > > > me "I set swappiness to zero and it didn't do what I wanted it to do". > > > > It does, but it's a bit too coarse of a solution. It just means that > > the page cache always loses. > > That's what people have been asking for. What are you suggesting should > happen instead? I'm thinking that the problem is that the page cache is greedier than most people expect. For example, if I could hold the page cache to be under a specific size, then I could do some performance measurements. E.g., compile a kernel with a 768M page cache, 512M, 256M and 128M. On a machine with loads of RAM, where's the optimal page cache size? ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 3:10 ` Marc Singer @ 2004-04-29 3:19 ` Andrew Morton 2004-04-29 4:13 ` Marc Singer 2004-04-29 16:51 ` Andy Isaacson 2004-04-29 8:02 ` Wichert Akkerman 1 sibling, 2 replies; 128+ messages in thread From: Andrew Morton @ 2004-04-29 3:19 UTC (permalink / raw) To: Marc Singer; +Cc: riel, brettspamacct, jgarzik, linux-kernel Marc Singer <elf@buici.com> wrote: > > > That's what people have been asking for. What are you suggesting should > > happen instead? > > I'm thinking that the problem is that the page cache is greedier that > most people expect. For example, if I could hold the page cache to be > under a specific size, then I could do some performance measurements. > E.g, compile kernel with a 768K page cache, 512K, 256K and 128K. On a > machine with loads of RAM, where's the optimal page cache size? Nope, there's no point in leaving free memory floating about when the kernel can and will reclaim clean pagecache on demand. What you discuss above is just an implementation detail. Forget it. What are the requirements? Thus far I've seen a) updatedb causes cache reclaim b) updatedb causes swapout c) prefer that openoffice/mozilla not get paged out when there's heavy pagecache demand. For a) we don't really have a solution. Some have been proposed but they could have serious downsides. For b) and c) we can tune the pageout-vs-cache reclaim tendency with /proc/sys/vm/swappiness, only nobody seems to know that. What else is there? ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 3:19 ` Andrew Morton @ 2004-04-29 4:13 ` Marc Singer 2004-04-29 4:33 ` Andrew Morton 2004-04-29 16:51 ` Andy Isaacson 1 sibling, 1 reply; 128+ messages in thread From: Marc Singer @ 2004-04-29 4:13 UTC (permalink / raw) To: Andrew Morton; +Cc: riel, brettspamacct, jgarzik, linux-kernel On Wed, Apr 28, 2004 at 08:19:24PM -0700, Andrew Morton wrote: > Marc Singer <elf@buici.com> wrote: > > > > > That's what people have been asking for. What are you suggesting should > > > happen instead? > > > > I'm thinking that the problem is that the page cache is greedier that > > most people expect. For example, if I could hold the page cache to be > > under a specific size, then I could do some performance measurements. > > E.g, compile kernel with a 768K page cache, 512K, 256K and 128K. On a > > machine with loads of RAM, where's the optimal page cache size? > > Nope, there's no point in leaving free memory floating about when the > kernel can and will reclaim clean pagecache on demand. It could work differently from that. For example, if we had 500M total, we map 200M, then we do 400M of IO. Perhaps we'd like to be able to say that a 400M page cache is too big. The problem isn't about reclaiming pagecache; it's about the cost of swapping pages back in. The page cache can tend to favor swapping mapped pages over reclaiming its own pages that are less likely to be used. Of course, it doesn't know that...which is the rub. If I thought I had a method for doing this, I'd write code to try it out. > What you discuss above is just an implementation detail. Forget it. What > are the requirements? Thus far I've seen The requirement is that we'd like to see pages aged more gracefully. A mapped page that is used continuously for ten minutes and then left to idle for 10 minutes is more valuable than an IO page that was read once and then not used for ten minutes. As the mapped page ages, its value decays. 
> a) updatedb causes cache reclaim > > b) updatedb causes swapout > > c) prefer that openoffice/mozilla not get paged out when there's heavy > pagecache demand. > > For a) we don't really have a solution. Some have been proposed but they > could have serious downsides. > > For b) and c) we can tune the pageout-vs-cache reclaim tendency with > /proc/sys/vm/swappiness, only nobody seems to know that. I've read the source for where swappiness comes into play. Yet I cannot make a statement about what it means. Can you? ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 4:13 ` Marc Singer @ 2004-04-29 4:33 ` Andrew Morton 2004-04-29 14:45 ` Marc Singer 0 siblings, 1 reply; 128+ messages in thread From: Andrew Morton @ 2004-04-29 4:33 UTC (permalink / raw) To: Marc Singer; +Cc: riel, brettspamacct, jgarzik, linux-kernel Marc Singer <elf@buici.com> wrote: > > It could work differently from that. For example, if we had 500M > total, we map 200M, then we do 400M of IO. Perhaps we'd like to be > able to say that a 400M page cache is too big. Try it - you'll find that the system will leave all of your 200M of mapped memory in place. You'll be left with 300M of pagecache from that I/O activity. There may be a small amount of unmapping activity if the I/O is a write, or if the system has a small highmem zone. Maybe. Beware that both ARM and NFS seem to be doing odd things, so try it on a PC+disk first ;) > The problem isn't > about reclaiming pagecache; it's about the cost of swapping pages back > in. The page cache can tend to favor swapping mapped pages over > reclaiming its own pages that are less likely to be used. Of course, > it doesn't know that...which is the rub. No, the system will only start to unmap pages if reclaim of unmapped pagecache is getting into difficulty. The threshold of "getting into difficulty" is controlled by /proc/sys/vm/swappiness. > The requirement is that we'd like to see pages aged more gracefully. > A mapped page that is used continuously for ten minutes and then left > to idle for 10 minutes is more valuable than an IO page that was read > once and then not used for ten minutes. As the mapped page ages, its > value decays. Yes, remembering aging info over that period of time is hard. We only have six levels of aging: referenced+active, unreferenced+active, referenced+inactive, unreferenced+inactive, plus position-on-lru*2. > I've read the source for where swappiness comes into play. Yet I > cannot make a statement about what it means. 
> Can you? It controls the level of page reclaim distress at which we decide to start reclaiming mapped pages. We prefer to reclaim pagecache, but we have to start swapping at *some* level of reclaim failure. swappiness sets that level, in rather vague units. It might make sense to recast swappiness in terms of pages_reclaimed/pages_scanned, which is the real metric of page reclaim distress. But that would only affect the meaning of the actual number - it wouldn't change the tunable's effect on the system. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 4:33 ` Andrew Morton @ 2004-04-29 14:45 ` Marc Singer 0 siblings, 0 replies; 128+ messages in thread From: Marc Singer @ 2004-04-29 14:45 UTC (permalink / raw) To: Andrew Morton; +Cc: riel, brettspamacct, jgarzik, linux-kernel On Wed, Apr 28, 2004 at 09:33:59PM -0700, Andrew Morton wrote: > Marc Singer <elf@buici.com> wrote: > > > > It could work differently from that. For example, if we had 500M > > total, we map 200M, then we do 400M of IO. Perhaps we'd like to be > > able to say that a 400M page cache is too big. > > Try it - you'll find that the system will leave all of your 200M of mapped > memory in place. You'll be left with 300M of pagecache from that I/O > activity. There may be a small amount of unmapping activity if the I/O is > a write, or if the system has a small highmem zone. Maybe. Are you sure? Isn't that what the other posters are whinging about? They do lots of IO and then they have to wait for the system to page Mozilla back in. > Beware that both ARM and NFS seem to be doing odd things, so try it on a > PC+disk first ;) Yeah, I know that there is still something odd in ARM-land. I assume that the other posters are using IA32. > No, the system will only start to unmap pages if reclaim of unmapped > pagecache is getting into difficulty. The threshold of "getting into > difficulty" is controlled by /proc/sys/vm/swappiness. What constitutes 'difficulty'? Perhaps this is rhetorical. > > I've read the source for where swappiness comes into play. Yet I > > cannot make a statement about what it means. Can you? > > It controls the level of page reclaim distress at which we decide to start > reclaiming mapped pages. > > We prefer to reclaim pagecache, but we have to start swapping at *some* > level of reclaim failure. swappiness sets that level, in rather vague > units. I'm not sure I see why we have to swap. 
If half of memory is mapped, and the user is using those pages with some frequency, perhaps we should never reclaim mapped pages. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 3:19 ` Andrew Morton 2004-04-29 4:13 ` Marc Singer @ 2004-04-29 16:51 ` Andy Isaacson 2004-04-29 20:42 ` Andrew Morton 2004-04-30 0:14 ` Lincoln Dale 1 sibling, 2 replies; 128+ messages in thread From: Andy Isaacson @ 2004-04-29 16:51 UTC (permalink / raw) To: Andrew Morton; +Cc: Marc Singer, riel, brettspamacct, jgarzik, linux-kernel On Wed, Apr 28, 2004 at 08:19:24PM -0700, Andrew Morton wrote: > What you discuss above is just an implementation detail. Forget it. What > are the requirements? Thus far I've seen > > a) updatedb causes cache reclaim > > b) updatedb causes swapout > > c) prefer that openoffice/mozilla not get paged out when there's heavy > pagecache demand. > > For a) we don't really have a solution. Some have been proposed but they > could have serious downsides. > > For b) and c) we can tune the pageout-vs-cache reclaim tendency with > /proc/sys/vm/swappiness, only nobody seems to know that. > > What else is there? What I want is for purely sequential workloads which far exceed cache size (dd, updatedb, tar czf /backup/home.nightly.tar.gz /home) to avoid thrashing my entire desktop out of memory. I DON'T CARE if the tar completed in 45 minutes rather than 80. (It wouldn't, anyways, because it only needs about 5 MB of cache to get every bit of the speedup it was going to get.) But the additional latency when I un-xlock in the morning is annoying, and there is no benefit. For a more useful example, ideally I *should not be able to tell* that "dd if=/hde1 of=/hdf1" is running. [1] There is *no* benefit to cacheing more than about 2 pages, under this workload. But with current kernels, IME, that workload results in a gargantuan buffer cache and lots of swapout of apps I was using 3 minutes ago. I've taken to walking away for some coffee, coming back when it's done, and "sudo swapoff /dev/hda3; sudo swapon -a" to avoid the latency that is so annoying when trying to use bloaty apps. 
[1] obviously I'll see some slowdown due to interrupts and PCI bandwidth; that's not what I'm railing against, here. -andy ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 16:51 ` Andy Isaacson @ 2004-04-29 20:42 ` Andrew Morton 2004-04-29 22:27 ` Andy Isaacson 2004-04-30 0:14 ` Lincoln Dale 1 sibling, 1 reply; 128+ messages in thread From: Andrew Morton @ 2004-04-29 20:42 UTC (permalink / raw) To: Andy Isaacson; +Cc: elf, riel, brettspamacct, jgarzik, linux-kernel Andy Isaacson <adi@hexapodia.org> wrote: > > What I want is for purely sequential workloads which far exceed cache > size (dd, updatedb, tar czf /backup/home.nightly.tar.gz /home) to avoid > thrashing my entire desktop out of memory. I DON'T CARE if the tar > completed in 45 minutes rather than 80. (It wouldn't, anyways, because > it only needs about 5 MB of cache to get every bit of the speedup it was > going to get.) But the additional latency when I un-xlock in the > morning is annoying, and there is no benefit. What kernel version are you using? If 2.6, what value of /proc/sys/vm/swappiness? > For a more useful example, ideally I *should not be able to tell* that > "dd if=/hde1 of=/hdf1" is running. I just did a 4GB `dd if=/dev/sda of=/x bs=1M' on a 1GB 2.6.6-rc2-mm2 swappiness=85 machine here and there was no swapout at all. Probably your machine has less memory. But without real, hard details nothing can be done. > There is *no* benefit to cacheing > more than about 2 pages, under this workload. Sure, we could do better things with the large streaming files, although the risk of accidentally screwing up particular workloads is high. But the use-once logic which we have in there at present does handle these cases quite well. > But with current kernels, > IME, that workload results in a gargantuan buffer cache and lots of > swapout of apps I was using 3 minutes ago. I've taken to walking away > for some coffee, coming back when it's done, and "sudo swapoff > /dev/hda3; sudo swapon -a" to avoid the latency that is so annoying when > trying to use bloaty apps. 
What kernel, what system specs, what swappiness setting? ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 20:42 ` Andrew Morton @ 2004-04-29 22:27 ` Andy Isaacson 2004-04-29 23:19 ` Andrew Morton 0 siblings, 1 reply; 128+ messages in thread From: Andy Isaacson @ 2004-04-29 22:27 UTC (permalink / raw) To: Andrew Morton; +Cc: elf, riel, brettspamacct, jgarzik, linux-kernel On Thu, Apr 29, 2004 at 01:42:22PM -0700, Andrew Morton wrote: > Andy Isaacson <adi@hexapodia.org> wrote: > > What I want is for purely sequential workloads which far exceed cache > > size (dd, updatedb, tar czf /backup/home.nightly.tar.gz /home) to avoid > > thrashing my entire desktop out of memory. I DON'T CARE if the tar > > completed in 45 minutes rather than 80. (It wouldn't, anyways, because > > it only needs about 5 MB of cache to get every bit of the speedup it was > > going to get.) But the additional latency when I un-xlock in the > > morning is annoying, and there is no benefit. > > What kernel version are you using? If 2.6, what value of > /proc/sys/vm/swappiness? 2.4.various, including 2.4.25 and 2.4.26. I haven't taken the 2.6 plunge yet. Running on various x86 including - dual PIII 666 MHz 512 MB - SpeedStep PIII 700 MHz 128 MB - Athlon XP 2GHz 512 MB > > For a more useful example, ideally I *should not be able to tell* that > > "dd if=/hde1 of=/hdf1" is running. > > I just did a 4GB `dd if=/dev/sda of=/x bs=1M' on a 1GB 2.6.6-rc2-mm2 > swappiness=85 machine here and there was no swapout at all. > > Probably your machine has less memory. But without real, hard details > nothing can be done. I'm pleased to hear that 2.6 is apparently better behaved. In your test, what was the impact on the file cache? It's a big improvement to not be paging out to swap, but it's also important that sequential IO not evict my cached build tree. An interesting test would be to time a compilation of a source file with a large number of includes. 
For example, building linux-2.4.25/kernel/sysctl.c on my Athlon XP 2GHz, 512MB, 2.4.25 takes 2.8 seconds with (fairly) cold cache. (I didn't reboot, but I did take fairly extreme measures to force stuff out.) It takes 0.54 seconds with warm caches. After doing 1GB of sequential IO (wc -w /tmp/bigfile) I'm back up to 2.08 seconds. > > There is *no* benefit to cacheing > more than about 2 pages, under this workload. > > Sure, we could do better things with the large streaming files, although > the risk of accidentally screwing up particular workloads is high. Yeah, I agree. For example, I've occasionally used cat(1) or wc(1) to prefetch files that I knew I was going to be accessing randomly; with my hypothetical "sequential IO doesn't cause cacheing" it would be much harder to do effective manual prefetching. > But the use-once logic which we have in there at present does handle these > cases quite well. Where is the use-once logic available? Is it in mainstream 2.6 or only in some development branches? I've not upgraded from 2.4 mostly because I didn't see many benefits evident in the discussions, but improved paging logic would be nice. > > But with current kernels, > > IME, that workload results in a gargantuan buffer cache and lots of > > swapout of apps I was using 3 minutes ago. I've taken to walking away > > for some coffee, coming back when it's done, and "sudo swapoff > > /dev/hda3; sudo swapon -a" to avoid the latency that is so annoying when > > trying to use bloaty apps. > > What kernel, what system specs, what swappiness setting? 2.4.25, Athlon XP 2 GHz, 512MB. I suppose you're not terribly interested in 2.4. I'll see if I can reasonably upgrade, if you can tell me what I should upgrade to for the good stuff. -andy ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 22:27 ` Andy Isaacson @ 2004-04-29 23:19 ` Andrew Morton 0 siblings, 0 replies; 128+ messages in thread From: Andrew Morton @ 2004-04-29 23:19 UTC (permalink / raw) To: Andy Isaacson; +Cc: elf, riel, brettspamacct, jgarzik, linux-kernel Andy Isaacson <adi@hexapodia.org> wrote: > > > What kernel version are you using? If 2.6, what value of > > /proc/sys/vm/swappiness? > > 2.4.various, including 2.4.25 and 2.4.26. I haven't taken the 2.6 > plunge yet. OK. Please try 2.6 and let us know how it changes things. > > I just did a 4GB `dd if=/dev/sda of=/x bs=1M' on a 1GB 2.6.6-rc2-mm2 > > swappiness=85 machine here and there was no swapout at all. > > > > Probably your machine has less memory. But without real, hard details > > nothing can be done. > > I'm pleased to hear that 2.6 is apparently better behaved. In your > test, what was the impact on the file cache? It will have munched everything else. > It's a big improvement to > not be paging out to swap, but it's also important that sequential IO > not evict my cached build tree. Yup. We don't have any large-streaming-file heuristics in there. It is the case that if you have recently accessed a file *twice* then its pages will be preferred over the large streaming file. But if you've accessed the valuable file only once, the streaming I/O will evict it. > > > There is *no* benefit to cacheing > > > more than about 2 pages, under this workload. > > > > Sure, we could do better things with the large streaming files, although > > the risk of accidentally screwing up particular workloads is high. > > Yeah, I agree. For example, I've occasionally used cat(1) or wc(1) to > prefetch files that I knew I was going to be accessing randomly; with my > hypothetical "sequential IO doesn't cause cacheing" it would be much > harder to do effective manual prefetching. 
We have a new syscall in 2.6 (fadvise) with which an app can provide hints about its access patterns and its desired cache usage. So, for example, tar or rsync or whatever could (if told to do so by the user) deliberately throw away the pagecache after having accessed the file. But that does require application modifications. They're pretty simple though: + if (user said to throw away the cache) + posix_fadvise64(fd, 0, -1, POSIX_FADV_DONTNEED); close(fd); > > But the use-once logic which we have in there at present does handle these > > cases quite well. > > Where is the use-once logic available? Is it in mainstream 2.6 or only > in some development branches? I've not upgraded from 2.4 mostly because > I didn't see much benefits evident in the discussions, but improved > paging logic would be nice. It's in 2.4 also. I think 2.4 tends to do the wrong thing because dirty pages easily make it to the tail of the VM LRU's, which eventually causes the VM to go off and hunt down mapped pages instead. 2.6 takes more care to prevent dirty pages from hitting the tail of the LRU. > > > But with current kernels, > > > IME, that workload results in a gargantuan buffer cache and lots of > > > swapout of apps I was using 3 minutes ago. I've taken to walking away > > > for some coffee, coming back when it's done, and "sudo swapoff > > > /dev/hda3; sudo swapon -a" to avoid the latency that is so annoying when > > > trying to use bloaty apps. > > > > What kernel, what system specs, what swappiness setting? > > 2.4.25, Athlon XP 2 GHz, 512MB. I suppose you're not terribly > interested in 2.4. I'll see if I can reasonably upgrade, if you can > tell me what I should upgrade to for the good stuff. 2.6.6-rc3 would be suitable. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 16:51 ` Andy Isaacson 2004-04-29 20:42 ` Andrew Morton @ 2004-04-30 0:14 ` Lincoln Dale 1 sibling, 0 replies; 128+ messages in thread From: Lincoln Dale @ 2004-04-30 0:14 UTC (permalink / raw) To: Andy Isaacson Cc: Andrew Morton, Marc Singer, riel, brettspamacct, jgarzik, linux-kernel At 02:51 AM 30/04/2004, Andy Isaacson wrote: >What I want is for purely sequential workloads which far exceed cache >size (dd, updatedb, tar czf /backup/home.nightly.tar.gz /home) to avoid >thrashing my entire desktop out of memory. I DON'T CARE if the tar >completed in 45 minutes rather than 80. (It wouldn't, anyways, because >it only needs about 5 MB of cache to get every bit of the speedup it was >going to get.) But the additional latency when I un-xlock in the >morning is annoying, and there is no benefit. > >For a more useful example, ideally I *should not be able to tell* that >"dd if=/hde1 of=/hdf1" is running. [1] There is *no* benefit to cacheing >more than about 2 pages, under this workload. But with current kernels, >IME, that workload results in a gargantuan buffer cache and lots of >swapout of apps I was using 3 minutes ago. I've taken to walking away >for some coffee, coming back when it's done, and "sudo swapoff >/dev/hda3; sudo swapon -a" to avoid the latency that is so annoying when >trying to use bloaty apps. the mechanism already exists; teach tar/dd and any other app that you don't want polluting the page-cache with data to use O_DIRECT. i suspect updatedb is a different case as it's probably filling the system with dcache/inode entries. cheers, lincoln. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 3:10 ` Marc Singer 2004-04-29 3:19 ` Andrew Morton @ 2004-04-29 8:02 ` Wichert Akkerman 2004-04-29 14:25 ` Marcelo Tosatti 1 sibling, 1 reply; 128+ messages in thread From: Wichert Akkerman @ 2004-04-29 8:02 UTC (permalink / raw) To: linux-kernel Previously Marc Singer wrote: > I'm thinking that the problem is that the page cache is greedier that > most people expect. For example, if I could hold the page cache to be > under a specific size, then I could do some performance measurements. It is actually greedy enough that when my nightly cron starts I suddenly see apache and pdns_recursor being killed consistently every day. Wichert. -- Wichert Akkerman <wichert@wiggy.net> It is simple to make things. http://www.wiggy.net/ It is hard to make things simple. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 8:02 ` Wichert Akkerman @ 2004-04-29 14:25 ` Marcelo Tosatti 2004-04-29 14:27 ` Wichert Akkerman 0 siblings, 1 reply; 128+ messages in thread From: Marcelo Tosatti @ 2004-04-29 14:25 UTC (permalink / raw) To: linux-kernel; +Cc: wichert On Thu, Apr 29, 2004 at 10:02:19AM +0200, Wichert Akkerman wrote: > Previously Marc Singer wrote: > > I'm thinking that the problem is that the page cache is greedier that > > most people expect. For example, if I could hold the page cache to be > > under a specific size, then I could do some performance measurements. > > It is actually greedy enough that when my nightly cron starts I suddenly > see apache and pdns_recursor being killed consistently every day. Which kernel is that? They are getting killed because there is no more swap available. Otherwise its a bug. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 14:25 ` Marcelo Tosatti @ 2004-04-29 14:27 ` Wichert Akkerman 0 siblings, 0 replies; 128+ messages in thread From: Wichert Akkerman @ 2004-04-29 14:27 UTC (permalink / raw) To: linux-kernel Previously Marcelo Tosatti wrote: > Which kernel is that? That machine is running 2.6.4 at the moment. > They are getting killed because there is no more swap available. > Otherwise its a bug. It actually killed a bunch of processes a minute ago and right now has 120mb of swap free and 104mb used for cache. Wichert. -- Wichert Akkerman <wichert@wiggy.net> It is simple to make things. http://www.wiggy.net/ It is hard to make things simple. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:57 ` Andrew Morton 2004-04-29 2:29 ` Marc Singer @ 2004-04-29 2:41 ` Rik van Riel 2004-04-29 2:43 ` Andrew Morton 1 sibling, 1 reply; 128+ messages in thread From: Rik van Riel @ 2004-04-29 2:41 UTC (permalink / raw) To: Andrew Morton; +Cc: brettspamacct, jgarzik, linux-kernel On Wed, 28 Apr 2004, Andrew Morton wrote: > Rik van Riel <riel@redhat.com> wrote: > > > > IMHO, the VM on a desktop system really should be optimised to > > have the best interactive behaviour, meaning decent latency > > when switching applications. > > I'm gonna stick my fingers in my ears and sing "la la la" until people tell > me "I set swappiness to zero and it didn't do what I wanted it to do". Agreed, you shouldn't be the one to fix this problem. -- "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 2:41 ` Rik van Riel @ 2004-04-29 2:43 ` Andrew Morton 0 siblings, 0 replies; 128+ messages in thread From: Andrew Morton @ 2004-04-29 2:43 UTC (permalink / raw) To: Rik van Riel; +Cc: brettspamacct, jgarzik, linux-kernel Rik van Riel <riel@redhat.com> wrote: > > On Wed, 28 Apr 2004, Andrew Morton wrote: > > Rik van Riel <riel@redhat.com> wrote: > > > > > > IMHO, the VM on a desktop system really should be optimised to > > > have the best interactive behaviour, meaning decent latency > > > when switching applications. > > > > I'm gonna stick my fingers in my ears and sing "la la la" until people tell > > me "I set swappiness to zero and it didn't do what I wanted it to do". > > Agreed, you shouldn't be the one to fix this problem. > What problem? ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 0:49 ` Brett E. 2004-04-29 1:00 ` Andrew Morton @ 2004-04-29 1:41 ` Tim Connors 2004-04-29 9:43 ` Helge Hafting 2 siblings, 0 replies; 128+ messages in thread From: Tim Connors @ 2004-04-29 1:41 UTC (permalink / raw) To: Jeff Garzik, Andrew Morton, linux-kernel "Brett E." <brettspamacct@fastclick.com> said on Wed, 28 Apr 2004 17:49:43 -0700: > Or how about "Use ALL the cache you want Mr. Kernel. But when I want > more physical memory pages, just reap cache pages and only swap out when > the cache is down to a certain size(configurable, say 100megs or > something)." Oh how dearly I would love that... I have a huge app that operates on a large file (but both are a bit smaller than available memory, by maybe a hundred or two megs - enough to keep the entire working set in RAM, anyway). I create these large files over and over (on another host, so cache does absolutely no good whatsoever, since we are streaming a read once), but don't delete the old ones, so they all remain in cache. So when I close one copy of the app, and open up a new one on a different file, when it comes time to allocate those several hundred megs, it rather blows away my mozilla or my X session(! -- since I need it to display the results) or my window manager, and keeps growing that cache. -- TimC -- http://astronomy.swin.edu.au/staff/tconnors/ When you are chewing on life's grissle, don't grumble - give a whistle! This'll help things turn out for the best Always look on the bright side of life ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 0:49 ` Brett E. 2004-04-29 1:00 ` Andrew Morton 2004-04-29 1:41 ` Tim Connors @ 2004-04-29 9:43 ` Helge Hafting 2004-04-29 14:48 ` Marc Singer 2 siblings, 1 reply; 128+ messages in thread From: Helge Hafting @ 2004-04-29 9:43 UTC (permalink / raw) To: brettspamacct; +Cc: linux-kernel Brett E. wrote: [...] > Or how about "Use ALL the cache you want Mr. Kernel. But when I want > more physical memory pages, just reap cache pages and only swap out when > the cache is down to a certain size(configurable, say 100megs or > something)." Problem: reaping cache is equivalent to swapping in some cases. The cache isn't merely "files read & written". It is also all your executable code. Code is not different from files being read at all. Dumping too much cache will dump the code you're executing, and then it has to be reloaded from disk. Helge Hafting ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 9:43 ` Helge Hafting @ 2004-04-29 14:48 ` Marc Singer 0 siblings, 0 replies; 128+ messages in thread From: Marc Singer @ 2004-04-29 14:48 UTC (permalink / raw) To: Helge Hafting; +Cc: brettspamacct, linux-kernel On Thu, Apr 29, 2004 at 11:43:25AM +0200, Helge Hafting wrote: > Brett E. wrote: > [...] > >Or how about "Use ALL the cache you want Mr. Kernel. But when I want > >more physical memory pages, just reap cache pages and only swap out when > >the cache is down to a certain size(configurable, say 100megs or > >something)." > > Problem: reaping cache is equivalent to swapping in some cases. > The cache isn't merely "files read & written". > It is also all your executable code. Code is not different from > files being read at all. Dumping too much cache will dump the > code you're executing, and then it has to be reloaded from disk. Hmm. I was under the impression that mapped pages were code and unmapped pages were IO page cache. Are you suggesting that code is duplicated? ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 0:01 ` Andrew Morton 2004-04-29 0:10 ` Jeff Garzik @ 2004-04-29 0:44 ` Brett E. 2004-04-29 1:13 ` Andrew Morton 1 sibling, 1 reply; 128+ messages in thread From: Brett E. @ 2004-04-29 0:44 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 1968 bytes --] First of all, thanks for your replies and helpful information. Andrew Morton wrote: > "Brett E." <brettspamacct@fastclick.com> wrote: > >>I attached sar, slabinfo and /proc/meminfo data on the 2.6.5 machine. I >>reproduce this behavior by simply untarring a 260meg file on a >>production server, the machine becomes sluggish as it swaps to disk. > > > I see no swapout from the info which you sent. pgpgout/s gives the total number of blocks paged out to disk per second, it peaks at 13,000 and hovers around 3,000 per the attachment. > > A `vmstat 1' trace would be more useful. Ok. Attached(ran this with swappiness set to 0 then 100). In both cases sar showed high paging in/out. > > >>Is there a way to limit the cache so this machine, which has 1 gigabyte of >>memory, doesn't dip into swap? > > > Decrease /proc/sys/vm/swappiness? > > Swapout is good. It frees up unused memory. I run my desktop machines at > swappiness=100. > Swapping out is good, but when that's coupled with swapping in as is the case on my side, it creates a thrashing situation where we swap out to disk pages which are being used, we then immediately swap those pages back in, etc etc.. This creates lots of disk I/O which competes with the userland processes, resulting in a system slowing down to a crawl. I don't understand why it swaps in the first place when 400-500 megs are taken up by cache datastructures. The usage pattern by the way is on a server which continuously hits a database and reads files so I don't know what "swappiness" should be set to exactly. Every hour or so it wants to untar tarballs and by then the cache is large. 
From here, the system swaps in and out more while cache decreases. Basically, it should do what I believe Solaris does... simply reclaim cache and not swap. Capping cache would be good too but the best solution IMO is to simply reclaim the cache on an as-needed basis before thinking about swapping. [-- Attachment #2: attach.2 --] [-- Type: text/plain, Size: 20862 bytes --] swappiness of 0: procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 11 168396 9260 16032 508496 1 2 84 120 142 132 37 7 48 8 0 8 168396 6636 16056 510580 0 0 1320 76 1334 1639 13 3 0 84 0 9 168396 6432 16068 510092 0 0 836 60 1242 1124 13 2 0 85 17 9 168396 7200 16084 508580 0 0 1248 148 1318 1351 11 3 7 80 0 10 168396 14488 16104 507064 0 0 1364 904 1488 1977 20 4 0 76 8 8 168396 11992 16116 508752 0 0 1124 88 1304 1345 11 3 0 86 16 8 168396 10008 16140 510768 0 0 1392 172 1434 1970 21 4 0 74 3 11 168396 13592 16152 512524 0 0 1072 1364 1625 2544 32 6 2 60 0 9 168396 6560 16252 519632 0 0 5644 380 1431 2073 19 6 0 76 0 10 168396 6576 16320 519564 0 0 3840 1208 1259 1013 9 4 0 88 0 6 168396 7040 16420 519260 0 0 5408 356 1311 1281 11 4 0 85 0 10 168396 8640 16432 517616 0 0 2020 116 1496 2268 26 6 0 68 0 10 168396 8384 16528 516704 0 0 4972 4124 1526 2278 30 8 4 60 0 7 168396 7744 16528 517248 0 0 552 16 1267 1012 8 2 1 89 0 7 168396 6528 16532 517788 0 0 304 12024 1175 174 1 1 0 98 0 8 168396 7488 16552 515728 0 0 1408 7376 1111 173 1 1 0 98 12 8 168396 8824 16556 514024 0 0 1956 2724 1301 1582 19 5 0 76 12 14 168396 6944 16504 492860 0 0 1524 0 1637 2458 71 12 0 17 0 7 168396 7072 16596 491272 0 0 5624 0 1296 1168 14 5 0 82 0 7 168396 6936 16708 488712 0 0 7520 1496 1287 1737 17 7 0 76 0 6 168396 6328 16756 488324 0 0 4712 2000 1234 786 7 4 0 89 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 12 168396 6448 13400 485600 96 0 
8896 29480 8401 6544 11 3 0 86 4 13 168396 10016 12760 487056 0 0 1600 0 1475 2089 21 5 0 74 10 7 168396 6944 12764 486304 0 0 1620 1028 1509 2366 37 6 0 57 0 11 168396 6760 12780 493088 0 0 3880 2240 1455 2034 22 5 0 72 0 6 168396 6240 12796 493480 0 0 8064 2032 1248 1167 9 5 0 87 0 8 168396 7200 12820 492300 0 0 7336 2416 1387 1738 21 7 0 73 1 11 168396 7968 12820 491144 0 0 3628 784 1551 2490 27 8 0 65 0 4 168396 6544 12844 499144 0 0 8948 584 1318 1310 12 6 0 83 5 7 168396 8856 12856 496820 0 0 4536 5792 1288 1336 11 4 0 85 0 6 168396 7640 12856 498112 0 0 920 1352 1180 545 4 1 0 96 0 6 168396 7512 12860 497836 0 0 2053 10676 1152 133 0 1 0 98 0 6 168396 8056 12864 496676 0 0 1664 0 1079 138 1 1 0 99 1 7 168396 6400 12896 495964 0 0 7665 1649 1171 465 4 4 0 93 0 8 168396 7920 12908 495340 0 0 956 672 1410 1851 22 6 0 73 8 13 168396 6936 12908 477456 0 0 684 3299 1643 2473 54 11 0 34 0 15 168396 6584 12912 470516 0 0 832 1340 1376 1703 32 5 0 63 0 18 168396 11184 12912 471128 0 0 620 0 1541 2548 35 6 0 60 0 8 168396 11084 12912 472012 0 0 904 3384 1459 1771 29 6 0 65 1 9 168396 6844 12924 473088 0 0 1064 4 1390 1694 24 3 0 73 1 12 168396 7248 12932 469408 0 0 960 132 1562 2412 43 6 0 51 5 9 168396 13228 12932 468996 4 0 1160 128 1628 2732 38 7 0 55 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 11 168396 11628 12932 469948 0 0 880 1388 1619 2407 29 6 0 66 0 9 168396 16500 12936 471372 0 0 1068 2200 1385 1464 14 3 0 84 1 4 168396 14132 12964 473724 0 0 1712 5432 1516 2168 31 5 0 65 5 12 168396 19000 12972 475824 0 0 1360 160 1468 1918 30 5 0 66 0 9 168396 16720 12976 477724 0 0 1376 128 1599 2943 36 7 0 56 0 8 168396 20260 12976 480920 0 0 1884 224 1785 2982 38 8 0 54 5 12 168396 7368 13000 493136 0 0 10312 648 1372 1743 18 8 1 74 1 12 168396 7480 13008 492312 0 0 2848 1628 1532 2428 26 7 0 67 0 11 168396 7992 13016 498696 0 0 3484 3368 1546 2521 28 7 0 64 1 14 168396 6760 
13028 499772 0 20 5536 812 1534 2733 32 7 0 62 0 17 168396 6120 13048 507504 0 0 7944 6596 1539 2217 26 8 0 66 0 5 168396 6424 13048 507164 0 0 564 4352 1229 332 4 1 0 95 0 14 168396 7128 13048 506552 0 0 1576 3280 1374 2009 22 4 0 74 0 10 168396 12536 13048 508184 0 0 1008 0 1283 1488 19 3 0 79 8 10 168392 10104 13056 510256 32 0 1460 192 1628 2668 30 8 0 62 1 6 168392 14904 13076 512960 64 0 1736 0 1696 3413 43 11 0 46 3 4 168392 19384 13088 515804 0 0 1708 0 1483 2337 27 6 0 67 7 5 168392 14328 13096 520624 0 0 2760 280 1659 2531 36 6 0 58 0 5 168392 16124 13108 527004 0 0 3416 276 1319 1308 15 6 0 80 0 5 168392 9724 13124 533448 0 0 3284 3840 1231 739 10 2 0 88 18 9 168392 9532 13124 533516 0 0 64 3108 1314 473 6 2 0 92 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 14 168392 8636 13124 534264 0 0 808 0 1511 2083 24 5 0 72 1 10 168392 8156 13124 535012 0 0 716 6360 1459 2052 21 4 1 76 0 5 168392 7388 13124 535624 0 0 628 2892 1449 1890 23 4 9 64 14 4 168392 14204 13128 536576 64 0 828 2056 1534 2161 31 5 3 61 0 4 168392 13820 13132 536980 0 0 356 3821 1490 2082 29 7 2 61 1 2 168392 20700 13136 537656 0 0 636 1568 1589 2516 31 6 5 58 1 6 168392 19708 13144 538600 0 0 1024 2732 1558 2504 39 7 6 49 6 3 168392 26688 13144 539144 0 0 520 5372 1624 2589 40 7 0 53 0 4 168392 34036 13148 539820 0 0 692 1568 1602 3144 37 7 6 50 2 3 168392 33300 13152 540564 0 0 732 5212 1624 2892 30 5 16 48 0 3 168392 40276 13160 541508 0 0 960 4188 1655 2542 37 7 6 52 1 2 168392 40340 13160 542052 0 0 464 2056 1719 3330 42 8 2 49 3 2 168392 47252 13164 542460 64 0 572 260 1728 3321 43 8 6 42 5 4 168392 54880 13168 542936 64 0 484 232 1725 3352 51 8 9 31 MemTotal: 1293976 kB MemFree: 77964 kB Buffers: 15568 kB Cached: 525740 kB SwapCached: 38596 kB Active: 677728 kB Inactive: 442556 kB HighTotal: 393216 kB HighFree: 768 kB LowTotal: 900760 kB LowFree: 77196 kB SwapTotal: 530104 kB SwapFree: 365860 
kB Dirty: 50036 kB Writeback: 14036 kB Mapped: 570488 kB Slab: 82860 kB Committed_AS: 853200 kB PageTables: 4400 kB VmallocTotal: 114680 kB VmallocUsed: 560 kB VmallocChunk: 114120 kB swappiness of 100: procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 1 1 168304 198344 13736 463148 1 2 85 121 144 137 37 7 48 9 1 0 168304 198200 13736 463216 0 0 44 372 1830 3379 48 8 28 17 0 0 168304 198120 13736 463284 0 0 32 8 1680 3064 41 7 51 1 2 1 168304 145512 14952 514768 0 0 68 864 1586 2377 33 25 34 7 0 4 168304 135336 15176 524744 0 0 248 1092 1410 657 10 6 0 85 0 5 168304 135144 15176 524744 0 0 0 956 1315 178 1 1 0 98 0 5 168304 133544 15188 525616 0 0 412 10604 1267 795 14 3 0 84 0 4 168304 112112 15212 527564 0 0 1012 4768 1630 2875 74 12 0 14 0 4 168304 84432 15504 545836 0 0 6120 12484 1627 2396 54 15 0 32 0 6 168304 57496 16040 568352 0 0 180 32512 2432 1297 16 7 0 77 1 5 168304 36696 16452 585688 0 0 8 8120 1338 259 7 8 0 84 10 6 168304 23752 16712 595220 0 0 64 13304 1317 1029 19 7 0 73 8 8 168304 7248 16936 589440 0 0 284 5380 1607 2234 63 15 0 22 0 17 168304 6192 16944 587324 0 0 932 5328 1381 1636 17 4 0 80 0 15 168304 7400 16940 581888 0 0 600 752 1389 1500 23 4 0 74 3 11 168304 7400 16944 581136 0 0 688 16 1289 877 7 1 0 92 5 23 168304 7304 16944 578892 0 0 888 606 1287 1189 9 3 0 89 2 10 168304 7784 16940 571756 0 0 780 1136 1436 1855 30 4 0 67 0 7 168304 6184 16940 568016 0 0 816 104 1458 1757 23 5 1 71 0 7 168304 6752 16940 556184 0 0 732 188 1441 1726 34 7 0 59 0 11 168304 6488 16940 555368 0 0 652 344 1285 1361 11 2 0 87 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 1 15 168304 6640 16996 553952 0 0 3316 960 1357 1578 11 4 0 85 1 15 168304 6512 17072 551876 0 0 6064 3804 1289 1132 14 5 0 82 1 14 168304 6896 17212 542792 0 0 6032 3176 1333 1496 26 8 0 66 0 11 168296 6504 17220 527728 0 
0 3508 4204 1323 1326 38 8 0 54 1 7 168296 7376 17292 526992 0 0 3476 1120 1236 909 8 3 0 88 1 12 168292 6992 17388 520728 0 0 4268 4776 1354 1409 25 5 0 70 0 13 168292 6544 17388 521068 0 0 424 48 1213 914 6 1 0 93 2 16 168292 6536 17388 519028 0 0 756 4 1308 1582 18 3 0 78 0 7 168292 6456 17388 516784 0 0 668 5016 1308 1289 13 2 0 85 0 7 168292 11168 17388 516036 0 0 644 112 1340 1366 14 3 0 84 0 13 168292 8992 17392 516712 0 0 696 2068 1358 1713 17 3 0 80 5 11 168292 6896 17392 513176 0 0 616 92 1431 1792 35 5 0 61 0 6 168292 7280 17416 512488 0 0 1212 2396 1431 2187 20 3 0 77 0 10 168292 7216 17436 512260 0 0 1472 44 1372 1647 14 4 1 82 0 9 168292 12576 17444 512580 0 0 1044 84 1252 1067 10 2 0 87 0 11 168292 10912 17464 514192 0 0 1136 2640 1304 1353 12 3 0 86 1 4 168292 9312 17472 515816 0 0 1144 720 1244 1212 10 2 0 89 1 9 168292 13744 17488 517976 0 0 1276 96 1417 1750 15 4 0 82 0 11 168292 11760 17508 519656 0 0 1228 1732 1362 2056 17 4 0 79 0 9 168292 9712 17524 521204 0 0 1116 832 1338 1286 12 3 0 86 1 10 168292 7072 17644 524056 0 0 5600 160 1514 2443 24 8 0 68 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 10 168292 8036 17672 530312 0 0 6048 184 1319 1684 16 5 0 79 1 10 168292 7392 17648 531064 0 0 7080 132 1340 1502 11 5 0 83 1 13 168292 6840 17644 531620 0 0 6372 360 1379 1858 15 6 0 80 1 7 168292 7288 17652 528416 0 0 924 5 1317 690 11 2 0 87 0 5 168292 6648 17664 528472 0 0 136 9961 1293 279 1 1 0 98 0 6 168292 8056 17664 526024 0 0 4 6297 1154 107 0 1 0 99 0 14 168292 7416 17680 528388 0 0 1480 2327 1476 2175 24 6 0 70 2 10 168292 6576 17604 504120 0 0 884 3724 1349 1494 56 9 0 35 2 7 168292 10016 17604 504052 0 0 920 1052 1358 1564 25 4 0 71 1 8 168292 12064 17608 506564 0 0 1604 1108 1397 1779 17 4 0 79 0 12 168292 6656 17604 509764 0 0 3904 2488 1413 1661 19 5 0 76 1 15 168292 6848 17516 505296 0 0 5196 2052 1445 2156 29 6 0 65 0 11 168292 8256 17520 
507672 0 0 4284 780 1300 1566 19 5 1 75 1 13 168292 7792 17504 508572 0 0 6808 2303 1364 1465 13 5 0 83 0 7 168292 6896 17516 508780 0 0 4516 3256 1315 1473 12 4 0 84 0 6 168292 8160 17524 507604 0 8 2373 448 1196 558 3 2 0 96 0 4 168292 7904 17532 507664 0 0 52 10924 1234 244 3 1 0 96 1 19 168292 6640 17532 508140 0 0 1740 7580 1250 1175 13 4 0 83 0 11 168292 6704 17532 508344 0 0 1104 12 1258 1166 11 2 0 87 1 6 168292 6256 17536 502296 0 0 1724 404 1489 1894 34 7 0 58 0 10 168292 8552 17536 496788 0 0 1824 1548 1561 2694 38 8 3 52 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 1 5 168292 12036 17544 499432 0 0 1724 148 1515 2175 28 6 0 66 7 6 168292 6380 17544 501608 0 0 1412 128 1291 1381 22 3 0 74 9 8 168292 11580 17540 502428 0 0 1540 148 1436 1863 22 6 1 71 0 14 168292 9276 17540 504264 0 0 1228 660 1616 2405 30 6 0 65 0 10 168292 6280 17556 507220 0 0 7260 1948 1355 1679 16 6 0 78 0 12 168292 7176 17564 506164 0 0 7224 2040 1294 1406 10 6 0 84 0 10 168292 6984 17572 506168 0 0 6608 2340 1316 1511 11 5 0 84 0 17 168292 6108 17584 506836 0 8 6928 4196 1386 1543 13 5 0 82 0 9 168292 6712 17584 506020 0 0 592 12 1188 694 7 1 0 91 0 6 168292 6200 17584 506292 0 0 312 4420 1231 360 1 1 0 98 1 7 168292 6456 17584 502996 32 0 612 8952 1464 1424 17 4 0 79 0 6 168296 3200 17580 496572 0 4 636 500 1592 2571 49 10 0 41 0 17 168304 3124 17580 494836 0 8 868 8 1505 2131 32 6 0 62 0 7 168404 8364 17580 492808 0 220 628 4264 1406 1908 24 4 0 73 0 12 168620 5804 17580 493008 0 340 1108 1312 1517 2096 30 5 0 66 0 10 168620 9884 17584 495112 0 0 1416 0 1424 2218 24 5 0 71 0 12 168620 7324 17588 497624 0 0 1620 0 1441 1633 16 4 0 80 0 14 168620 10580 17588 499460 0 0 1228 144 1383 1637 25 5 0 71 0 8 168632 6784 17608 503888 0 124 4056 124 1432 1655 18 5 0 76 0 11 169212 7348 17628 503444 0 676 3748 864 1383 1622 19 4 5 71 4 13 169212 5236 17628 506012 16 0 1808 3896 1364 1903 20 4 4 73 procs 
-----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 6 169296 3444 17640 507780 0 88 1208 1364 1580 2414 33 7 0 60 0 17 173120 14424 17624 504260 0 3832 956 3836 1383 1716 17 3 0 80 0 9 174236 13336 17624 505592 0 1116 1056 2672 1298 1119 10 3 0 88 0 16 174236 5464 17640 513396 0 0 4254 3100 1367 1253 10 4 0 86 0 12 174732 5176 17624 514112 0 496 776 568 1354 1686 16 4 0 80 0 17 178196 7008 17624 508172 0 3464 696 3480 1366 1636 23 5 0 73 0 14 180832 10424 17620 498148 0 2640 664 2640 1307 1274 24 4 0 72 0 6 180832 9448 17620 498912 52 0 784 12 1316 1169 9 2 2 86 0 13 180832 8360 17620 499584 8 0 680 176 1405 1717 17 4 4 75 5 11 180832 9192 17620 500196 0 0 680 128 1579 2259 34 6 8 51 1 9 180832 8104 17624 501076 0 0 820 340 1355 1412 12 2 13 74 16 7 180832 7016 17632 502020 0 0 972 256 1364 1607 20 3 0 77 0 8 180832 12280 17632 502600 32 0 636 1930 1394 1472 15 3 0 81 2 5 180832 11256 17632 503484 0 0 812 3356 1420 1983 23 4 0 73 0 9 180832 18128 17632 503824 0 0 372 1712 1420 1865 25 5 0 71 0 7 180832 17152 17632 504708 0 0 848 1456 1327 1609 18 3 2 77 4 11 180832 24376 17640 505312 0 0 628 3552 1470 1961 21 3 20 55 0 2 180832 23544 17640 505924 0 0 612 1560 1447 1918 21 4 7 69 0 15 180832 22584 17648 506732 0 0 740 1376 1313 1216 15 3 14 67 0 5 180832 22072 17652 507204 0 0 572 1476 1770 3492 46 9 1 44 1 12 180832 20984 17652 508292 0 0 1040 3232 1741 3186 50 9 5 37 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 8 5 180832 27736 17652 509108 0 0 828 3726 1631 2891 51 11 0 37 4 6 180832 26936 17652 509856 0 0 700 5672 1702 2875 39 9 13 39 0 3 180832 33568 17652 510672 0 0 880 232 1677 2889 42 7 2 49 0 16 180832 32992 17652 511624 0 0 948 136 1559 2325 29 6 5 59 13 9 180832 40240 17652 512236 0 0 604 176 1403 1779 17 3 3 76 1 9 180832 39248 17652 513732 0 0 1476 144 1705 3237 41 9 3 48 3 4 180832 46724 
17652 514276 0 0 504 272 1721 3376 39 8 0 53 3 7 180832 45940 17652 515092 0 0 828 840 1753 3187 43 9 6 42 4 8 180832 53268 17652 515704 0 0 560 3424 1545 2224 29 6 12 53 2 2 180832 60276 17652 516384 0 0 664 356 1698 3135 37 7 8 48 5 6 180832 59508 17652 517124 8 0 772 880 1562 2492 28 6 9 58 0 4 180776 67384 17652 517520 0 0 420 3280 1595 3019 36 8 9 48 1 7 180776 66968 17652 517928 0 0 336 1568 1558 2338 28 5 22 45 0 8 180776 66456 17652 518336 0 0 400 332 1501 2040 28 6 5 61 0 5 180776 73240 17652 519280 8 0 980 48 1604 2806 32 8 12 50 0 8 180776 72664 17652 519960 0 0 688 5048 1667 2822 35 6 16 42 1 2 180716 79924 17656 520492 0 0 504 8 1638 2522 37 7 22 35 3 0 180716 79124 17660 520896 0 0 424 8 1683 3526 48 8 27 16 0 0 180604 86020 17664 521592 20 0 696 16 1783 3452 56 11 20 12 1 1 180604 93412 17664 522340 0 0 720 8 1796 3540 52 10 24 15 5 0 180604 92516 17664 523020 0 0 708 240 1799 3012 47 7 18 27 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 8 0 180604 99556 17664 523632 0 0 580 8 1811 3788 51 10 25 13 0 3 180604 107132 17664 524176 0 0 528 8 1798 3622 61 10 13 16 14 1 180604 106524 17668 524716 0 0 572 16 1780 3195 48 11 9 32 0 0 180604 105884 17668 525300 96 0 644 8 1931 4016 66 12 16 5 10 12 180604 113244 17668 525744 32 0 544 651 1733 3057 40 8 17 36 5 0 180604 109020 17672 526592 32 0 868 8 1778 3294 49 9 7 34 8 1 180604 104764 17672 527680 0 0 1012 8 1835 3832 65 11 5 19 1 1 180604 110460 17672 528428 0 0 736 16 1772 3721 56 10 24 11 0 1 180604 109692 17672 529172 4 0 752 8 1791 3388 51 9 28 11 1 3 180604 108988 17676 529844 4 0 708 632 1860 3637 54 9 9 28 0 1 180604 108732 17676 530116 0 0 252 16 1770 2861 46 8 16 32 4 0 180604 108220 17684 530652 0 0 460 8 1650 3194 41 8 36 14 5 0 180604 107772 17684 531128 0 0 512 16 1625 3173 40 7 41 11 3 1 180556 114956 17684 531612 0 0 504 8 1750 3649 51 10 29 11 13 0 180556 122300 17684 532088 0 0 452 724 1835 3344 50 9 
14 27 6 1 180556 121836 17684 532564 0 0 496 16 1782 4114 53 10 30 7 1 0 180556 121324 17684 533176 0 0 524 8 1686 3157 43 8 40 10 1 0 180556 128844 17684 533516 0 0 384 16 1720 3460 51 8 31 11 3 0 180556 128652 17684 533712 8 0 216 8 1740 3888 45 8 38 8 13 4 180556 136076 17684 534324 0 0 576 808 1794 3053 43 8 19 30 1 0 180556 135756 17684 534596 0 0 288 16 1728 2986 42 8 25 25 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 180556 135308 17684 535072 0 0 456 8 1715 3644 42 7 41 9 MemTotal: 1293976 kB MemFree: 7528 kB Buffers: 16940 kB Cached: 581684 kB SwapCached: 32552 kB Active: 852360 kB Inactive: 342852 kB HighTotal: 393216 kB HighFree: 768 kB LowTotal: 900760 kB LowFree: 6760 kB SwapTotal: 530104 kB SwapFree: 361800 kB Dirty: 48908 kB Writeback: 12996 kB Mapped: 594156 kB Slab: 78336 kB Committed_AS: 883164 kB PageTables: 4452 kB VmallocTotal: 114680 kB VmallocUsed: 560 kB VmallocChunk: 114120 kB ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 0:44 ` Brett E. @ 2004-04-29 1:13 ` Andrew Morton 2004-04-29 1:29 ` Brett E. 0 siblings, 1 reply; 128+ messages in thread From: Andrew Morton @ 2004-04-29 1:13 UTC (permalink / raw) To: brettspamacct; +Cc: linux-kernel "Brett E." <brettspamacct@fastclick.com> wrote: > > > I see no swapout from the info which you sent. > > pgpgout/s gives the total number of blocks paged out to disk per second, > it peaks at 13,000 and hovers around 3,000 per the attachment. Nope. pgpgout is simply writes to disk, of all types. swapout is accounted for under pswpout and your vmstat trace shows a little bit of (healthy) swapout with swappiness=100 and negligible swapout with swappiness=0. In both cases, negligible swapin. That's all just fine. > Swapping out is good, but when that's coupled with swapping in as is the > case on my side, it creates a thrashing situation where we swap out to > disk pages which are being used, we then immediately swap those pages > back in, etc etc.. Look at your "si" column in vmstat. It's practically all zeroes. > The usage pattern by the way is on a server which continuously hits a > database and reads files so I don't know what "swappiness" should be set > to exactly. Every hour or so it wants to untar tarballs and by then the > cache is large. From here, the system swaps in and out more while cache > decreases. Basically, it should do what I believe Solaris does... simply > reclaim cache and not swap. Capping cache would be good too but the > best solution IMO is to simply reclaim the cache on an as-needed basis > before thinking about swapping. swappiness=100: swaps a lot. swappiness=0: doesn't swap much. With a funny workload like that you might choose to set swappiness to 0 just around the hourly tar operation, but as the machine seems to not be swapping there doesn't seem to be a need. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:13 ` Andrew Morton @ 2004-04-29 1:29 ` Brett E. 2004-04-29 18:05 ` Brett E. 0 siblings, 1 reply; 128+ messages in thread From: Brett E. @ 2004-04-29 1:29 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel Andrew Morton wrote: > "Brett E." <brettspamacct@fastclick.com> wrote: > >>>I see no swapout from the info which you sent. >> >> pgpgout/s gives the total number of blocks paged out to disk per second, >> it peaks at 13,000 and hovers around 3,000 per the attachment. > > > Nope. pgpgout is simply writes to disk, of all types. That is what is confusing me.. From the sar man page: pgpgin/s Total number of kilobytes the system paged in from disk per second. pgpgout/s Total number of kilobytes the system paged out to disk per second. > > swapout is accounted for under pswpout and your vmstat trace shows a little > bit of (healthy) swapout with swappiness=100 and negligible swapout with > swappiness=0. In both cases, negligible swapin. That's all just fine. > > >> Swapping out is good, but when that's coupled with swapping in as is the >> case on my side, it creates a thrashing situation where we swap out to >> disk pages which are being used, we then immediately swap those pages >> back in, etc etc.. > > > Look at your "si" column in vmstat. It's practically all zeroes. > > >> The usage pattern by the way is on a server which continuously hits a >> database and reads files so I don't know what "swappiness" should be set >> to exactly. Every hour or so it wants to untar tarballs and by then the >> cache is large. From here, the system swaps in and out more while cache >> decreases. Basically, it should do what I believe Solaris does... simply >> reclaim cache and not swap. Capping cache would be good too but the >> best solution IMO is to simply reclaim the cache on an as-needed basis >> before thinking about swapping. > > > swappiness=100: swaps a lot. swappiness=0: doesn't swap much. 
> > With a funny workload like that you might choose to set swappiness to 0 > just around the hourly tar operation, but as the machine seems to not be > swapping there doesn't seem to be a need. Yeah, it wouldn't help if paging isn't the problem. I'd like more clarification on sar before I throw out paging as the culprit. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 1:29 ` Brett E. @ 2004-04-29 18:05 ` Brett E. 2004-04-29 18:32 ` William Lee Irwin III 0 siblings, 1 reply; 128+ messages in thread From: Brett E. @ 2004-04-29 18:05 UTC (permalink / raw) To: brettspamacct; +Cc: Andrew Morton, linux-kernel Brett E. wrote: > Andrew Morton wrote: > >> "Brett E." <brettspamacct@fastclick.com> wrote: >> >>>> I see no swapout from the info which you sent. >>> >>> >>> pgpgout/s gives the total number of blocks paged out to disk per >>> second, it peaks at 13,000 and hovers around 3,000 per the attachment. >> >> >> >> Nope. pgpgout is simply writes to disk, of all types. > > That is what is confusing me... From the sar man page: > > pgpgin/s > Total number of kilobytes the system paged in from disk per second. > > pgpgout/s > Total number of kilobytes the system paged out to disk per second. > > Anyone know what I should believe? Sar's pgpgin/s and pgpgout/s tell me that it is paging in/out from/to disk. Yet pswpin/s and pswpout/s are both 0. Swapping and paging are the same thing I believe. pgpgin/out refer to paging, pswpin/out refer to swapping. So I for one am confused. I guess I could dig through the source but I figured someone might have encountered this discrepancy in the past. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 18:05 ` Brett E. @ 2004-04-29 18:32 ` William Lee Irwin III 2004-04-29 20:47 ` Brett E. 0 siblings, 1 reply; 128+ messages in thread From: William Lee Irwin III @ 2004-04-29 18:32 UTC (permalink / raw) To: Brett E.; +Cc: Andrew Morton, linux-kernel On Thu, Apr 29, 2004 at 11:05:42AM -0700, Brett E. wrote: > Anyone know what I should believe? Sar's pgpgin/s and pgpgout/s tell me > that it is paging in/out from/to disk. Yet pswpin/s and pswpout/s are > both 0. Swapping and paging are the same thing I believe. pgpgin/out > refer to paging, pswpin/out refer to swapping. So I for one am confused. > I guess I could dig through the source but I figured someone might have > encountered this discrepancy in the past. Both are to be believed. They merely describe different things. Pagein/pageout are counts of VM-initiated IO, regardless of whether this IO is done on filesystem-backed pages or swap-backed pages. Pagein and pageout are used more generally to describe VM-initiated IO and don't exclusively refer to swap IO, but also include IO to filesystems to/from filesystem-backed memory. Swapin/swapout are counts of swap IO only, and are considered to apply only to IO done to swap files/devices to/from swap-backed anonymous memory. Pagein/pageout are both proper and necessary to have. In fact, you were requesting that filesystem IO be done preferentially to swap IO, and the pagein/pageout indicators showing IO while swapin/swapout indicators show none mean you are getting exactly what you asked for. -- wli ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 18:32 ` William Lee Irwin III @ 2004-04-29 20:47 ` Brett E. 0 siblings, 0 replies; 128+ messages in thread From: Brett E. @ 2004-04-29 20:47 UTC (permalink / raw) To: William Lee Irwin III; +Cc: Andrew Morton, linux-kernel William Lee Irwin III wrote: > On Thu, Apr 29, 2004 at 11:05:42AM -0700, Brett E. wrote: > >>Anyone know what I should believe? Sar's pgpgin/s and pgpgout/s tell me >> that it is paging in/out from/to disk. Yet pswpin/s and pswpout/s are >>both 0. Swapping and paging are the same thing I believe. pgpgin/out >>refer to paging, pswpin/out refer to swapping. So I for one am confused. >>I guess I could dig through the source but I figured someone might have >>encountered this discrepancy in the past. > > > Both are to be believed. They merely describe different things. > > Pagein/pageout are counts of VM-initiated IO, regardless of whether this > IO is done on filesystem-backed pages or swap-backed pages. Pagein and > pageout are used more generally to describe VM-initiated IO and don't > exclusively refer to swap IO, but also include IO to filesystems to/from > filesystem-backed memory. > > Swapin/swapout are counts of swap IO only, and are considered to apply > only to IO done to swap files/devices to/from swap-backed anonymous memory. > > Pagein/pageout are both proper and necessary to have. In fact, you were > requesting that filesystem IO be done preferentially to swap IO, and the > pagein/pageout indicators showing IO while swapin/swapout indicators show > none mean you are getting exactly what you asked for. > > Thanks, I think it's clear now. In layman's terms, pgpgin/out relate to disk cache activity, while pswpin/out relate to swap activity. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-28 21:27 ~500 megs cached yet 2.6.5 goes into swap hell Brett E. 2004-04-29 0:01 ` Andrew Morton @ 2004-04-29 0:04 ` Brett E. 2004-04-29 0:13 ` Jeff Garzik 2004-04-29 13:51 ` Horst von Brand 1 sibling, 2 replies; 128+ messages in thread From: Brett E. @ 2004-04-29 0:04 UTC (permalink / raw) To: brettspamacct; +Cc: linux-kernel mailing list Brett E. wrote: > Same thing happens on 2.4.18. > > I attached sar, slabinfo and /proc/meminfo data on the 2.6.5 machine. I > reproduce this behavior by simply untarring a 260meg file on a > production server, the machine becomes sluggish as it swaps to disk. Is > there a way to limit the cache so this machine, which has 1 gigabyte of > memory, doesn't dip into swap? > > Thanks, > > Brett > I created a hack which allocates memory, causing cache to go down, then exits, freeing up the malloc'ed memory. This brings free memory up by 400 megs and brings the cache down to close to 0; of course the cache grows right back afterwards. It would be nice to cap the cache datastructures in the kernel, but I've been posting about this since September to no avail so my expectations are pretty low. Here's the code:

#include <stdlib.h>

#define ALLOC_SIZE (1024 * 1024)
#define NUM_ALLOC  400

int main(void)
{
	char *ptr;
	int i, j;

	for (i = 0; i < NUM_ALLOC; i++) {
		ptr = malloc(ALLOC_SIZE);
		if (!ptr)
			break;
		/* touch each page so it is really allocated */
		for (j = 0; j < ALLOC_SIZE; j += 512)
			ptr[j] = 0;
	}
	return 0;
}

... Maybe I can make it a hack of all hacks and have it parse out "Cached" from /proc/meminfo and allocate that many bytes. ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 0:04 ` Brett E. @ 2004-04-29 0:13 ` Jeff Garzik 2004-04-29 0:43 ` Nick Piggin 2004-04-29 13:51 ` Horst von Brand 1 sibling, 1 reply; 128+ messages in thread From: Jeff Garzik @ 2004-04-29 0:13 UTC (permalink / raw) To: brettspamacct; +Cc: linux-kernel mailing list [-- Attachment #1: Type: text/plain, Size: 695 bytes --] Brett E. wrote: > exits, freeing up the malloc'ed memory. This brings free memory up by > 400 megs and brings the cache down to close to 0, of course the cache Yeah, I have something similar (attached). Run it like fillmem <number-of-megabytes> > grows right afterwards. It would be nice to cap the cache datastructures > in the kernel but I've been posting about this since September to no > avail so my expectations are pretty low. This is a frequent request... although I disagree with a hard cap on the cache, I think the request (and similar ones) should hopefully indicate to the VM gurus that the kernel likes cache better than anon VMAs that must be swapped out. Jeff [-- Attachment #2: fillmem.c --] [-- Type: text/plain, Size: 707 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>

#define MEGS 140
#define MEG (1024 * 1024)

int main (int argc, char *argv[])
{
	void **data;
	int i, r;
	size_t megs = MEGS;

	if ((argc >= 2) && (atoi(argv[1]) > 0))
		megs = atoi(argv[1]);

	data = malloc (megs * sizeof (void*));
	if (!data)
		abort();
	memset (data, 0, megs * sizeof (void*));

	srand(time(NULL));

	for (i = 0; i < megs; i++) {
		data[i] = malloc(MEG);
		memset (data[i], i, MEG);
		printf("malloc/memset %03d/%03lu\n", i+1, megs);
	}
	for (i = megs - 1; i >= 0; i--) {
		r = rand() % 200;
		memset (data[i], r, MEG);
		printf("memset #2 %03d/%03lu = %d\n", i+1, megs, r);
	}
	printf("done\n");
	return 0;
}

^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 0:13 ` Jeff Garzik @ 2004-04-29 0:43 ` Nick Piggin 0 siblings, 0 replies; 128+ messages in thread From: Nick Piggin @ 2004-04-29 0:43 UTC (permalink / raw) To: Jeff Garzik; +Cc: brettspamacct, linux-kernel mailing list Jeff Garzik wrote: > Brett E. wrote: > >> exits, freeing up the malloc'ed memory. This brings free memory up by >> 400 megs and brings the cache down to close to 0, of course the cache > > > Yeah, I have something similar (attached). Run it like > > fillmem <number-of-megabytes> > > >> grows right afterwards. It would be nice to cap the cache >> datastructures in the kernel but I've been posting about this since >> September to no avail so my expectations are pretty low. > > > This is a frequent request... although I disagree with a hard cap on > the cache, I think the request (and similar ones) should hopefully > indicate to the VM gurus that the kernel likes cache better than anon > VMAs that must be swapped out. > For 2.6.6-rc2-mm2: http://www.kerneltrap.org/~npiggin/vm-rollup.patch.gz /proc/sys/vm/mapped_page_cost - indicate which *you* like better ;) ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 0:04 ` Brett E. 2004-04-29 0:13 ` Jeff Garzik @ 2004-04-29 13:51 ` Horst von Brand 2004-04-29 18:32 ` Brett E. 1 sibling, 1 reply; 128+ messages in thread From: Horst von Brand @ 2004-04-29 13:51 UTC (permalink / raw) To: brettspamacct; +Cc: Linux Kernel Mailing List "Brett E." <brettspamacct@fastclick.com> said: [...] > I created a hack which allocates memory causing cache to go down, then > exits, freeing up the malloc'ed memory. This brings free memory up by > 400 megs and brings the cache down to close to 0, of course the cache > grows right afterwards. It would be nice to cap the cache datastructures > in the kernel but I've been posting about this since September to no > avail so my expectations are pretty low. Because it is complete nonsense. Keeping stuff around in RAM in case it is needed again, as long as RAM is not needed for anything else, is a major win. That is what cache is. -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 ^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: ~500 megs cached yet 2.6.5 goes into swap hell 2004-04-29 13:51 ` Horst von Brand @ 2004-04-29 18:32 ` Brett E. 0 siblings, 0 replies; 128+ messages in thread From: Brett E. @ 2004-04-29 18:32 UTC (permalink / raw) To: Horst von Brand; +Cc: Linux Kernel Mailing List Horst von Brand wrote: > "Brett E." <brettspamacct@fastclick.com> said: > > [...] > > >>I created a hack which allocates memory causing cache to go down, then >>exits, freeing up the malloc'ed memory. This brings free memory up by >>400 megs and brings the cache down to close to 0, of course the cache >>grows right afterwards. It would be nice to cap the cache datastructures >>in the kernel but I've been posting about this since September to no >>avail so my expectations are pretty low. > > > Because it is complete nonsense. Keeping stuff around in RAM in case it > is needed again, as long as RAM is not needed for anything else, is a major > win. That is what cache is. The key phrase in your post is "as long as RAM is not needed for anything else." My assertion was that this is not the case; the kernel seems to favor cache over pages that are actually in use. Sar shows heavy paging to/from disk even though 500 megs are reported in cache. I hope I don't need to go into what paging in/out in succession means: pages get paged out only to be needed again shortly afterwards. Sar also reports no swapping, hence the need to figure out why there is a discrepancy before continuing. ^ permalink raw reply [flat|nested] 128+ messages in thread