From: Ken Brownfield <brownfld@irridia.com>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: linux-kernel@vger.kernel.org, Andrea Arcangeli <andrea@suse.de>
Subject: Re: [VM] 2.4.14/15-pre4 too "swap-happy"?
Date: Mon, 19 Nov 2001 21:09:41 -0600 [thread overview]
Message-ID: <20011119210941.C10597@asooo.flowerfire.com> (raw)
In-Reply-To: <20011119173935.A10597@asooo.flowerfire.com> <Pine.LNX.4.33.0111191543390.19585-200000@penguin.transmeta.com>
In-Reply-To: <Pine.LNX.4.33.0111191543390.19585-200000@penguin.transmeta.com>; from torvalds@transmeta.com on Mon, Nov 19, 2001 at 03:52:44PM -0800
Well, I think you'll be pleased to hear that your untested patch
compiled, booted, _and_ fixed the problem. :)
The minimum free RAM was about 9.8-11MB (matching your guestimate) and
kswapd seemed to behave the same as the watermark patch. The results of
top were basically the same, so I'm omitting it.
However, I do have some profiling numbers, thanks to Marcelo. Attached
are numbers from "readprofile | sort -nr +2 | head -20". I think the
pre4 numbers point to shrink_cache, prune_icache, and statm_pgd_range.
The other two might have significance for wizards, but statistically
don't stand out to me, except maybe statm_pgd_range.
I reset the counters just before starting Oracle and the stress test. I
think a -pre7 with a blessed patch would be good, since my testing was
very narrow.
I'll test new kernels as I hear new info.
Thanks much!
--
Ken.
brownfld@irridia.com
2.4.15-pre4 with your original patch:
(shorter time period since the machine went to hell fast)
(matches vanilla behaviour)
164536 default_idle 3164.1538
101562 shrink_cache 113.8587
3683 prune_icache 13.5404
3034 file_read_actor 12.2339
914 DAC960_BA_InterruptHandler 5.5732
1128 statm_pgd_range 2.9072
40 page_cache_release 0.8333
31 add_page_to_hash_queue 0.5167
89 page_cache_read 0.4363
25 remove_inode_page 0.4167
26 unlock_page 0.3095
509 __make_request 0.3008
66 smp_call_function 0.2946
21 set_bh_page 0.2917
9 __brelse 0.2812
90 try_to_free_buffers 0.2778
13 mark_page_accessed 0.2708
8 __free_pages 0.2500
43 get_hash_table 0.2443
42 activate_page 0.2234
2.4.15-pre6 with watermark patch:
1617446 default_idle 31104.7308
27599 DAC960_BA_InterruptHandler 168.2866
38918 file_read_actor 156.9274
528 page_cache_release 11.0000
554 add_page_to_hash_queue 9.2333
15487 __make_request 9.1531
3453 statm_pgd_range 8.8995
514 remove_inode_page 8.5667
1453 blk_init_free_list 7.2650
377 set_bh_page 5.2361
898 page_cache_read 4.4020
590 add_to_page_cache_unique 4.3382
136 __brelse 4.2500
1120 kmem_cache_alloc 3.8356
628 kunmap_high 3.7381
1189 try_to_free_buffers 3.6698
625 get_hash_table 3.5511
439 lru_cache_add 3.4297
1715 rmqueue 3.0194
105 remove_wait_queue 2.9167
2.4.15-pre6 with Linus patch:
1249875 default_idle 24036.0577
65324 file_read_actor 263.4032
36979 DAC960_BA_InterruptHandler 225.4817
9809 statm_pgd_range 25.2809
1039 page_cache_release 21.6458
994 add_page_to_hash_queue 16.5667
922 remove_inode_page 15.3667
2409 blk_init_free_list 12.0450
20159 __make_request 11.9143
1198 lru_cache_add 9.3594
1628 page_cache_read 7.9804
987 add_to_page_cache_unique 7.2574
2202 try_to_free_buffers 6.7963
1038 get_unused_buffer_head 6.6538
484 unlock_page 5.7619
3182 rmqueue 5.6021
874 kunmap_high 5.2024
164 __brelse 5.1250
900 get_hash_table 5.1136
357 set_bh_page 4.9583
On Mon, Nov 19, 2001 at 03:52:44PM -0800, Linus Torvalds wrote:
|
| On Mon, 19 Nov 2001, Ken Brownfield wrote:
| >
| > I went straight to the aa patch, and it looks like it either fixes the
| > problem or (because of the side-effects Linus mentioned) otherwise
| > prevents the issue:
|
| So is this pre6aa1, or pre6 + just the watermark patch?
|
| > The machine went into swap immediately when the page cache stopped
| > growing and hovered at 100-400MB. Also, in my experience the page cache
| > will grow until there's only 5ishMB of free RAM, but with the aa patch
| > it looks like it stops at 320MB or maybe 10% of RAM. Was that the aa
| > patch, or part of -pre6?
|
| That was the watermarking. The way Andrea did it, the page cache will
| basically refuse to touch as much of the "normal" page zone, because it
| would prefer to allocate more from highmem..
|
| I think it's excessive to have 320MB free memory, though, that's just
| an insane waste. I suspect that the real number should be somewhere
| between the old behaviour and the new one. You can tweak the behaviour of
| andrea's kernel by changing the "reserved" page numbers, but I'd like to
| hear whether my simpler approach works too..
|
| > The Oracle SGA is set to ~522MB, with nothing else running except a
| > couple of sshds, getty, etc. Now that I'm looking, 2.8GB page cache
| > plus 328MB free adds up to about 3.1GB of RAM -- where does the 512MB
| > shared memory segment fit? Is it being swapped out in deference to page
| > cache?
|
| Shared memory actually uses the page cache too, so it will be accounted
| for in the 2.8GB number.
|
| Anyway, can you try plain vanilla pre6, with the appended patch? This is
| my suggested simplified version of what Andrea tried to do, and it should
| try to keep only a few extra megs of memory free in the low memory
| regions, not 300+ MB.
|
| (and the profiling would be interesting regardless, but I think Andrea did
| find the real problem, his fix just seems a bit of an overkill ;)
|
| Linus
| diff -u --recursive --new-file pre6/linux/mm/page_alloc.c linux/mm/page_alloc.c
| --- pre6/linux/mm/page_alloc.c Sat Nov 17 19:07:43 2001
| +++ linux/mm/page_alloc.c Mon Nov 19 15:13:36 2001
| @@ -299,29 +299,26 @@
| return page;
| }
|
| -static inline unsigned long zone_free_pages(zone_t * zone, unsigned int order)
| -{
| - long free = zone->free_pages - (1UL << order);
| - return free >= 0 ? free : 0;
| -}
| -
| /*
| * This is the 'heart' of the zoned buddy allocator:
| */
| struct page * __alloc_pages(unsigned int gfp_mask, unsigned int order, zonelist_t *zonelist)
| {
| + unsigned long min;
| zone_t **zone, * classzone;
| struct page * page;
| int freed;
|
| zone = zonelist->zones;
| classzone = *zone;
| + min = 1UL << order;
| for (;;) {
| zone_t *z = *(zone++);
| if (!z)
| break;
|
| - if (zone_free_pages(z, order) > z->pages_low) {
| + min += z->pages_low;
| + if (z->free_pages > min) {
| page = rmqueue(z, order);
| if (page)
| return page;
| @@ -334,16 +331,18 @@
| wake_up_interruptible(&kswapd_wait);
|
| zone = zonelist->zones;
| + min = 1UL << order;
| for (;;) {
| - unsigned long min;
| + unsigned long local_min;
| zone_t *z = *(zone++);
| if (!z)
| break;
|
| - min = z->pages_min;
| + local_min = z->pages_min;
| if (!(gfp_mask & __GFP_WAIT))
| - min >>= 2;
| - if (zone_free_pages(z, order) > min) {
| + local_min >>= 2;
| + min += local_min;
| + if (z->free_pages > min) {
| page = rmqueue(z, order);
| if (page)
| return page;
| @@ -376,12 +375,14 @@
| return page;
|
| zone = zonelist->zones;
| + min = 1UL << order;
| for (;;) {
| zone_t *z = *(zone++);
| if (!z)
| break;
|
| - if (zone_free_pages(z, order) > z->pages_min) {
| + min += z->pages_min;
| + if (z->free_pages > min) {
| page = rmqueue(z, order);
| if (page)
| return page;
next prev parent reply other threads:[~2001-11-20 3:10 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <200111191801.fAJI1l922388@neosilicon.transmeta.com>
2001-11-19 18:07 ` [VM] 2.4.14/15-pre4 too "swap-happy"? Linus Torvalds
2001-11-19 18:31 ` Ken Brownfield
2001-11-19 19:23 ` Linus Torvalds
2001-11-19 23:39 ` Ken Brownfield
2001-11-19 23:52 ` Linus Torvalds
2001-11-20 0:18 ` M. Edward (Ed) Borasky
2001-11-20 0:25 ` Ken Brownfield
2001-11-20 0:31 ` Linus Torvalds
2001-11-20 3:09 ` Ken Brownfield [this message]
2001-11-20 3:30 ` Linus Torvalds
2001-11-20 3:32 ` Andrea Arcangeli
2001-11-20 5:54 ` Ken Brownfield
2001-11-20 6:50 ` Linus Torvalds
2001-12-01 13:15 ` Slight Return (was Re: [VM] 2.4.14/15-pre4 too "swap-happy"?) Ken Brownfield
2001-12-08 13:12 ` Ken Brownfield
2001-12-09 18:51 ` Marcelo Tosatti
2001-12-10 6:56 ` Ken Brownfield
2001-11-19 19:30 ` [VM] 2.4.14/15-pre4 too "swap-happy"? Ken Brownfield
2001-11-19 18:26 ` Marcelo Tosatti
2001-11-19 19:44 ` Slo Mo Snail
2001-11-20 0:48 Yan, Noah
-- strict thread matches above, loose matches on Subject: below --
2001-11-15 8:38 janne
2001-11-15 9:05 ` janne
2001-11-15 17:44 ` Mike Galbraith
2001-11-16 0:14 ` janne
[not found] <200111141243.fAEChS915731@neosilicon.transmeta.com>
2001-11-14 16:34 ` Linus Torvalds
2001-11-19 18:01 ` Sebastian Dröge
2001-11-19 18:18 ` Simon Kirby
2001-11-14 12:44 Sebastian Dröge
2001-11-14 15:00 ` Rik van Riel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20011119210941.C10597@asooo.flowerfire.com \
--to=brownfld@irridia.com \
--cc=andrea@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@transmeta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.