public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ken Brownfield <brownfld@irridia.com>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: linux-kernel@vger.kernel.org, Andrea Arcangeli <andrea@suse.de>
Subject: Re: [VM] 2.4.14/15-pre4 too "swap-happy"?
Date: Mon, 19 Nov 2001 21:09:41 -0600	[thread overview]
Message-ID: <20011119210941.C10597@asooo.flowerfire.com> (raw)
In-Reply-To: <20011119173935.A10597@asooo.flowerfire.com> <Pine.LNX.4.33.0111191543390.19585-200000@penguin.transmeta.com>
In-Reply-To: <Pine.LNX.4.33.0111191543390.19585-200000@penguin.transmeta.com>; from torvalds@transmeta.com on Mon, Nov 19, 2001 at 03:52:44PM -0800

Well, I think you'll be pleased to hear that your untested patch
compiled, booted, _and_ fixed the problem. :)

The minimum free RAM was about 9.8-11MB (matching your guestimate) and
kswapd seemed to behave the same as the watermark patch.  The results of
top were basically the same, so I'm omitting it.

However, I do have some profiling numbers, thanks to Marcelo.  Attached
are numbers from "readprofile | sort -nr +2 | head -20".  I think the
pre4 numbers point to shrink_cache, prune_icache, and statm_pgd_range.
The other two might have significance for wizards, but statistically
don't stand out to me, except maybe statm_pgd_range.

I reset the counters just before starting Oracle and the stress test.  I
think a -pre7 with a blessed patch would be good, since my testing was
very narrow.

I'll test new kernels as I hear new info.

Thanks much!
-- 
Ken.
brownfld@irridia.com


2.4.15-pre4 with your original patch:
(shorter time period since the machine went to hell fast)
(matches vanilla behaviour)

164536 default_idle                             3164.1538
101562 shrink_cache                             113.8587
  3683 prune_icache                              13.5404
  3034 file_read_actor                           12.2339
   914 DAC960_BA_InterruptHandler                 5.5732
  1128 statm_pgd_range                            2.9072
    40 page_cache_release                         0.8333
    31 add_page_to_hash_queue                     0.5167
    89 page_cache_read                            0.4363
    25 remove_inode_page                          0.4167
    26 unlock_page                                0.3095
   509 __make_request                             0.3008
    66 smp_call_function                          0.2946
    21 set_bh_page                                0.2917
     9 __brelse                                   0.2812
    90 try_to_free_buffers                        0.2778
    13 mark_page_accessed                         0.2708
     8 __free_pages                               0.2500
    43 get_hash_table                             0.2443
    42 activate_page                              0.2234

2.4.15-pre6 with watermark patch:

1617446 default_idle                             31104.7308
 27599 DAC960_BA_InterruptHandler               168.2866
 38918 file_read_actor                          156.9274
   528 page_cache_release                        11.0000
   554 add_page_to_hash_queue                     9.2333
 15487 __make_request                             9.1531
  3453 statm_pgd_range                            8.8995
   514 remove_inode_page                          8.5667
  1453 blk_init_free_list                         7.2650
   377 set_bh_page                                5.2361
   898 page_cache_read                            4.4020
   590 add_to_page_cache_unique                   4.3382
   136 __brelse                                   4.2500
  1120 kmem_cache_alloc                           3.8356
   628 kunmap_high                                3.7381
  1189 try_to_free_buffers                        3.6698
   625 get_hash_table                             3.5511
   439 lru_cache_add                              3.4297
  1715 rmqueue                                    3.0194
   105 remove_wait_queue                          2.9167

2.4.15-pre6 with Linus patch:

1249875 default_idle                             24036.0577
 65324 file_read_actor                          263.4032
 36979 DAC960_BA_InterruptHandler               225.4817
  9809 statm_pgd_range                           25.2809
  1039 page_cache_release                        21.6458
   994 add_page_to_hash_queue                    16.5667
   922 remove_inode_page                         15.3667
  2409 blk_init_free_list                        12.0450
 20159 __make_request                            11.9143
  1198 lru_cache_add                              9.3594
  1628 page_cache_read                            7.9804
   987 add_to_page_cache_unique                   7.2574
  2202 try_to_free_buffers                        6.7963
  1038 get_unused_buffer_head                     6.6538
   484 unlock_page                                5.7619
  3182 rmqueue                                    5.6021
   874 kunmap_high                                5.2024
   164 __brelse                                   5.1250
   900 get_hash_table                             5.1136
   357 set_bh_page                                4.9583


On Mon, Nov 19, 2001 at 03:52:44PM -0800, Linus Torvalds wrote:
| 
| On Mon, 19 Nov 2001, Ken Brownfield wrote:
| >
| > I went straight to the aa patch, and it looks like it either fixes the
| > problem or (because of the side-effects Linus mentioned) otherwise
| > prevents the issue:
| 
| So is this pre6aa1, or pre6 + just the watermark patch?
| 
| > The machine went into swap immediately when the page cache stopped
| > growing and hovered at 100-400MB.  Also, in my experience the page cache
| > will grow until there's only 5ishMB of free RAM, but with the aa patch
| > it looks like it stops at 320MB or maybe 10% of RAM.  Was that the aa
| > patch, or part of -pre6?
| 
| That was the watermarking. The way Andrea did it, the page cache will
| basically refuse to touch as much of the "normal" page zone, because it
| would prefer to allocate more from highmem..
| 
| I think it's excessive to have 320MB free memory, though, that's just
| an insane waste. I suspect that the real number should be somewhere
| between the old behaviour and the new one. You can tweak the behaviour of
| andrea's kernel by changing the "reserved" page numbers, but I'd like to
| hear whether my simpler approach works too..
| 
| > The Oracle SGA is set to ~522MB, with nothing else running except a
| > couple of sshds, getty, etc.  Now that I'm looking, 2.8GB page cache
| > plus 328MB free adds up to about 3.1GB of RAM -- where does the 512MB
| > shared memory segment fit?  Is it being swapped out in deference to page
| > cache?
| 
| Shared memory actually uses the page cache too, so it will be accounted
| for in the 2.8GB number.
| 
| Anyway, can you try plain vanilla pre6, with the appended patch? This is
| my suggested simplified version of what Andrea tried to do, and it should
| try to keep only a few extra megs of memory free in the low memory
| regions, not 300+ MB.
| 
| (and the profiling would be interesting regardless, but I think Andrea did
| find the real problem, his fix just seems a bit of an overkill ;)
| 
| 		Linus

| diff -u --recursive --new-file pre6/linux/mm/page_alloc.c linux/mm/page_alloc.c
| --- pre6/linux/mm/page_alloc.c	Sat Nov 17 19:07:43 2001
| +++ linux/mm/page_alloc.c	Mon Nov 19 15:13:36 2001
| @@ -299,29 +299,26 @@
|  	return page;
|  }
|  
| -static inline unsigned long zone_free_pages(zone_t * zone, unsigned int order)
| -{
| -	long free = zone->free_pages - (1UL << order);
| -	return free >= 0 ? free : 0;
| -}
| -
|  /*
|   * This is the 'heart' of the zoned buddy allocator:
|   */
|  struct page * __alloc_pages(unsigned int gfp_mask, unsigned int order, zonelist_t *zonelist)
|  {
| +	unsigned long min;
|  	zone_t **zone, * classzone;
|  	struct page * page;
|  	int freed;
|  
|  	zone = zonelist->zones;
|  	classzone = *zone;
| +	min = 1UL << order;
|  	for (;;) {
|  		zone_t *z = *(zone++);
|  		if (!z)
|  			break;
|  
| -		if (zone_free_pages(z, order) > z->pages_low) {
| +		min += z->pages_low;
| +		if (z->free_pages > min) {
|  			page = rmqueue(z, order);
|  			if (page)
|  				return page;
| @@ -334,16 +331,18 @@
|  		wake_up_interruptible(&kswapd_wait);
|  
|  	zone = zonelist->zones;
| +	min = 1UL << order;
|  	for (;;) {
| -		unsigned long min;
| +		unsigned long local_min;
|  		zone_t *z = *(zone++);
|  		if (!z)
|  			break;
|  
| -		min = z->pages_min;
| +		local_min = z->pages_min;
|  		if (!(gfp_mask & __GFP_WAIT))
| -			min >>= 2;
| -		if (zone_free_pages(z, order) > min) {
| +			local_min >>= 2;
| +		min += local_min;
| +		if (z->free_pages > min) {
|  			page = rmqueue(z, order);
|  			if (page)
|  				return page;
| @@ -376,12 +375,14 @@
|  		return page;
|  
|  	zone = zonelist->zones;
| +	min = 1UL << order;
|  	for (;;) {
|  		zone_t *z = *(zone++);
|  		if (!z)
|  			break;
|  
| -		if (zone_free_pages(z, order) > z->pages_min) {
| +		min += z->pages_min;
| +		if (z->free_pages > min) {
|  			page = rmqueue(z, order);
|  			if (page)
|  				return page;


  parent reply	other threads:[~2001-11-20  3:10 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200111191801.fAJI1l922388@neosilicon.transmeta.com>
2001-11-19 18:07 ` [VM] 2.4.14/15-pre4 too "swap-happy"? Linus Torvalds
2001-11-19 18:31   ` Ken Brownfield
2001-11-19 19:23     ` Linus Torvalds
2001-11-19 23:39       ` Ken Brownfield
2001-11-19 23:52         ` Linus Torvalds
2001-11-20  0:18           ` M. Edward (Ed) Borasky
2001-11-20  0:25           ` Ken Brownfield
2001-11-20  0:31             ` Linus Torvalds
2001-11-20  3:09           ` Ken Brownfield [this message]
2001-11-20  3:30             ` Linus Torvalds
2001-11-20  3:32             ` Andrea Arcangeli
2001-11-20  5:54               ` Ken Brownfield
2001-11-20  6:50                 ` Linus Torvalds
2001-12-01 13:15                   ` Slight Return (was Re: [VM] 2.4.14/15-pre4 too "swap-happy"?) Ken Brownfield
2001-12-08 13:12                     ` Ken Brownfield
2001-12-09 18:51                       ` Marcelo Tosatti
2001-12-10  6:56                         ` Ken Brownfield
2001-11-19 19:30     ` [VM] 2.4.14/15-pre4 too "swap-happy"? Ken Brownfield
2001-11-19 18:26       ` Marcelo Tosatti
2001-11-19 19:44   ` Slo Mo Snail
2001-11-20  0:48 Yan, Noah
  -- strict thread matches above, loose matches on Subject: below --
2001-11-15  8:38 janne
2001-11-15  9:05 ` janne
2001-11-15 17:44   ` Mike Galbraith
2001-11-16  0:14     ` janne
     [not found] <200111141243.fAEChS915731@neosilicon.transmeta.com>
2001-11-14 16:34 ` Linus Torvalds
2001-11-19 18:01   ` Sebastian Dröge
2001-11-19 18:18   ` Simon Kirby
2001-11-14 12:44 Sebastian Dröge
2001-11-14 15:00 ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20011119210941.C10597@asooo.flowerfire.com \
    --to=brownfld@irridia.com \
    --cc=andrea@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox