All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ken Brownfield <brownfld@irridia.com>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: linux-kernel@vger.kernel.org, Andrea Arcangeli <andrea@suse.de>
Subject: Re: [VM] 2.4.14/15-pre4 too "swap-happy"?
Date: Mon, 19 Nov 2001 21:09:41 -0600	[thread overview]
Message-ID: <20011119210941.C10597@asooo.flowerfire.com> (raw)
In-Reply-To: <20011119173935.A10597@asooo.flowerfire.com> <Pine.LNX.4.33.0111191543390.19585-200000@penguin.transmeta.com>
In-Reply-To: <Pine.LNX.4.33.0111191543390.19585-200000@penguin.transmeta.com>; from torvalds@transmeta.com on Mon, Nov 19, 2001 at 03:52:44PM -0800

Well, I think you'll be pleased to hear that your untested patch
compiled, booted, _and_ fixed the problem. :)

The minimum free RAM was about 9.8-11MB (matching your guestimate) and
kswapd seemed to behave the same as the watermark patch.  The results of
top were basically the same, so I'm omitting it.

However, I do have some profiling numbers, thanks to Marcelo.  Attached
are numbers from "readprofile | sort -nr +2 | head -20".  I think the
pre4 numbers point to shrink_cache, prune_icache, and statm_pgd_range.
The other two might have significance for wizards, but statistically
don't stand out to me, except maybe statm_pgd_range.

I reset the counters just before starting Oracle and the stress test.  I
think a -pre7 with a blessed patch would be good, since my testing was
very narrow.

I'll test new kernels as I hear new info.

Thanks much!
-- 
Ken.
brownfld@irridia.com


2.4.15-pre4 with your original patch:
(shorter time period since the machine went to hell fast)
(matches vanilla behaviour)

164536 default_idle                             3164.1538
101562 shrink_cache                             113.8587
  3683 prune_icache                              13.5404
  3034 file_read_actor                           12.2339
   914 DAC960_BA_InterruptHandler                 5.5732
  1128 statm_pgd_range                            2.9072
    40 page_cache_release                         0.8333
    31 add_page_to_hash_queue                     0.5167
    89 page_cache_read                            0.4363
    25 remove_inode_page                          0.4167
    26 unlock_page                                0.3095
   509 __make_request                             0.3008
    66 smp_call_function                          0.2946
    21 set_bh_page                                0.2917
     9 __brelse                                   0.2812
    90 try_to_free_buffers                        0.2778
    13 mark_page_accessed                         0.2708
     8 __free_pages                               0.2500
    43 get_hash_table                             0.2443
    42 activate_page                              0.2234

2.4.15-pre6 with watermark patch:

1617446 default_idle                             31104.7308
 27599 DAC960_BA_InterruptHandler               168.2866
 38918 file_read_actor                          156.9274
   528 page_cache_release                        11.0000
   554 add_page_to_hash_queue                     9.2333
 15487 __make_request                             9.1531
  3453 statm_pgd_range                            8.8995
   514 remove_inode_page                          8.5667
  1453 blk_init_free_list                         7.2650
   377 set_bh_page                                5.2361
   898 page_cache_read                            4.4020
   590 add_to_page_cache_unique                   4.3382
   136 __brelse                                   4.2500
  1120 kmem_cache_alloc                           3.8356
   628 kunmap_high                                3.7381
  1189 try_to_free_buffers                        3.6698
   625 get_hash_table                             3.5511
   439 lru_cache_add                              3.4297
  1715 rmqueue                                    3.0194
   105 remove_wait_queue                          2.9167

2.4.15-pre6 with Linus patch:

1249875 default_idle                             24036.0577
 65324 file_read_actor                          263.4032
 36979 DAC960_BA_InterruptHandler               225.4817
  9809 statm_pgd_range                           25.2809
  1039 page_cache_release                        21.6458
   994 add_page_to_hash_queue                    16.5667
   922 remove_inode_page                         15.3667
  2409 blk_init_free_list                        12.0450
 20159 __make_request                            11.9143
  1198 lru_cache_add                              9.3594
  1628 page_cache_read                            7.9804
   987 add_to_page_cache_unique                   7.2574
  2202 try_to_free_buffers                        6.7963
  1038 get_unused_buffer_head                     6.6538
   484 unlock_page                                5.7619
  3182 rmqueue                                    5.6021
   874 kunmap_high                                5.2024
   164 __brelse                                   5.1250
   900 get_hash_table                             5.1136
   357 set_bh_page                                4.9583


On Mon, Nov 19, 2001 at 03:52:44PM -0800, Linus Torvalds wrote:
| 
| On Mon, 19 Nov 2001, Ken Brownfield wrote:
| >
| > I went straight to the aa patch, and it looks like it either fixes the
| > problem or (because of the side-effects Linus mentioned) otherwise
| > prevents the issue:
| 
| So is this pre6aa1, or pre6 + just the watermark patch?
| 
| > The machine went into swap immediately when the page cache stopped
| > growing and hovered at 100-400MB.  Also, in my experience the page cache
| > will grow until there's only 5ishMB of free RAM, but with the aa patch
| > it looks like it stops at 320MB or maybe 10% of RAM.  Was that the aa
| > patch, or part of -pre6?
| 
| That was the watermarking. The way Andrea did it, the page cache will
| basically refuse to touch as much of the "normal" page zone, because it
| would prefer to allocate more from highmem..
| 
| I think it's excessive to have 320MB free memory, though, that's just
| an insane waste. I suspect that the real number should be somewhere
| between the old behaviour and the new one. You can tweak the behaviour of
| andrea's kernel by changing the "reserved" page numbers, but I'd like to
| hear whether my simpler approach works too..
| 
| > The Oracle SGA is set to ~522MB, with nothing else running except a
| > couple of sshds, getty, etc.  Now that I'm looking, 2.8GB page cache
| > plus 328MB free adds up to about 3.1GB of RAM -- where does the 512MB
| > shared memory segment fit?  Is it being swapped out in deference to page
| > cache?
| 
| Shared memory actually uses the page cache too, so it will be accounted
| for in the 2.8GB number.
| 
| Anyway, can you try plain vanilla pre6, with the appended patch? This is
| my suggested simplified version of what Andrea tried to do, and it should
| try to keep only a few extra megs of memory free in the low memory
| regions, not 300+ MB.
| 
| (and the profiling would be interesting regardless, but I think Andrea did
| find the real problem, his fix just seems a bit of an overkill ;)
| 
| 		Linus

| diff -u --recursive --new-file pre6/linux/mm/page_alloc.c linux/mm/page_alloc.c
| --- pre6/linux/mm/page_alloc.c	Sat Nov 17 19:07:43 2001
| +++ linux/mm/page_alloc.c	Mon Nov 19 15:13:36 2001
| @@ -299,29 +299,26 @@
|  	return page;
|  }
|  
| -static inline unsigned long zone_free_pages(zone_t * zone, unsigned int order)
| -{
| -	long free = zone->free_pages - (1UL << order);
| -	return free >= 0 ? free : 0;
| -}
| -
|  /*
|   * This is the 'heart' of the zoned buddy allocator:
|   */
|  struct page * __alloc_pages(unsigned int gfp_mask, unsigned int order, zonelist_t *zonelist)
|  {
| +	unsigned long min;
|  	zone_t **zone, * classzone;
|  	struct page * page;
|  	int freed;
|  
|  	zone = zonelist->zones;
|  	classzone = *zone;
| +	min = 1UL << order;
|  	for (;;) {
|  		zone_t *z = *(zone++);
|  		if (!z)
|  			break;
|  
| -		if (zone_free_pages(z, order) > z->pages_low) {
| +		min += z->pages_low;
| +		if (z->free_pages > min) {
|  			page = rmqueue(z, order);
|  			if (page)
|  				return page;
| @@ -334,16 +331,18 @@
|  		wake_up_interruptible(&kswapd_wait);
|  
|  	zone = zonelist->zones;
| +	min = 1UL << order;
|  	for (;;) {
| -		unsigned long min;
| +		unsigned long local_min;
|  		zone_t *z = *(zone++);
|  		if (!z)
|  			break;
|  
| -		min = z->pages_min;
| +		local_min = z->pages_min;
|  		if (!(gfp_mask & __GFP_WAIT))
| -			min >>= 2;
| -		if (zone_free_pages(z, order) > min) {
| +			local_min >>= 2;
| +		min += local_min;
| +		if (z->free_pages > min) {
|  			page = rmqueue(z, order);
|  			if (page)
|  				return page;
| @@ -376,12 +375,14 @@
|  		return page;
|  
|  	zone = zonelist->zones;
| +	min = 1UL << order;
|  	for (;;) {
|  		zone_t *z = *(zone++);
|  		if (!z)
|  			break;
|  
| -		if (zone_free_pages(z, order) > z->pages_min) {
| +		min += z->pages_min;
| +		if (z->free_pages > min) {
|  			page = rmqueue(z, order);
|  			if (page)
|  				return page;


  parent reply	other threads:[~2001-11-20  3:10 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200111191801.fAJI1l922388@neosilicon.transmeta.com>
2001-11-19 18:07 ` [VM] 2.4.14/15-pre4 too "swap-happy"? Linus Torvalds
2001-11-19 18:31   ` Ken Brownfield
2001-11-19 19:23     ` Linus Torvalds
2001-11-19 23:39       ` Ken Brownfield
2001-11-19 23:52         ` Linus Torvalds
2001-11-20  0:18           ` M. Edward (Ed) Borasky
2001-11-20  0:25           ` Ken Brownfield
2001-11-20  0:31             ` Linus Torvalds
2001-11-20  3:09           ` Ken Brownfield [this message]
2001-11-20  3:30             ` Linus Torvalds
2001-11-20  3:32             ` Andrea Arcangeli
2001-11-20  5:54               ` Ken Brownfield
2001-11-20  6:50                 ` Linus Torvalds
2001-12-01 13:15                   ` Slight Return (was Re: [VM] 2.4.14/15-pre4 too "swap-happy"?) Ken Brownfield
2001-12-08 13:12                     ` Ken Brownfield
2001-12-09 18:51                       ` Marcelo Tosatti
2001-12-10  6:56                         ` Ken Brownfield
2001-11-19 19:30     ` [VM] 2.4.14/15-pre4 too "swap-happy"? Ken Brownfield
2001-11-19 18:26       ` Marcelo Tosatti
2001-11-19 19:44   ` Slo Mo Snail
2001-11-20  0:48 Yan, Noah
  -- strict thread matches above, loose matches on Subject: below --
2001-11-15  8:38 janne
2001-11-15  9:05 ` janne
2001-11-15 17:44   ` Mike Galbraith
2001-11-16  0:14     ` janne
     [not found] <200111141243.fAEChS915731@neosilicon.transmeta.com>
2001-11-14 16:34 ` Linus Torvalds
2001-11-19 18:01   ` Sebastian Dröge
2001-11-19 18:18   ` Simon Kirby
2001-11-14 12:44 Sebastian Dröge
2001-11-14 15:00 ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20011119210941.C10597@asooo.flowerfire.com \
    --to=brownfld@irridia.com \
    --cc=andrea@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.