From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: j-nomura@ce.jp.nec.com
Cc: linux-kernel@vger.kernel.org, andrea@suse.de,
	Andrew Morton <akpm@osdl.org>,
	hugh@veritas.com
Subject: Re: [2.4] heavy-load under swap space shortage
Date: Wed, 26 May 2004 09:41:04 -0300
Message-ID: <20040526124104.GF6439@logos.cnet>
In-Reply-To: <20040310.195707.521627048.nomura@linux.bs1.fc.nec.co.jp>

Andrea, Hugh, Jun'ichi,

I think we can merge this patch.

It's very safe: the default behaviour is unchanged.

Jun, are you willing to do another test for us if this gets merged
in v2.4.27-pre4?

Maybe we should document the VM tunables somewhere outside the source code
(Documentation/)?

On Wed, Mar 10, 2004 at 07:57:07PM +0900, j-nomura@ce.jp.nec.com wrote:
> After discussion with Hugh and recommendation from Andrea,
> it turns out that Andrea's 05_vm_22_vm-anon-lru-3 in 2.4.23aa2 solves
> the problem.
> ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.23aa2/05_vm_22_vm-anon-lru-3
> 
> The patch adds a sysctl which improves performance on
> huge-memory machines. It doesn't affect anything if turned off.
> 
> Marcelo, could you apply this to 2.4.26-pre?
> (I have attached a slightly modified patch in which the feature is
> turned off by default and which applies cleanly to the bk tree.)
> 
> 
> My test case was:
>   - there is a process with a large anonymous mapping
>   - there is a large amount of page cache and there are active I/O processes
>   - there are not many file mappings
> 
> So the problem happens in this way:
>   - shrink_cache scans the inactive list, in which most pages
>     are anonymously mapped
>   - it soon falls into swap_out because there are too many anonymous pages
>   - with no free swap space, swap_out frees hardly anything
>   - it retries, but soon calls swap_out again, over and over
> 
> Without the patch, a readprofile snapshot looks like:
>    3590781 total
>    3289271 swap_out
>     212029 smp_call_function
>      22598 shrink_cache
>      21833 lru_cache_add
>       7787 get_user_pages
> 
> Most of the time was spent in swap_out. (contention on pagetable_lock)
> 
> After applying the patch, the snapshot looks like:
>     17420 total
>      3929 copy_page
>      3677 statm_pgd_range
>      1317 try_to_free_buffers
>      1312 __copy_user
>       593 scsi_make_request
> 
> Best regards.
> --
> NOMURA, Jun'ichi <j-nomura@ce.jp.nec.com>

> --- linux/include/linux/swap.h	2004/02/19 04:12:39	1.1.1.26
> +++ linux/include/linux/swap.h	2004/03/10 10:09:11
> @@ -116,7 +116,7 @@ extern void swap_setup(void);
>  extern wait_queue_head_t kswapd_wait;
>  extern int FASTCALL(try_to_free_pages_zone(zone_t *, unsigned int));
>  extern int FASTCALL(try_to_free_pages(unsigned int));
> -extern int vm_vfs_scan_ratio, vm_cache_scan_ratio, vm_lru_balance_ratio, vm_passes, vm_gfp_debug, vm_mapped_ratio;
> +extern int vm_vfs_scan_ratio, vm_cache_scan_ratio, vm_lru_balance_ratio, vm_passes, vm_gfp_debug, vm_mapped_ratio, vm_anon_lru;
>  
>  /* linux/mm/page_io.c */
>  extern void rw_swap_page(int, struct page *);
> --- linux/include/linux/sysctl.h	2004/02/19 04:12:39	1.1.1.23
> +++ linux/include/linux/sysctl.h	2004/03/10 10:09:11
> @@ -156,6 +156,7 @@ enum
>  	VM_MAPPED_RATIO=20,     /* amount of unfreeable pages that triggers swapout */
>  	VM_LAPTOP_MODE=21,	/* kernel in laptop flush mode */
>  	VM_BLOCK_DUMP=22,	/* dump fs activity to log */
> +	VM_ANON_LRU=23,		/* immediately insert anon pages in the vm page lru */
>  };
>  
>  
> --- linux/kernel/sysctl.c	2003/12/02 04:48:47	1.1.1.22
> +++ linux/kernel/sysctl.c	2004/03/10 10:09:12
> @@ -287,6 +287,8 @@ static ctl_table vm_table[] = {
>  	 &vm_cache_scan_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
>  	{VM_MAPPED_RATIO, "vm_mapped_ratio", 
>  	 &vm_mapped_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
> +	{VM_ANON_LRU, "vm_anon_lru", 
> +	 &vm_anon_lru, sizeof(int), 0644, NULL, &proc_dointvec},
>  	{VM_LRU_BALANCE_RATIO, "vm_lru_balance_ratio", 
>  	 &vm_lru_balance_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
>  	{VM_PASSES, "vm_passes", 
> --- linux/mm/memory.c	2003/12/02 04:48:47	1.1.1.31
> +++ linux/mm/memory.c	2004/03/10 10:09:12
> @@ -984,7 +984,8 @@ static int do_wp_page(struct mm_struct *
>  		if (PageReserved(old_page))
>  			++mm->rss;
>  		break_cow(vma, new_page, address, page_table);
> -		lru_cache_add(new_page);
> +		if (vm_anon_lru)
> +			lru_cache_add(new_page);
>  
>  		/* Free the old page.. */
>  		new_page = old_page;
> @@ -1215,7 +1216,8 @@ static int do_anonymous_page(struct mm_s
>  		mm->rss++;
>  		flush_page_to_ram(page);
>  		entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
> -		lru_cache_add(page);
> +		if (vm_anon_lru)
> +			lru_cache_add(page);
>  		mark_page_accessed(page);
>  	}
>  
> @@ -1270,7 +1272,8 @@ static int do_no_page(struct mm_struct *
>  		}
>  		copy_user_highpage(page, new_page, address);
>  		page_cache_release(new_page);
> -		lru_cache_add(page);
> +		if (vm_anon_lru)
> +			lru_cache_add(page);
>  		new_page = page;
>  	}
>  
> --- linux/mm/vmscan.c	2004/02/19 04:12:33	1.1.1.32
> +++ linux/mm/vmscan.c	2004/03/10 10:09:13
> @@ -65,6 +65,27 @@ int vm_lru_balance_ratio = 2;
>  int vm_vfs_scan_ratio = 6;
>  
>  /*
> + * "vm_anon_lru" selects whether to immediately insert anon pages
> + * in the lru. Immediately means as soon as they're allocated
> + * during the page faults.
> + *
> + * If this is set to 0, they're inserted only after the first
> + * swapout.
> + *
> + * Having anon pages immediately inserted in the lru allows the
> + * VM to know better when it's worthwhile to start swapping
> + * anonymous ram: it will start to swap earlier and it should
> + * swap smoother and faster, but it will decrease scalability
> + * on >16-way machines by an order of magnitude. Big SMP/NUMA
> + * definitely can't take a hit on a global spinlock at every
> + * anon page allocation, so such machines want this set to 0.
> + *
> + * Low ram machines that swap all the time want to keep
> + * this on (i.e. set to 1), which is the default here.
> + */
> +int vm_anon_lru = 1;
> +
> +/*
>   * The swap-out function returns 1 if it successfully
>   * scanned all the pages it was asked to (`count').
>   * It returns zero if it couldn't do anything,

