public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: akpm@osdl.org, linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org
Subject: Re: page fault scalability patch V10: [2/7] defer/omit taking page_table_lock
Date: Fri, 15 Oct 2004 17:00:41 -0300	[thread overview]
Message-ID: <20041015200041.GD4937@logos.cnet> (raw)
In-Reply-To: <Pine.LNX.4.58.0410151203521.26697@schroedinger.engr.sgi.com>


Hi Christoph,

Nice work! 

On Fri, Oct 15, 2004 at 12:04:53PM -0700, Christoph Lameter wrote:
> Changelog
> 	* Increase parallelism in SMP configurations by deferring
> 	  the acquisition of page_table_lock in handle_mm_fault
> 	* Anonymous memory page faults bypass the page_table_lock
> 	  through the use of atomic page table operations
> 	* Swapper does not set pte to empty in transition to swap
> 	* Simulate atomic page table operations using the
> 	  page_table_lock if an arch does not define
> 	  __HAVE_ARCH_ATOMIC_TABLE_OPS. This still provides
> 	  a performance benefit since the page_table_lock
> 	  is held for shorter periods of time.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> Index: linux-2.6.9-rc4/mm/memory.c
> ===================================================================
> --- linux-2.6.9-rc4.orig/mm/memory.c	2004-10-14 12:22:14.000000000 -0700
> +++ linux-2.6.9-rc4/mm/memory.c	2004-10-14 12:22:14.000000000 -0700
> @@ -1314,8 +1314,7 @@
>  }
> 
>  /*
> - * We hold the mm semaphore and the page_table_lock on entry and
> - * should release the pagetable lock on exit..
> + * We hold the mm semaphore
>   */
>  static int do_swap_page(struct mm_struct * mm,
>  	struct vm_area_struct * vma, unsigned long address,
> @@ -1327,15 +1326,13 @@
>  	int ret = VM_FAULT_MINOR;
> 
>  	pte_unmap(page_table);
> -	spin_unlock(&mm->page_table_lock);
>  	page = lookup_swap_cache(entry);
>  	if (!page) {
>   		swapin_readahead(entry, address, vma);
>   		page = read_swap_cache_async(entry, vma, address);
>  		if (!page) {
>  			/*
> -			 * Back out if somebody else faulted in this pte while
> -			 * we released the page table lock.
> +			 * Back out if somebody else faulted in this pte
>  			 */
>  			spin_lock(&mm->page_table_lock);
>  			page_table = pte_offset_map(pmd, address);

The comment above, which is a few lines down on do_swap_page() is now 
bogus (the "while we released the page table lock"). 

        /*
         * Back out if somebody else faulted in this pte while we
         * released the page table lock.
         */
        spin_lock(&mm->page_table_lock);
        page_table = pte_offset_map(pmd, address);
        if (unlikely(!pte_same(*page_table, orig_pte))) {

> @@ -1406,14 +1403,12 @@
>  }
> 
>  /*
> - * We are called with the MM semaphore and page_table_lock
> - * spinlock held to protect against concurrent faults in
> - * multithreaded programs.
> + * We are called with the MM semaphore held.
>   */
>  static int
>  do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  		pte_t *page_table, pmd_t *pmd, int write_access,
> -		unsigned long addr)
> +		unsigned long addr, pte_t orig_entry)
>  {
>  	pte_t entry;
>  	struct page * page = ZERO_PAGE(addr);
> @@ -1425,7 +1420,6 @@
>  	if (write_access) {
>  		/* Allocate our own private page. */
>  		pte_unmap(page_table);
> -		spin_unlock(&mm->page_table_lock);
> 
>  		if (unlikely(anon_vma_prepare(vma)))
>  			goto no_mem;
> @@ -1434,30 +1428,39 @@
>  			goto no_mem;
>  		clear_user_highpage(page, addr);
> 
> -		spin_lock(&mm->page_table_lock);
> +		lock_page(page);

Question: Why do you need to hold the pagelock now?

I can't seem to figure that out myself.

>  		page_table = pte_offset_map(pmd, addr);
> 
> -		if (!pte_none(*page_table)) {
> -			pte_unmap(page_table);
> -			page_cache_release(page);
> -			spin_unlock(&mm->page_table_lock);
> -			goto out;
> -		}
> -		atomic_inc(&mm->mm_rss);
>  		entry = maybe_mkwrite(pte_mkdirty(mk_pte(page,
>  							 vma->vm_page_prot)),
>  				      vma);
> -		lru_cache_add_active(page);
>  		mark_page_accessed(page);
> -		page_add_anon_rmap(page, vma, addr);
>  	}
> 
> -	set_pte(page_table, entry);
> +	/* update the entry */
> +	if (!ptep_cmpxchg(vma, addr, page_table, orig_entry, entry)) {
> +		if (write_access) {
> +			pte_unmap(page_table);
> +			unlock_page(page);
> +			page_cache_release(page);
> +		}
> +		goto out;
> +	}
> +	if (write_access) {
> +		/*
> +		 * The following two functions are safe to use without
> +		 * the page_table_lock but do they need to come before
> +		 * the cmpxchg?
> +		 */

They do need to come after AFAICS - from the point they are in the reverse map 
and the page is on the LRU try_to_unmap() can come in and try to 
unmap the pte (now that we dont hold page_table_lock anymore).

> +		lru_cache_add_active(page);
> +		page_add_anon_rmap(page, vma, addr);
> +		atomic_inc(&mm->mm_rss);
> +		unlock_page(page);
> +	}
>  	pte_unmap(page_table);
> 
>  	/* No need to invalidate - it was non-present before */
>  	update_mmu_cache(vma, addr, entry);
> -	spin_unlock(&mm->page_table_lock);
>  out:
>  	return VM_FAULT_MINOR;
>  no_mem:
> @@ -1473,12 +1476,12 @@
>   * As this is called only for pages that do not currently exist, we
>   * do not need to flush old virtual caches or the TLB.
>   *
> - * This is called with the MM semaphore held and the page table
> - * spinlock held. Exit with the spinlock released.
> + * This is called with the MM semaphore held.
>   */
>  static int
>  do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
> -	unsigned long address, int write_access, pte_t *page_table, pmd_t *pmd)
> +	unsigned long address, int write_access, pte_t *page_table,
> +        pmd_t *pmd, pte_t orig_entry)
>  {
>  	struct page * new_page;
>  	struct address_space *mapping = NULL;
> @@ -1489,9 +1492,8 @@
> 
>  	if (!vma->vm_ops || !vma->vm_ops->nopage)
>  		return do_anonymous_page(mm, vma, page_table,
> -					pmd, write_access, address);
> +					pmd, write_access, address, orig_entry);
>  	pte_unmap(page_table);
> -	spin_unlock(&mm->page_table_lock);
> 
>  	if (vma->vm_file) {
>  		mapping = vma->vm_file->f_mapping;
> @@ -1589,7 +1591,7 @@
>   * nonlinear vmas.
>   */
>  static int do_file_page(struct mm_struct * mm, struct vm_area_struct * vma,
> -	unsigned long address, int write_access, pte_t *pte, pmd_t *pmd)
> +	unsigned long address, int write_access, pte_t *pte, pmd_t *pmd, pte_t entry)
>  {
>  	unsigned long pgoff;
>  	int err;
> @@ -1602,13 +1604,12 @@
>  	if (!vma->vm_ops || !vma->vm_ops->populate ||
>  			(write_access && !(vma->vm_flags & VM_SHARED))) {
>  		pte_clear(pte);
> -		return do_no_page(mm, vma, address, write_access, pte, pmd);
> +		return do_no_page(mm, vma, address, write_access, pte, pmd, entry);
>  	}
> 
>  	pgoff = pte_to_pgoff(*pte);
> 
>  	pte_unmap(pte);
> -	spin_unlock(&mm->page_table_lock);
> 
>  	err = vma->vm_ops->populate(vma, address & PAGE_MASK, PAGE_SIZE, vma->vm_page_prot, pgoff, 0);
>  	if (err == -ENOMEM)
> @@ -1627,49 +1628,49 @@
>   * with external mmu caches can use to update those (ie the Sparc or
>   * PowerPC hashed page tables that act as extended TLBs).
>   *
> - * Note the "page_table_lock". It is to protect against kswapd removing
> - * pages from under us. Note that kswapd only ever _removes_ pages, never
> - * adds them. As such, once we have noticed that the page is not present,
> - * we can drop the lock early.
> - *
> + * Note that kswapd only ever _removes_ pages, never adds them.
> + * We need to insure to handle that case properly.
> + *
>   * The adding of pages is protected by the MM semaphore (which we hold),
>   * so we don't need to worry about a page being suddenly been added into
>   * our VM.
> - *
> - * We enter with the pagetable spinlock held, we are supposed to
> - * release it when done.
>   */
>  static inline int handle_pte_fault(struct mm_struct *mm,
>  	struct vm_area_struct * vma, unsigned long address,
>  	int write_access, pte_t *pte, pmd_t *pmd)
>  {
>  	pte_t entry;
> +	pte_t new_entry;
> 
>  	entry = *pte;
>  	if (!pte_present(entry)) {
>  		/*
>  		 * If it truly wasn't present, we know that kswapd
>  		 * and the PTE updates will not touch it later. So
> -		 * drop the lock.
> +		 * no need to acquire the page_table_lock.
>  		 */
>  		if (pte_none(entry))
> -			return do_no_page(mm, vma, address, write_access, pte, pmd);
> +			return do_no_page(mm, vma, address, write_access, pte, pmd, entry);
>  		if (pte_file(entry))
> -			return do_file_page(mm, vma, address, write_access, pte, pmd);
> +			return do_file_page(mm, vma, address, write_access, pte, pmd, entry);
>  		return do_swap_page(mm, vma, address, pte, pmd, entry, write_access);
>  	}

I wonder what happens if kswapd, through try_to_unmap_one(), unmap's the
pte right here ? 

Aren't we going to proceed with the "pte_mkyoung(entry)" of a potentially 
now unmapped pte? Isnt that case possible now?


  reply	other threads:[~2004-10-15 21:52 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-08-15 13:50 page fault fastpath: Increasing SMP scalability by introducing pte locks? Christoph Lameter
2004-08-15 20:09 ` David S. Miller
2004-08-15 22:58   ` Christoph Lameter
2004-08-15 23:58     ` David S. Miller
2004-08-16  0:11       ` Christoph Lameter
2004-08-16  1:56         ` David S. Miller
2004-08-16  3:29           ` Christoph Lameter
2004-08-16  7:00             ` Ray Bryant
2004-08-16 15:18               ` Christoph Lameter
2004-08-16 16:18                 ` William Lee Irwin III
2004-08-16 14:39             ` William Lee Irwin III
2004-08-17 15:28               ` page fault fastpath patch v2: fix race conditions, stats for 8,32 and 512 cpu SMP Christoph Lameter
2004-08-17 15:37                 ` Christoph Hellwig
2004-08-17 15:51                 ` William Lee Irwin III
2004-08-18 17:55                 ` Hugh Dickins
2004-08-18 20:20                   ` William Lee Irwin III
2004-08-19  1:19                   ` Christoph Lameter
     [not found]               ` <B6E8046E1E28D34EB815A11AC8CA3129027B679F@mtv-atc-605e--n.corp.sgi.com>
2004-08-24  4:43                 ` page fault scalability patch v3: use cmpxchg, make rss atomic Christoph Lameter
2004-08-24  5:49                   ` Christoph Lameter
2004-08-24 12:34                     ` Matthew Wilcox
2004-08-24 14:47                       ` Christoph Lameter
     [not found]                 ` <B6E8046E1E28D34EB815A11AC8CA3129027B67A9@mtv-atc-605e--n.corp.sgi.com>
2004-08-26 15:20                   ` page fault scalability patch v4: reduce page_table_lock use, atomic pmd,pgd handlin Christoph Lameter
     [not found]                   ` <B6E8046E1E28D34EB815A11AC8CA3129027B67B4@mtv-atc-605e--n.corp.sgi.com>
2004-08-27 23:20                     ` page fault scalability patch final : i386 tested, x86_64 support added Christoph Lameter
2004-08-27 23:36                       ` Andi Kleen
2004-08-27 23:43                         ` David S. Miller
2004-08-28  0:19                         ` Christoph Lameter
2004-08-28  0:23                           ` David S. Miller
2004-08-28  0:36                             ` Andrew Morton
2004-08-28  0:40                               ` David S. Miller
2004-08-28  1:05                                 ` Andi Kleen
2004-08-28  1:11                                   ` David S. Miller
2004-08-28  1:17                                     ` Andi Kleen
2004-08-28  1:02                               ` Andi Kleen
2004-08-28  1:39                                 ` Andrew Morton
2004-08-28  2:08                                   ` Paul Mackerras
2004-08-28  3:32                                     ` Christoph Lameter
2004-08-28  3:42                                       ` Andrew Morton
2004-08-28  4:24                                         ` Christoph Lameter
2004-08-28  5:39                                           ` Andrew Morton
2004-08-28  5:58                                             ` Christoph Lameter
2004-08-28  6:03                                               ` William Lee Irwin III
2004-08-28  6:06                                               ` Andrew Morton
2004-08-30 17:02                                                 ` Herbert Poetzl
2004-08-30 17:05                                                   ` Andi Kleen
2004-08-28 13:19                                             ` Andi Kleen
2004-08-28 15:48                                             ` Matt Mackall
2004-09-01  4:13                                             ` Benjamin Herrenschmidt
2004-09-02 21:26                                               ` Andi Kleen
2004-09-02 21:55                                                 ` David S. Miller
2004-09-01 18:03                                             ` Matthew Wilcox
2004-09-01 18:19                                               ` Andrew Morton
2004-09-01 19:06                                                 ` William Lee Irwin III
2004-08-28 21:41                                 ` Daniel Phillips
2004-09-01  4:24                       ` Benjamin Herrenschmidt
2004-09-01  5:22                         ` David S. Miller
2004-09-01 16:43                         ` Christoph Lameter
2004-09-01 23:09                           ` Benjamin Herrenschmidt
     [not found]                             ` <Pine.LNX.4.58.0409012140440.23186@schroedinger.engr.sgi.com>
     [not found]                               ` <20040901215741.3538bbf4.davem@davemloft.net>
2004-09-02  5:18                                 ` William Lee Irwin III
2004-09-09 15:38                                   ` page fault scalability patch: V7 (+fallback for atomic page table ops) Christoph Lameter
2004-09-02 16:24                                 ` page fault scalability patch final : i386 tested, x86_64 support added Christoph Lameter
2004-09-02 20:10                                   ` David S. Miller
2004-09-02 21:02                                     ` Christoph Lameter
2004-09-02 21:07                                       ` David S. Miller
2004-09-18 23:23                                         ` page fault scalability patch V8: [0/7] Description Christoph Lameter
     [not found]                                         ` <B6E8046E1E28D34EB815A11AC8CA312902CD3243@mtv-atc-605e--n.corp.sgi.com>
2004-09-18 23:24                                           ` page fault scalability patch V8: [1/7] make mm->rss atomic Christoph Lameter
2004-09-18 23:26                                           ` page fault scalability patch V8: [2/7] avoid page_table_lock in handle_mm_fault Christoph Lameter
2004-09-19  9:04                                             ` Christoph Hellwig
2004-09-18 23:27                                           ` page fault scalability patch V8: [3/7] atomic pte operations for ia64 Christoph Lameter
2004-09-18 23:28                                           ` page fault scalability patch V8: [4/7] universally available cmpxchg on i386 Christoph Lameter
     [not found]                                             ` <200409191430.37444.vda@port.imtp.ilyichevsk.odessa.ua>
2004-09-19 12:11                                               ` Andi Kleen
2004-09-20 15:45                                               ` Christoph Lameter
     [not found]                                                 ` <200409202043.00580.vda@port.imtp.ilyichevsk.odessa.ua>
2004-09-20 20:49                                                   ` Christoph Lameter
2004-09-20 20:57                                                     ` Andi Kleen
     [not found]                                                       ` <200409211841.25507.vda@port.imtp.ilyichevsk.odessa.ua>
2004-09-21 15:45                                                         ` Andi Kleen
     [not found]                                                           ` <200409212306.38800.vda@port.imtp.ilyichevsk.odessa.ua>
2004-09-21 20:14                                                             ` Andi Kleen
2004-09-23  7:17                                                           ` Andy Lutomirski
2004-09-23  9:03                                                             ` Andi Kleen
2004-09-27 19:06                                                               ` page fault scalability patch V9: [0/7] overview Christoph Lameter
2004-10-15 19:02                                                                 ` page fault scalability patch V10: " Christoph Lameter
2004-10-15 19:03                                                                   ` page fault scalability patch V10: [1/7] make rss atomic Christoph Lameter
2004-10-15 19:04                                                                   ` page fault scalability patch V10: [2/7] defer/omit taking page_table_lock Christoph Lameter
2004-10-15 20:00                                                                     ` Marcelo Tosatti [this message]
2004-10-18 15:59                                                                       ` Christoph Lameter
2004-10-19  5:25                                                                       ` [revised] " Christoph Lameter
2004-10-15 19:05                                                                   ` page fault scalability patch V10: [3/7] IA64 atomic pte operations Christoph Lameter
2004-10-15 19:06                                                                   ` page fault scalability patch V10: [4/7] cmpxchg for 386 and 486 Christoph Lameter
2004-10-15 19:06                                                                   ` page fault scalability patch V10: [5/7] i386 atomic pte operations Christoph Lameter
2004-10-15 19:07                                                                   ` page fault scalability patch V10: [6/7] x86_64 " Christoph Lameter
2004-10-15 19:08                                                                   ` page fault scalability patch V10: [7/7] s/390 " Christoph Lameter
     [not found]                                                               ` <B6E8046E1E28D34EB815A11AC8CA312902CD3282@mtv-atc-605e--n.corp.sgi.com>
2004-09-27 19:07                                                                 ` page fault scalability patch V9: [1/7] make mm->rss atomic Christoph Lameter
2004-09-27 19:08                                                                 ` page fault scalability patch V9: [2/7] defer/remove page_table_lock Christoph Lameter
2004-09-27 19:10                                                                 ` page fault scalability patch V9: [3/7] atomic pte operatios for ia64 Christoph Lameter
2004-09-27 19:10                                                                 ` page fault scalability patch V9: [4/7] generally available cmpxchg on i386 Christoph Lameter
2004-09-27 19:11                                                                 ` page fault scalability patch V9: [5/7] atomic pte operations for i386 Christoph Lameter
2004-09-27 19:12                                                                 ` page fault scalability patch V9: [6/7] atomic pte operations for x86_64 Christoph Lameter
2004-09-27 19:13                                                                 ` page fault scalability patch V9: [7/7] atomic pte operatiosn for s390 Christoph Lameter
2004-09-18 23:29                                           ` page fault scalability patch V8: [5/7] atomic pte operations for i386 Christoph Lameter
2004-09-18 23:30                                           ` page fault scalability patch V8: [6/7] atomic pte operations for x86_64 Christoph Lameter
2004-09-18 23:31                                           ` page fault scalability patch V8: [7/7] atomic pte operations for s390 Christoph Lameter
     [not found]                                             ` <200409191435.09445.vda@port.imtp.ilyichevsk.odessa.ua>
2004-09-20 15:44                                               ` Christoph Lameter
2004-08-15 22:38 ` page fault fastpath: Increasing SMP scalability by introducing pte locks? Benjamin Herrenschmidt
2004-08-16 17:28   ` Christoph Lameter
2004-08-17  8:01     ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20041015200041.GD4937@logos.cnet \
    --to=marcelo.tosatti@cyclades.com \
    --cc=akpm@osdl.org \
    --cc=clameter@sgi.com \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox