All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@zip.com.au>
To: Daniel Phillips <phillips@arcor.de>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Rmap speedup
Date: Mon, 05 Aug 2002 00:05:40 -0700	[thread overview]
Message-ID: <3D4E23C4.F746CF3D@zip.com.au> (raw)
In-Reply-To: 3D4DB9E4.E785184E@zip.com.au

Andrew Morton wrote:
> 
> I'll code it tomorrow.

Tomorrow came early.

It makes basically no difference at all.  Maybe pulled back 10 of the lost
50%.  The kernel is still spending much time in the pte_chain walk in
page_remove_rmap().

Despite the fact that the number of pte_chain references in page_add/remove_rmap
now just averages two in that test.  So I'm rather stumped.  It seems that
adding any misses at all to the copy_page_range/zap_page_range path hurts.

This patch _has_ to help, even if not on my box.  Plus it's gorgeous ;)  See
how it minimises internal fragmentation and cache misses at the same time.

Let's look at `make -j6 bzImage':

2.5.30+rmap-locking+rmap-speedup:

c0133444 99       0.597213    __pagevec_lru_add       
c0134e10 99       0.597213    __alloc_pages           
c0115d60 102      0.61531     do_schedule             
c01218fc 116      0.699765    update_one_process      
c01331f4 125      0.754057    __pagevec_release       
c0121a34 133      0.802316    timer_bh                
c012afa4 134      0.808349    sys_brk                 
c012bfd4 139      0.838511    do_brk                  
c0128980 155      0.93503     pte_alloc_map           
c01350b4 158      0.953128    nr_free_pages           
c013c4a0 164      0.989323    __page_add_rmap         
c0107100 168      1.01345     system_call             
c01346d0 182      1.09791     __free_pages_ok         
c013c570 186      1.12204     page_add_rmap           
c01df770 186      1.12204     __generic_copy_to_user  
c012b88c 188      1.1341      find_vma                
c013c5d4 199      1.20046     __page_remove_rmap      
c012ae90 212      1.27888     vm_enough_memory        
c0151cd4 214      1.29095     __d_lookup              
c01351ac 227      1.36937     get_page_state          
c012ab0c 230      1.38746     handle_mm_fault         
c01126e4 247      1.49002     smp_apic_timer_interrupt 
c0148ce8 282      1.70115     link_path_walk          
c0128ed0 291      1.75544     zap_pte_range           
c0115a48 338      2.03897     scheduler_tick          
c012a7f0 349      2.10533     do_no_page              
c0107c90 368      2.21994     page_fault              
c01142b4 388      2.34059     do_page_fault           
c0129d98 389      2.34662     do_wp_page              
c010c674 515      3.10671     timer_interrupt         
c01349bc 547      3.29975     rmqueue                 
c01df7bc 711      4.28908     __generic_copy_from_user 
c012da60 1045     6.30392     file_read_actor         
c012a5fc 3343     20.1665     do_anonymous_page 

2.5.26:

c0128160 92       0.594853    pte_alloc_map           
c0106fca 96       0.620716    restore_all             
c01d98d8 104      0.672443    strnlen_user            
c0113118 105      0.678909    set_ioapic_affinity     
c012a048 108      0.698306    sys_brk                 
c01213cc 110      0.711238    update_one_process      
c012c3b8 121      0.782361    find_get_page           
c012af50 131      0.847019    do_brk                  
c01338dc 133      0.859951    nr_free_pages           
c0131d00 145      0.93754     lru_cache_add           
c0139a08 147      0.950472    do_page_cache_readahead 
c0129c8c 151      0.976335    handle_mm_fault         
c0106f88 164      1.06039     system_call             
c01d9730 169      1.09272     __generic_copy_to_user  
c01db08c 178      1.15091     radix_tree_lookup       
c0132f30 190      1.2285      __free_pages_ok         
c012a8ac 194      1.25436     find_vma                
c01284f0 226      1.46127     zap_pte_range           
c014eb84 234      1.513       __d_lookup              
c01339b4 237      1.53239     get_page_state          
c011268c 245      1.58412     smp_apic_timer_interrupt 
c011415c 246      1.59059     do_page_fault           
c0145e08 298      1.92681     link_path_walk          
c0107b58 319      2.06259     page_fault              
c01157d4 346      2.23717     scheduler_tick          
c0129a04 346      2.23717     do_no_page              
c0129124 378      2.44407     do_wp_page              
c010c7f4 529      3.42041     timer_interrupt         
c01331c0 638      4.12518     rmqueue                 
c01d977c 682      4.40967     __generic_copy_from_user 
c012c964 988      6.38821     file_read_actor         
c0129868 3179     20.5548     do_anonymous_page       

Not much stands out there, except that the increase in HZ hurts
more than rmap.

2.5.26:
make -j6 bzImage  395.35s user 31.21s system 377% cpu 1:52.94 total
2.5.30:
make -j6 bzImage  395.55s user 33.00s system 375% cpu 1:54.19 total
2.5.30+everything
make -j6 bzImage  395.76s user 32.97s system 382% cpu 1:52.00 total
2.4.19
make -j6 bzImage  397.74s user 28.27s system 373% cpu 1:53.91 total
2.5.30+everything+HZ=100
make -j6 bzImage  393.60s user 28.91s system 373% cpu 1:53.06 total


Should we count PageDirect rmaps in /proc/meminfo:ReverseMaps?
I chose not to, so we can compare that number with the slabinfo
for pte_chains and see how much memory is being wasted.  Plus the
PageDirect rmaps aren't very interesting, because they don't consume
any resources.




 rmap.c |  233 ++++++++++++++++++++++++++++++++++++++++++++---------------------
 1 files changed, 161 insertions(+), 72 deletions(-)

--- 2.5.30/mm/rmap.c~rmap-speedup	Sun Aug  4 23:23:51 2002
+++ 2.5.30-akpm/mm/rmap.c	Sun Aug  4 23:58:27 2002
@@ -41,24 +41,39 @@
  * here, the page struct for the page table page contains the process
  * it belongs to and the offset within that process.
  *
- * A singly linked list should be fine for most, if not all, workloads.
- * On fork-after-exec the mapping we'll be removing will still be near
- * the start of the list, on mixed application systems the short-lived
- * processes will have their mappings near the start of the list and
- * in systems with long-lived applications the relative overhead of
- * exit() will be lower since the applications are long-lived.
+ * We use an array of pte pointers in this structure to minimise cache misses
+ * while traversing reverse maps.
  */
+#define NRPTE (L1_CACHE_BYTES/sizeof(void *) - 1)
+
 struct pte_chain {
 	struct pte_chain * next;
-	pte_t * ptep;
+	pte_t *ptes[NRPTE];
 };
 
 spinlock_t rmap_locks[NUM_RMAP_LOCKS];
 
 static kmem_cache_t	*pte_chain_cache;
 static inline struct pte_chain * pte_chain_alloc(void);
-static inline void pte_chain_free(struct pte_chain *, struct pte_chain *,
-		struct page *);
+static void pte_chain_free(struct pte_chain *pte_chain);
+
+/*
+ * pte_chain list management policy:
+ *
+ * - If a page has a pte_chain list then it is shared by at least two processes,
+ *   because a single sharing uses PageDirect. (Well, this isn't true yet,
+ *   coz this code doesn't collapse singletons back to PageDirect on the remove
+ *   path).
+ * - A pte_chain list has free space only in the head member - all succeeding
+ *   members are 100% full.
+ * - If the head element has free space, it occurs in its leading slots.
+ * - All free space in the pte_chain is at the start of the head member.
+ * - Insertion into the pte_chain puts a pte pointer in the last free slot of
+ *   the head member.
+ * - Removal from a pte chain moves the head pte of the head member onto the
+ *   victim pte and frees the head member if it became empty.
+ */
+
 
 /**
  * page_referenced - test if the page was referenced
@@ -82,8 +97,13 @@ int page_referenced(struct page * page)
 	} else {
 		/* Check all the page tables mapping this page. */
 		for (pc = page->pte.chain; pc; pc = pc->next) {
-			if (ptep_test_and_clear_young(pc->ptep))
-				referenced++;
+			int i;
+
+			for (i = 0; i < NRPTE; i++) {
+				pte_t *p = pc->ptes[i];
+				if (p && ptep_test_and_clear_young(p))
+					referenced++;
+			}
 		}
 	}
 	return referenced;
@@ -100,6 +120,7 @@ int page_referenced(struct page * page)
 void __page_add_rmap(struct page *page, pte_t *ptep)
 {
 	struct pte_chain * pte_chain;
+	int i;
 
 #ifdef DEBUG_RMAP
 	if (!page || !ptep)
@@ -121,32 +142,58 @@ void __page_add_rmap(struct page *page, 
 				BUG();
 		} else {
 			for (pc = page->pte.chain; pc; pc = pc->next) {
-				if (pc->ptep == ptep)
-					BUG();
+				for (i = 0; i < NRPTE; i++) {
+					pte_t *p = pc->ptes[i];
+
+					if (p && p == ptep)
+						BUG();
+				}
 			}
 		}
 	}
 #endif
 
+	if (page->pte.chain == NULL) {
+		page->pte.direct = ptep;
+		SetPageDirect(page);
+		goto out;
+	}
+	
 	if (PageDirect(page)) {
 		/* Convert a direct pointer into a pte_chain */
-		pte_chain = pte_chain_alloc();
-		pte_chain->ptep = page->pte.direct;
-		pte_chain->next = NULL;
-		page->pte.chain = pte_chain;
 		ClearPageDirect(page);
-	}
-	if (page->pte.chain) {
-		/* Hook up the pte_chain to the page. */
 		pte_chain = pte_chain_alloc();
-		pte_chain->ptep = ptep;
-		pte_chain->next = page->pte.chain;
+		pte_chain->ptes[NRPTE-1] = page->pte.direct;
+		pte_chain->ptes[NRPTE-2] = ptep;
+		mod_page_state(nr_reverse_maps, 2);
 		page->pte.chain = pte_chain;
-	} else {
-		page->pte.direct = ptep;
-		SetPageDirect(page);
+		goto out;
+	}
+
+	pte_chain = page->pte.chain;
+	if (pte_chain->ptes[0]) {	/* It's full */
+		struct pte_chain *new;
+
+		new = pte_chain_alloc();
+		new->next = pte_chain;
+		page->pte.chain = new;
+		new->ptes[NRPTE-1] = ptep;
+		inc_page_state(nr_reverse_maps);
+		goto out;
 	}
-	inc_page_state(nr_reverse_maps);
+
+	BUG_ON(pte_chain->ptes[NRPTE-1] == NULL);
+
+	for (i = NRPTE-2; i >= 0; i--) {
+		if (pte_chain->ptes[i] == NULL) {
+			pte_chain->ptes[i] = ptep;
+			inc_page_state(nr_reverse_maps);
+			goto out;
+		}
+	}
+	BUG();
+
+out:
 }
 
 void page_add_rmap(struct page *page, pte_t *ptep)
@@ -172,7 +219,7 @@ void page_add_rmap(struct page *page, pt
  */
 void __page_remove_rmap(struct page *page, pte_t *ptep)
 {
-	struct pte_chain * pc, * prev_pc = NULL;
+	struct pte_chain *pc;
 
 	if (!page || !ptep)
 		BUG();
@@ -186,15 +233,32 @@ void __page_remove_rmap(struct page *pag
 			goto out;
 		}
 	} else {
-		for (pc = page->pte.chain; pc; prev_pc = pc, pc = pc->next) {
-			if (pc->ptep == ptep) {
-				pte_chain_free(pc, prev_pc, page);
-				/* Check whether we can convert to direct */
-				pc = page->pte.chain;
-				if (!pc->next) {
-					page->pte.direct = pc->ptep;
-					SetPageDirect(page);
-					pte_chain_free(pc, NULL, NULL);
+		struct pte_chain *start = page->pte.chain;
+		int victim_i = -1;
+
+		for (pc = start; pc; pc = pc->next) {
+			int i;
+
+			if (pc->next)
+				prefetch(pc->next);
+			for (i = 0; i < NRPTE; i++) {
+				pte_t *p = pc->ptes[i];
+
+				if (!p)
+					continue;
+				if (victim_i == -1)
+					victim_i = i;
+				if (p != ptep)
+					continue;
+				pc->ptes[i] = start->ptes[victim_i];
+				start->ptes[victim_i] = NULL;
+				dec_page_state(nr_reverse_maps);
+				if (victim_i == NRPTE-1) {
+					/* Emptied a pte_chain */
+					page->pte.chain = start->next;
+					pte_chain_free(start);
+				} else {
+					/* Do singleton->PageDirect here */
 				}
 				goto out;
 			}
@@ -213,9 +277,9 @@ void __page_remove_rmap(struct page *pag
 	printk("\n");
 	printk(KERN_ERR "page_remove_rmap: driver cleared PG_reserved ?\n");
 #endif
+	return;
 
 out:
-	dec_page_state(nr_reverse_maps);
 	return;
 }
 
@@ -317,8 +381,9 @@ out_unlock:
  */
 int try_to_unmap(struct page * page)
 {
-	struct pte_chain * pc, * next_pc, * prev_pc = NULL;
+	struct pte_chain *pc, *next_pc, *start;
 	int ret = SWAP_SUCCESS;
+	int victim_i = -1;
 
 	/* This page should not be on the pageout lists. */
 	if (PageReserved(page))
@@ -335,35 +400,57 @@ int try_to_unmap(struct page * page)
 			page->pte.direct = NULL;
 			ClearPageDirect(page);
 		}
-	} else {		
-		for (pc = page->pte.chain; pc; pc = next_pc) {
-			next_pc = pc->next;
-			switch (try_to_unmap_one(page, pc->ptep)) {
-				case SWAP_SUCCESS:
-					/* Free the pte_chain struct. */
-					pte_chain_free(pc, prev_pc, page);
-					break;
-				case SWAP_AGAIN:
-					/* Skip this pte, remembering status. */
-					prev_pc = pc;
-					ret = SWAP_AGAIN;
-					continue;
-				case SWAP_FAIL:
-					ret = SWAP_FAIL;
-					break;
-				case SWAP_ERROR:
-					ret = SWAP_ERROR;
-					break;
+		goto out;
+	}		
+
+	start = page->pte.chain;
+	for (pc = start; pc; pc = next_pc) {
+		int i;
+
+		next_pc = pc->next;
+		if (next_pc)
+			prefetch(next_pc);
+		for (i = 0; i < NRPTE; i++) {
+			pte_t *p = pc->ptes[i];
+
+			if (!p)
+				continue;
+			if (victim_i == -1) 
+				victim_i = i;
+
+			switch (try_to_unmap_one(page, p)) {
+			case SWAP_SUCCESS:
+				/*
+				 * Release a slot.  If we're releasing the
+				 * first pte in the first pte_chain then
+				 * pc->ptes[i] and start->ptes[victim_i] both
+				 * refer to the same thing.  It works out.
+				 */
+				pc->ptes[i] = start->ptes[victim_i];
+				start->ptes[victim_i] = NULL;
+				dec_page_state(nr_reverse_maps);
+				victim_i++;
+				if (victim_i == NRPTE) {
+					page->pte.chain = start->next;
+					pte_chain_free(start);
+					start = page->pte.chain;
+					victim_i = 0;
+				}
+				break;
+			case SWAP_AGAIN:
+				/* Skip this pte, remembering status. */
+				ret = SWAP_AGAIN;
+				continue;
+			case SWAP_FAIL:
+				ret = SWAP_FAIL;
+				goto out;
+			case SWAP_ERROR:
+				ret = SWAP_ERROR;
+				goto out;
 			}
 		}
-		/* Check whether we can convert to direct pte pointer */
-		pc = page->pte.chain;
-		if (pc && !pc->next) {
-			page->pte.direct = pc->ptep;
-			SetPageDirect(page);
-			pte_chain_free(pc, NULL, NULL);
-		}
 	}
+out:
 	return ret;
 }
 
@@ -384,14 +471,9 @@ int try_to_unmap(struct page * page)
  * called for new pte_chain structures which aren't on any list yet.
  * Caller needs to hold the rmap_lock if the page is non-NULL.
  */
-static inline void pte_chain_free(struct pte_chain * pte_chain,
-		struct pte_chain * prev_pte_chain, struct page * page)
+static void pte_chain_free(struct pte_chain *pte_chain)
 {
-	if (prev_pte_chain)
-		prev_pte_chain->next = pte_chain->next;
-	else if (page)
-		page->pte.chain = pte_chain->next;
-
+	pte_chain->next = NULL;
 	kmem_cache_free(pte_chain_cache, pte_chain);
 }
 
@@ -406,6 +488,13 @@ static inline struct pte_chain *pte_chai
 	return kmem_cache_alloc(pte_chain_cache, GFP_ATOMIC);
 }
 
+static void pte_chain_ctor(void *p, kmem_cache_t *cachep, unsigned long flags)
+{
+	struct pte_chain *pc = p;
+
+	memset(pc, 0, sizeof(*pc));
+}
+
 void __init pte_chain_init(void)
 {
 	int i;
@@ -417,7 +506,7 @@ void __init pte_chain_init(void)
 						sizeof(struct pte_chain),
 						0,
 						0,
-						NULL,
+						pte_chain_ctor,
 						NULL);
 
 	if (!pte_chain_cache)

.

  parent reply	other threads:[~2002-08-05  6:52 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-08-02 19:42 [PATCH] Rmap speedup Daniel Phillips
2002-08-02 20:20 ` Andrew Morton
2002-08-02 21:40   ` William Lee Irwin III
2002-08-03  0:14   ` Rik van Riel
2002-08-03  0:31     ` Andrew Morton
2002-08-03  0:52       ` William Lee Irwin III
2002-08-03  0:56       ` Rik van Riel
2002-08-03  3:47   ` Daniel Phillips
2002-08-03  5:24     ` Andrew Morton
2002-08-03 18:43       ` Daniel Phillips
2002-08-03 21:40         ` Andrew Morton
2002-08-03 21:54           ` Rik van Riel
2002-08-03 22:49           ` Daniel Phillips
2002-08-03 23:55             ` Gerrit Huizenga
2002-08-04  0:47             ` Andrew Morton
2002-08-04  1:01               ` Daniel Phillips
2002-08-04 14:11                 ` Thunder from the hill
2002-08-04 14:47                   ` Zwane Mwaikambo
2002-08-04 16:55                   ` Tobias Ringstrom
2002-08-03 23:36           ` Daniel Phillips
2002-08-04  0:44             ` Andrew Morton
2002-08-03 21:05       ` Rik van Riel
2002-08-03 21:36         ` Daniel Phillips
2002-08-03 21:43         ` Andrew Morton
2002-08-03 21:41           ` Daniel Phillips
2002-08-03 21:24       ` [PATCH] Rmap speedup... call for testing Daniel Phillips
2002-08-03 22:05       ` [PATCH] Rmap speedup Daniel Phillips
2002-08-03 22:39         ` Andrew Morton
2002-08-03 22:35           ` Daniel Phillips
2002-08-04 23:33 ` Andrew Morton
2002-08-05  0:35   ` Daniel Phillips
2002-08-05  7:05   ` Andrew Morton [this message]
2002-08-05 13:48     ` Daniel Phillips
2002-08-05 13:57       ` Rik van Riel
2002-08-05 18:16         ` Andrew Morton
2002-08-07 18:59     ` Daniel Phillips
2002-08-07 19:40       ` Andrew Morton
2002-08-07 20:17         ` Daniel Phillips
2002-08-07 20:34           ` Andrew Morton
2002-08-07 20:51             ` Daniel Phillips
2002-08-07 20:54               ` Rik van Riel
2002-08-07 22:21                 ` Daniel Phillips
2002-08-07 22:48                   ` Andrew Morton
2002-08-07 20:39           ` Daniel Phillips

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3D4E23C4.F746CF3D@zip.com.au \
    --to=akpm@zip.com.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=phillips@arcor.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.