From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S263626AbTDNSZY (for ); Mon, 14 Apr 2003 14:25:24 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S263711AbTDNSZL (for ); Mon, 14 Apr 2003 14:25:11 -0400 Received: from 217-125-129-224.uc.nombres.ttd.es ([217.125.129.224]:4081 "HELO cocodriloo.com") by vger.kernel.org with SMTP id S263626AbTDNSY3 (for ); Mon, 14 Apr 2003 14:24:29 -0400 Date: Mon, 14 Apr 2003 20:47:15 +0200 From: Antonio Vargas To: Antonio Vargas Cc: "Martin J. Bligh" , linux-kernel@vger.kernel.org, nicoya@apia.dhs.org Subject: Re: cow-ahead N pages for fault clustering Message-ID: <20030414184715.GI14552@wind.cocodriloo.com> References: <10760000.1050332136@[10.10.2.4]> <20030414152947.GB14552@wind.cocodriloo.com> <12790000.1050334744@[10.10.2.4]> <20030414155748.GD14552@wind.cocodriloo.com> <14860000.1050337484@[10.10.2.4]> <20030414164321.GE14552@wind.cocodriloo.com> <15700000.1050338226@[10.10.2.4]> <20030414171419.GG14552@wind.cocodriloo.com> <16700000.1050340965@[10.10.2.4]> <20030414183251.GH14552@wind.cocodriloo.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="O3RTKUHj+75w1tg5" Content-Disposition: inline In-Reply-To: <20030414183251.GH14552@wind.cocodriloo.com> User-Agent: Mutt/1.3.28i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org --O3RTKUHj+75w1tg5 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Mon, Apr 14, 2003 at 08:32:51PM +0200, Antonio Vargas wrote: > On Mon, Apr 14, 2003 at 10:22:46AM -0700, Martin J. Bligh wrote: > > >> Ah, you probably don't want to do that ... it's very expensive. Moreover, > > >> if you exec 2ns later, all the effort will be wasted ... and it's very > > >> hard to deterministically predict whether you'll exec or not (stupid > > >> UNIX semantics). Doing it lazily is probably best, and as to "nodes > > >> would not have to reference the memory from others" - you're still > > >> doing that, you're just batching it on the front end. > > > > > > True... What about a vma-level COW-ahead just like we have a file-level > > > read-ahead, then? I mean batching the COW at unCOW-because-of-write time. > > > > That'd be interesting ... and you can test that on a UP box, is not just > > NUMA. Depends on the workload quite heavily, I suspect. > > > > > btw, COW-ahead sound really silly :) > > > > Yeah. So be sure to call it that if it works out ... we need more things > > like that ;-) Moooooo. > > What about the attached one? I'm compiling it right now to test in UML :) > > [ snip fake-NUMA-on-SMP discussion ] > OK, too quick for me... this next one applies, compiles and boots on 2.5.66 + uml. Now I wonder how can I test if this is useful... ideas? Greets, Antonio. --O3RTKUHj+75w1tg5 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="cow-ahead.patch" mm/memory.c | 34 +++++++++++++++++++++++++++++----- 1 files changed, 29 insertions(+), 5 deletions(-) diff -puN mm/memory.c~cow-ahead mm/memory.c --- 25/mm/memory.c~cow-ahead Mon Apr 14 20:08:44 2003 +++ 25-wind/mm/memory.c Mon Apr 14 20:37:42 2003 @@ -1452,7 +1452,7 @@ static int do_file_page(struct mm_struct */ static inline int handle_pte_fault(struct mm_struct *mm, struct vm_area_struct * vma, unsigned long address, - int write_access, pte_t *pte, pmd_t *pmd) + int write_access, pte_t *pte, pmd_t *pmd, int *cowahead) { pte_t entry; @@ -1471,8 +1471,11 @@ static inline int handle_pte_fault(struc } if (write_access) { - if (!pte_write(entry)) + if (!pte_write(entry)) { + if(!*cowahead) + *cowahead = 1; return do_wp_page(mm, vma, address, pte, pmd, entry); + } entry = pte_mkdirty(entry); } @@ -1492,6 +1495,17 @@ int handle_mm_fault(struct mm_struct *mm pgd_t *pgd; pmd_t *pmd; + int cowahead, i; + int retval, x; + + /* + * Implement cow-ahead: copy-on-write several + * pages when we fault one of them + */ + + i = cowahead = 0; + +do_cowahead: __set_current_state(TASK_RUNNING); pgd = pgd_offset(mm, address); @@ -1507,10 +1521,20 @@ int handle_mm_fault(struct mm_struct *mm spin_lock(&mm->page_table_lock); pmd = pmd_alloc(mm, pgd, address); - if (pmd) { + while (pmd) { pte_t * pte = pte_alloc_map(mm, pmd, address); - if (pte) - return handle_pte_fault(mm, vma, address, write_access, pte, pmd); + if (!pte) break; + + x = handle_pte_fault(mm, vma, address, write_access, pte, pmd, &cowahead); + if(!i) retval = x; + + i++; + address += PAGE_SIZE; + + if(!cowahead || i >= 0 || address >= vma->vm_end) + return retval; + + goto do_cowahead; } spin_unlock(&mm->page_table_lock); return VM_FAULT_OOM; _ --O3RTKUHj+75w1tg5--