public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* 230-objrmap fixes for 2.6.3-mjb2
@ 2004-03-03  7:09 Andrea Arcangeli
  2004-03-03 10:58 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Andrea Arcangeli @ 2004-03-03  7:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Martin J. Bligh, Hugh Dickins, William Lee Irwin III

While merging 230-objrmap in my tree I spotted 2 bugs potentially
generating random memory corruption and 1 superflous bit that I dropped
(mostly for documentation reasons, I like strict and in turn self
documenting). Here below the fixes.

in the first file we needs the page_table_lock while changing the rbtree
etc... both the page_table_lock and the down_write must be held during
all writes, so the reader can choose between a down_read or a spin_lock.

The second one is a bug in mainline 2.6 too apparently, maybe I'm
missing something but I don't see how you prevent the vm to swapout a
reserved region without my fix. A reserved region must not be messed
from the vm since it's a dma hardware region that we page lazily instead
of using PG_reserved (or MMIO) + remap_page_range. It's different from
VM_LOCKED so you can't clear that bit IIRC but that's the same,
VM_LOCKED == VM_RESERVED in VM terms.  As said I believe you inherit
this bug from mainline 2.6 (2.4 has always been safe instead).

The third is a superflous down_read, it's not needed because the
page_table_lock is held during the call and it seems not to need to drop
it to schedule (and either we use the spinlock or the semaphore, both
doesn't make much sense for a reader).

Please double check, thanks.

I'm running some shm swap regression test on this right now and I'll
leave it running for a day. In a few hours I will proceed starting
dropping the pte_chain from the page sturcture and then I'll test the
anon swapout. I will also follow the 6 great-effort anobjrmap posted by
Hugh against objrmap while doing that, they're quite old (almost 1 year)
but they still apply cleanly by hand so they're useful.

--- sles-objrmap/mm/mmap.c.~1~	2004-03-03 06:45:38.980596736 +0100
+++ sles-objrmap/mm/mmap.c	2004-03-03 06:53:46.945414808 +0100
@@ -1284,8 +1284,8 @@ int do_munmap(struct mm_struct *mm, unsi
 	/*
 	 * Remove the vma's, and unmap the actual pages
 	 */
-	detach_vmas_to_be_unmapped(mm, mpnt, prev, end);
 	spin_lock(&mm->page_table_lock);
+	detach_vmas_to_be_unmapped(mm, mpnt, prev, end);
 	unmap_region(mm, mpnt, prev, start, end);
 	spin_unlock(&mm->page_table_lock);
 
--- sles-objrmap/mm/rmap.c.~1~	2004-03-03 06:45:38.995594456 +0100
+++ sles-objrmap/mm/rmap.c	2004-03-03 07:01:39.200621104 +0100
@@ -470,7 +470,7 @@ try_to_unmap_obj_one(struct vm_area_stru
 	if (!pte)
 		goto out;
 
-	if (vma->vm_flags & VM_LOCKED) {
+	if (vma->vm_flags & (VM_LOCKED|VM_RESERVED)) {
 		ret =  SWAP_FAIL;
 		goto out_unmap;
 	}
--- sles-objrmap/mm/swapfile.c.~1~	2004-03-03 06:45:39.023590200 +0100
+++ sles-objrmap/mm/swapfile.c	2004-03-03 07:03:33.128301464 +0100
@@ -499,7 +499,6 @@ static int unuse_process(struct mm_struc
 	/*
 	 * Go through process' page directory.
 	 */
-	down_read(&mm->mmap_sem);
 	spin_lock(&mm->page_table_lock);
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		pgd_t * pgd = pgd_offset(mm, vma->vm_start);
@@ -507,7 +506,6 @@ static int unuse_process(struct mm_struc
 			break;
 	}
 	spin_unlock(&mm->page_table_lock);
-	up_read(&mm->mmap_sem);
 	pte_chain_free(pte_chain);
 	return 0;
 }



About 2.5:1.5 it seems not everybody is happy to lose 512m (and it's not
Oracle), but before ruling it out I'd like to get some real life number,
to be sure the performance of 2.0^W4:4 are really close (if not
"better") than 3:1 as someone said. If we go with 4:4 IMHO at the very
least the vgettimeofday backport from x86-64 is a must. In the meantime
I keep going with the rmap removal to fixup the fork  and to get back
the 128m of normal zone useful on the 32G boxes. Could be also that new
cpus are a lot better at reloading the tlbs from the pagetables dunno,
the first numbers I recall about 4:4 predates to 2000 when PII was quite
optimal.  I'd only like to see an opteron and a xeon dealing with 4:4.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2004-03-27 15:30 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-03-03  7:09 230-objrmap fixes for 2.6.3-mjb2 Andrea Arcangeli
2004-03-03 10:58 ` Andrew Morton
2004-03-03 15:46   ` Martin J. Bligh
2004-03-03 15:58     ` Dave McCracken
2004-03-03 16:08       ` Christoph Hellwig
2004-03-03 19:04         ` Mike Kravetz
2004-03-04 15:41         ` Rik van Riel
2004-03-04 15:47           ` Christoph Hellwig
2004-03-04 16:21             ` Rik van Riel
2004-03-04 17:03           ` Matt Mackall
2004-03-03 16:57     ` Andrea Arcangeli
2004-03-03 17:07       ` Dave McCracken
2004-03-03 18:39         ` Andrea Arcangeli
2004-03-03 18:44           ` Dave McCracken
2004-03-03 18:51             ` Andrea Arcangeli
2004-03-03 19:01               ` Dave McCracken
2004-03-03 21:05                 ` Andrea Arcangeli
2004-03-03 21:39               ` Martin J. Bligh
2004-03-03 23:16                 ` Andrea Arcangeli
2004-03-03 17:06   ` Andrea Arcangeli
2004-03-03 15:54 ` Martin J. Bligh
2004-03-03 16:14   ` Andrea Arcangeli
2004-03-25 21:45 ` Andrea Arcangeli
2004-03-26 11:57   ` Hugh Dickins
2004-03-26 18:02     ` Andrea Arcangeli
2004-03-26 23:25       ` Andrew Morton
2004-03-27 14:24         ` Andrea Arcangeli
2004-03-27 15:29         ` Dave McCracken

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox