public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* downgrade_write replacement in remap_file_pages
@ 2004-06-08 15:44 Andrea Arcangeli
  2004-06-08 16:31 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Andrea Arcangeli @ 2004-06-08 15:44 UTC (permalink / raw)
  To: Andrew Morton, Linus Torvalds; +Cc: linux-kernel

Apparently downgrade_write deadlocks the kernel in the mmap_sem
under load. I suspect some rwsem bug. Anyways it's matter of time before
in my tree I replace the rwsem implementation with my spinlock based
common code implementation again that I can understand trivially (unlike
the current code). I don't mind a microoptimization when the code is so
complicated, so I don't mind too much to fix whatever current bug in
downgrade_write.

In the meantime to workaround the deadlock (and to verify if this make
the deadlock go away, which returned a positive result) I implemented
this patch: this doesn't fix downgrade_wite, but I'm posting it here
because I believe it's much better code regardless of the
downgrade_write issue.  With this patch we'll never run down_write again
in production, the down_write will be taken only once when the db or the
simulator startup (during the very first page fault) and never again, in
turn providing (at least in theory) better runtime scalability.

Index: linux-2.5/mm/fremap.c
===================================================================
RCS file: /home/andrea/crypto/cvs/linux-2.5/mm/fremap.c,v
retrieving revision 1.24
diff -u -p -r1.24 fremap.c
--- linux-2.5/mm/fremap.c	23 May 2004 05:07:26 -0000	1.24
+++ linux-2.5/mm/fremap.c	8 Jun 2004 15:38:21 -0000
@@ -161,6 +161,7 @@ asmlinkage long sys_remap_file_pages(uns
 	unsigned long end = start + size;
 	struct vm_area_struct *vma;
 	int err = -EINVAL;
+	int has_write_lock = 0;
 
 	if (__prot)
 		return err;
@@ -181,7 +182,8 @@ asmlinkage long sys_remap_file_pages(uns
 #endif
 
 	/* We need down_write() to change vma->vm_flags. */
-	down_write(&mm->mmap_sem);
+	down_read(&mm->mmap_sem);
+ retry:
 	vma = find_vma(mm, start);
 
 	/*
@@ -198,8 +200,13 @@ asmlinkage long sys_remap_file_pages(uns
 				end <= vma->vm_end) {
 
 		/* Must set VM_NONLINEAR before any pages are populated. */
-		if (pgoff != linear_page_index(vma, start) &&
-		    !(vma->vm_flags & VM_NONLINEAR)) {
+		if (unlikely(pgoff != linear_pgoff && !(vma->vm_flags & VM_NONLINEAR))) {
+			if (!has_write_lock) {
+				up_read(&mm->mmap_sem);
+				down_write(&mm->mmap_sem);
+				has_write_lock = 1;
+				goto retry;
+			}
 			mapping = vma->vm_file->f_mapping;
 			spin_lock(&mapping->i_mmap_lock);
 			flush_dcache_mmap_lock(mapping);
@@ -212,8 +219,6 @@ asmlinkage long sys_remap_file_pages(uns
 			spin_unlock(&mapping->i_mmap_lock);
 		}
 
-		/* ->populate can take a long time, so downgrade the lock. */
-		downgrade_write(&mm->mmap_sem);
 		err = vma->vm_ops->populate(vma, start, size,
 					    vma->vm_page_prot,
 					    pgoff, flags & MAP_NONBLOCK);
@@ -223,10 +228,11 @@ asmlinkage long sys_remap_file_pages(uns
 		 * it after ->populate completes, and that would prevent
 		 * downgrading the lock.  (Locks can't be upgraded).
 		 */
+	}
+	if (likely(!has_write_lock))
 		up_read(&mm->mmap_sem);
-	} else {
+	else
 		up_write(&mm->mmap_sem);
-	}
 
 	return err;
 }

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2004-06-28 11:41 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-06-08 15:44 downgrade_write replacement in remap_file_pages Andrea Arcangeli
2004-06-08 16:31 ` Andrew Morton
2004-06-08 16:39   ` Linus Torvalds
2004-06-08 19:04   ` David Howells
2004-06-08 17:05 ` David Howells
2004-06-08 22:33   ` Andrea Arcangeli
2004-06-08 19:36 ` William Lee Irwin III
2004-06-08 22:52   ` Andrea Arcangeli
2004-06-09 12:19   ` [PATCH] A generic_file_sendpage() Alexander Nyberg
2004-06-10 19:49     ` Pavel Machek
2004-06-25 19:19     ` Jörn Engel
2004-06-25 19:46       ` viro
2004-06-25 20:03         ` Jörn Engel
2004-06-26  0:53           ` Trond Myklebust
2004-06-28 11:41             ` Jörn Engel
2004-06-25 20:05       ` Andreas Dilger
2004-06-25 20:09         ` Jörn Engel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox