[RFC][PATCH] vm_operation to avoid pagefault/inval race

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@digeo.com
Cc: mjbligh@us.ibm.com
Subject: [RFC][PATCH] vm_operation to avoid pagefault/inval race
Date: Tue, 13 May 2003 13:53:26 -0700	[thread overview]
Message-ID: <20030513135326.D2929@us.ibm.com> (raw)

This patch adds a vm_operations_struct function pointer that allows
networked and distributed filesystems to avoid a race between a
pagefault on an mmap and an invalidation request from some other
node.  The race goes as follows:

1.	A user process on node A accesses a portion of a mapped
	file, resulting in a page fault.  The pagefault handler
	invokes the corresponding nopage function, which reads
	the page into memory.

2.	A user process on node B writes to the same portion of
	the file (either via mmap or write()), therefore sending
	node A an invalidation request to node A.

3.	Node A receives this invalidate request, and dutifully
	invalidates all mmaps.  Except for the one that has
	not yet been fully mapped by step 1.

4.	Node A then executes the rest of do_no_page(), entering
	the now-invalid page into the PTEs.

5.	One way or another, life is now hard.

One solution would be for the distributed filesystem to hold
onto a lock or semaphore upon return from the nopage function.
The problem is that there is no way to determine (in a timely
fashion) when it safe to release this lock or semaphore.

The attached patch addresses this by adding a nopagedone
function for when do_no_page() exits.  The filesystem may then
drop the lock or semaphore in this nopagedone function.

Thoughts?  Is there some other existing way to get this done?

					Thanx, Paul

diff -urN -X dontdiff linux-2.5.69/include/linux/mm.h linux-2.5.69.stmmap/include/linux/mm.h
--- linux-2.5.69/include/linux/mm.h	Sun May  4 16:53:00 2003
+++ linux-2.5.69.stmmap/include/linux/mm.h	Fri May  9 09:30:37 2003
@@ -134,6 +134,7 @@
 	void (*open)(struct vm_area_struct * area);
 	void (*close)(struct vm_area_struct * area);
 	struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int unused);
+	void (*nopagedone)(struct vm_area_struct * area, unsigned long address, int status);
 	int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
 };

diff -urN -X dontdiff linux-2.5.69/mm/memory.c linux-2.5.69.stmmap/mm/memory.c
--- linux-2.5.69/mm/memory.c	Sun May  4 16:53:14 2003
+++ linux-2.5.69.stmmap/mm/memory.c	Fri May  9 17:04:09 2003
@@ -1426,6 +1487,9 @@
 	ret = VM_FAULT_OOM;
 out:
 	pte_chain_free(pte_chain);
+	if (vma->vm_ops && vma->vm_ops->nopagedone) {
+		vma->vm_ops->nopagedone(vma, address & PAGE_MASK, ret);
+	}
 	return ret;
 }

WARNING: multiple messages have this Message-ID (diff)

From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@digeo.com
Cc: mjbligh@us.ibm.com
Subject: [RFC][PATCH] vm_operation to avoid pagefault/inval race
Date: Tue, 13 May 2003 13:53:26 -0700	[thread overview]
Message-ID: <20030513135326.D2929@us.ibm.com> (raw)

This patch adds a vm_operations_struct function pointer that allows
networked and distributed filesystems to avoid a race between a
pagefault on an mmap and an invalidation request from some other
node.  The race goes as follows:

1.	A user process on node A accesses a portion of a mapped
	file, resulting in a page fault.  The pagefault handler
	invokes the corresponding nopage function, which reads
	the page into memory.

2.	A user process on node B writes to the same portion of
	the file (either via mmap or write()), therefore sending
	node A an invalidation request to node A.

3.	Node A receives this invalidate request, and dutifully
	invalidates all mmaps.  Except for the one that has
	not yet been fully mapped by step 1.

4.	Node A then executes the rest of do_no_page(), entering
	the now-invalid page into the PTEs.

5.	One way or another, life is now hard.

One solution would be for the distributed filesystem to hold
onto a lock or semaphore upon return from the nopage function.
The problem is that there is no way to determine (in a timely
fashion) when it safe to release this lock or semaphore.

The attached patch addresses this by adding a nopagedone
function for when do_no_page() exits.  The filesystem may then
drop the lock or semaphore in this nopagedone function.

Thoughts?  Is there some other existing way to get this done?

					Thanx, Paul

diff -urN -X dontdiff linux-2.5.69/include/linux/mm.h linux-2.5.69.stmmap/include/linux/mm.h
--- linux-2.5.69/include/linux/mm.h	Sun May  4 16:53:00 2003
+++ linux-2.5.69.stmmap/include/linux/mm.h	Fri May  9 09:30:37 2003
@@ -134,6 +134,7 @@
 	void (*open)(struct vm_area_struct * area);
 	void (*close)(struct vm_area_struct * area);
 	struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int unused);
+	void (*nopagedone)(struct vm_area_struct * area, unsigned long address, int status);
 	int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
 };

diff -urN -X dontdiff linux-2.5.69/mm/memory.c linux-2.5.69.stmmap/mm/memory.c
--- linux-2.5.69/mm/memory.c	Sun May  4 16:53:14 2003
+++ linux-2.5.69.stmmap/mm/memory.c	Fri May  9 17:04:09 2003
@@ -1426,6 +1487,9 @@
 	ret = VM_FAULT_OOM;
 out:
 	pte_chain_free(pte_chain);
+	if (vma->vm_ops && vma->vm_ops->nopagedone) {
+		vma->vm_ops->nopagedone(vma, address & PAGE_MASK, ret);
+	}
 	return ret;
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

next             reply	other threads:[~2003-05-13 21:41 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-05-13 20:53 Paul E. McKenney [this message]
2003-05-13 20:53 ` [RFC][PATCH] vm_operation to avoid pagefault/inval race Paul E. McKenney
2003-05-17 16:06 ` Christoph Hellwig
2003-05-17 16:06   ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2003-05-17 18:21 Daniel Phillips
2003-05-17 18:21 ` Daniel Phillips
2003-05-17 19:49 ` Andrew Morton
2003-05-17 19:49   ` Andrew Morton
2003-05-20  1:23   ` Paul E. McKenney
2003-05-20  1:23     ` Paul E. McKenney
2003-05-20  8:11     ` Andrew Morton
2003-05-20  8:11       ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030513135326.D2929@us.ibm.com \
    --to=paulmck@us.ibm.com \
    --cc=akpm@digeo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mjbligh@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.