From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Hugh Dickins <hugh@veritas.com>
Cc: Linus Torvalds <torvalds@osdl.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Andrew Morton <akpm@osdl.org>, Robin Holt <holt@sgi.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, Ingo Molnar <mingo@elte.hu>,
	Roland McGrath <roland@redhat.com>
Subject: Re: [patch 2.6.13-rc4] fix get_user_pages bug
Date: Wed, 03 Aug 2005 20:24:01 +1000
Message-ID: <42F09B41.3050409@yahoo.com.au>
In-Reply-To: <Pine.LNX.4.61.0508022150530.10815@goblin.wat.veritas.com>

[-- Attachment #1: Type: text/plain, Size: 687 bytes --]

Hugh Dickins wrote:
> On Tue, 2 Aug 2005, Linus Torvalds wrote:
> 
>>Go for it, I think whatever we do won't be wonderfully pretty.
> 
> 
> Here we are: get_user_pages quite untested, let alone the racy case,
> but I think it should work.  Please all hack it around as you see fit,
> I'll check mail when I get home, but won't be very responsive...
> 

Seems OK to me. I don't know why you think handle_mm_fault can't
be inline, but if it can be, then I have a modification attached
that removes the condition - any good?

Oh, and it gets rid of the -1 for VM_FAULT_OOM. There doesn't seem to
be a good reason for it, but might that break out-of-tree drivers?

-- 
SUSE Labs, Novell Inc.


[-- Attachment #2: mm-gup-hugh.patch --]
[-- Type: text/plain, Size: 6183 bytes --]

Checking pte_dirty instead of pte_write in __follow_page is problematic
for s390, and for copy_one_pte which leaves the pte dirty when clearing write.

So revert __follow_page to check pte_write as before, and let do_wp_page
pass back a special code VM_FAULT_WRITE to say it has done its full job
(as opposed to VM_FAULT_MINOR when it backs out on a race): once
get_user_pages receives this value, it no longer requires pte_write in
__follow_page.

But most callers of handle_mm_fault, in the various architectures, have
switch statements which do not expect this new case.  To avoid changing
them all in a hurry, only pass back VM_FAULT_WRITE when write_access arg
says VM_FAULT_WRITE_EXPECTED - chosen as -1 since some arches pass
write_access as a boolean, some as a bitflag, but none as -1.

Yes, we do have a call to do_wp_page from do_swap_page, but no need to
change that: in the rare case it's needed, another do_wp_page will follow.

Signed-off-by: Hugh Dickins <hugh@veritas.com>

Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -625,10 +625,16 @@ static inline int page_mapped(struct pag
  * Used to decide whether a process gets delivered SIGBUS or
  * just gets major/minor fault counters bumped up.
  */
-#define VM_FAULT_OOM	(-1)
-#define VM_FAULT_SIGBUS	0
-#define VM_FAULT_MINOR	1
-#define VM_FAULT_MAJOR	2
+#define VM_FAULT_OOM	0x00
+#define VM_FAULT_SIGBUS	0x01
+#define VM_FAULT_MINOR	0x02
+#define VM_FAULT_MAJOR	0x03
+
+/* 
+ * Special case for get_user_pages.
+ * Must be in a distinct bit from the above VM_FAULT_ flags.
+ */
+#define VM_FAULT_WRITE	0x10
 
 #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)
 
@@ -704,7 +710,13 @@ extern pte_t *FASTCALL(pte_alloc_kernel(
 extern pte_t *FASTCALL(pte_alloc_map(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
 extern int install_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, struct page *page, pgprot_t prot);
 extern int install_file_pte(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long pgoff, pgprot_t prot);
-extern int handle_mm_fault(struct mm_struct *mm,struct vm_area_struct *vma, unsigned long address, int write_access);
+extern int __handle_mm_fault(struct mm_struct *mm,struct vm_area_struct *vma, unsigned long address, int write_access);
+
+static inline int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access)
+{
+	return __handle_mm_fault(mm, vma, address, write_access) & (~VM_FAULT_WRITE);
+}
+
 extern int make_pages_present(unsigned long addr, unsigned long end);
 extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
 void install_arg_page(struct vm_area_struct *, struct page *, unsigned long);
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -811,15 +811,18 @@ static struct page *__follow_page(struct
 	pte = *ptep;
 	pte_unmap(ptep);
 	if (pte_present(pte)) {
-		if (write && !pte_dirty(pte))
+		if (write && !pte_write(pte))
 			goto out;
 		if (read && !pte_read(pte))
 			goto out;
 		pfn = pte_pfn(pte);
 		if (pfn_valid(pfn)) {
 			page = pfn_to_page(pfn);
-			if (accessed)
+			if (accessed) {
+				if (write && !pte_dirty(pte) && !PageDirty(page))
+					set_page_dirty(page);
 				mark_page_accessed(page);
+			}
 			return page;
 		}
 	}
@@ -941,10 +944,11 @@ int get_user_pages(struct task_struct *t
 		}
 		spin_lock(&mm->page_table_lock);
 		do {
+			int write_access = write;
 			struct page *page;
 
 			cond_resched_lock(&mm->page_table_lock);
-			while (!(page = follow_page(mm, start, write))) {
+			while (!(page = follow_page(mm, start, write_access))) {
 				/*
 				 * Shortcut for anonymous pages. We don't want
 				 * to force the creation of pages tables for
@@ -957,7 +961,16 @@ int get_user_pages(struct task_struct *t
 					break;
 				}
 				spin_unlock(&mm->page_table_lock);
-				switch (handle_mm_fault(mm,vma,start,write)) {
+				switch (__handle_mm_fault(mm, vma, start,
+							write_access)) {
+				case VM_FAULT_WRITE:
+					/*
+					 * do_wp_page has broken COW when
+					 * necessary, even if maybe_mkwrite
+					 * decided not to set pte_write
+					 */
+					write_access = 0;
+					/* FALLTHRU */
 				case VM_FAULT_MINOR:
 					tsk->min_flt++;
 					break;
@@ -1220,6 +1233,7 @@ static int do_wp_page(struct mm_struct *
 	struct page *old_page, *new_page;
 	unsigned long pfn = pte_pfn(pte);
 	pte_t entry;
+	int ret;
 
 	if (unlikely(!pfn_valid(pfn))) {
 		/*
@@ -1247,7 +1261,7 @@ static int do_wp_page(struct mm_struct *
 			lazy_mmu_prot_update(entry);
 			pte_unmap(page_table);
 			spin_unlock(&mm->page_table_lock);
-			return VM_FAULT_MINOR;
+			return VM_FAULT_MINOR|VM_FAULT_WRITE;
 		}
 	}
 	pte_unmap(page_table);
@@ -1274,6 +1288,7 @@ static int do_wp_page(struct mm_struct *
 	/*
 	 * Re-check the pte - we dropped the lock
 	 */
+	ret = VM_FAULT_MINOR;
 	spin_lock(&mm->page_table_lock);
 	page_table = pte_offset_map(pmd, address);
 	if (likely(pte_same(*page_table, pte))) {
@@ -1290,12 +1305,13 @@ static int do_wp_page(struct mm_struct *
 
 		/* Free the old page.. */
 		new_page = old_page;
+		ret |= VM_FAULT_WRITE;
 	}
 	pte_unmap(page_table);
 	page_cache_release(new_page);
 	page_cache_release(old_page);
 	spin_unlock(&mm->page_table_lock);
-	return VM_FAULT_MINOR;
+	return ret;
 
 no_new_page:
 	page_cache_release(old_page);
@@ -1987,7 +2003,6 @@ static inline int handle_pte_fault(struc
 	if (write_access) {
 		if (!pte_write(entry))
 			return do_wp_page(mm, vma, address, pte, pmd, entry);
-
 		entry = pte_mkdirty(entry);
 	}
 	entry = pte_mkyoung(entry);
@@ -2002,7 +2017,7 @@ static inline int handle_pte_fault(struc
 /*
  * By the time we get here, we already hold the mm semaphore
  */
-int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct * vma,
+int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct * vma,
 		unsigned long address, int write_access)
 {
 	pgd_t *pgd;
