public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Hugh Dickins <hughd@google.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Shaohua Li <shaohua.li@intel.com>, Michal Hocko <mhocko@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, kosaki.motohiro@gmail.com
Subject: Re: [PATCH] fix bug introduced in "mm: simplify find_vma_prev()"
Date: Tue, 06 Mar 2012 19:27:32 -0500	[thread overview]
Message-ID: <4F56AB74.2080301@gmail.com> (raw)
In-Reply-To: <Pine.LNX.4.64.1203061326120.22195@file.rdu.redhat.com>

(3/6/12 1:57 PM), Mikulas Patocka wrote:
>
>
> On Sun, 4 Mar 2012, Linus Torvalds wrote:
>
>> On Sun, Mar 4, 2012 at 4:52 PM, Mikulas Patocka<mpatocka@redhat.com>  wrote:
>>>
>>> This patch fixes a bug introduced in "mm: simplify find_vma_prev()". You
>>> can apply this, or alternatively revert the original patch.
>>
>> Hmm.
>>
>> I realize the patch is a bug-fix, but I do wonder if we shouldn't look
>> at alternatives.
>>
>> In particular, how about we just change the rule to be clearer, and
>> make the rule be:
>>
>>   - no more find_vma_prev() AT ALL.
>>
>>   - callers get turned into "find_vma()" and then we have a
>> "vma_prev()", which basically does your thing.
>>
>>   - many of the callers seem to be interested in "prev" *only* if they
>> found a vma. For example, the do_mlock() kind of logic seems to be
>> common:
>>
>>      vma = find_vma_prev(current->mm, start,&prev);
>>      if (!vma || vma->vm_start>  start)
>>          return -ENOMEM;
>>
>>     which implies that the current find_vma_prev() semantics are really
>> wrong, because they force us to do extra work in this case where we
>> really really don't care about the result.
>>
>> So we could do an entirely mechanical conversion of
>>
>>     vma = find_vma_prev(mm, address,&prev_vma);
>>
>> into
>>
>>     vma = find_vma(mm, address);
>>     prev_vma = vma_prev(vma);
>>
>> and then after we've done that conversion, it looks like the bulk of
>> them will not care about "prev_vma" if vma is NULL, so they can then
>> be replaced with a pure
>>
>>     prev_vma = vma->vm_prev;
>>
>> instead. Leaving just the (few) cases that may care about the
>> "previous vma may be the last vma of the VM" case.
>>
>> Hmm? And we'd get away from that horrible "find_vma_prev()" interface
>> that uses pointers to vma pointers for return values. Ugh. There only
>> seems to be nine callers in the whole kernel.
>>
>>                   Linus
>
> You can try this to remove most users of find_vma_prev (only three are
> left --- those dealing with up-growing stack). But the patch should be
> delayed until the next merge window.
>
> Mikulas
>

Thank you, Mikulas for finding and fixing the bug. And sorry for the long
delay. I was offlined a while.

Following is the replacing last three caller of find_vma_prev. oh, I use
find_last_vma() instead vma_prev(vma) because parisc need last vma search
when Milulas's original testcase. Does this make sense to you? if yes, we
can remove find_vma_prev() completely.



=========================
Subject: [PATCH] mm: introduce find_last_vma()

Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
  arch/ia64/mm/fault.c   |   20 +++++++++++++-------
  arch/parisc/mm/fault.c |    6 +++---
  include/linux/mm.h     |    3 +++
  mm/mmap.c              |   32 +++++++++++++++++++++++++++++++-
  4 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index 20b3593..27ca30c 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -112,18 +112,16 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
  
  	down_read(&mm->mmap_sem);
  
-	vma = find_vma_prev(mm, address, &prev_vma);
-	if (!vma && !prev_vma )
-		goto bad_area;
+	vma = find_vma(mm, address);
  
-        /*
-         * find_vma_prev() returns vma such that address < vma->vm_end or NULL
+	/*
+         * find_vma() returns vma such that address < vma->vm_end or NULL
           *
           * May find no vma, but could be that the last vm area is the
           * register backing store that needs to expand upwards, in
-         * this case vma will be null, but prev_vma will ne non-null
+         * this case vma will be null and a stack need to be expanded.
           */
-        if (( !vma && prev_vma ) || (address < vma->vm_start) )
+	if (!vma || (address < vma->vm_start))
  		goto check_expansion;
  
    good_area:
@@ -177,6 +175,14 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
  	return;
  
    check_expansion:
+	if (vma)
+		prev_vma = vma->vm_prev;
+	else {
+		prev_vma = find_last_vma(mm);
+		if (!prev_vma)
+			goto bad_area;
+	}
+
  	if (!(prev_vma && (prev_vma->vm_flags & VM_GROWSUP) && (address == prev_vma->vm_end))) {
  		if (!vma)
  			goto bad_area;
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index 18162ce..7b5a466 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -170,7 +170,7 @@ int fixup_exception(struct pt_regs *regs)
  void do_page_fault(struct pt_regs *regs, unsigned long code,
  			      unsigned long address)
  {
-	struct vm_area_struct *vma, *prev_vma;
+	struct vm_area_struct *vma;
  	struct task_struct *tsk = current;
  	struct mm_struct *mm = tsk->mm;
  	unsigned long acc_type;
@@ -180,7 +180,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
  		goto no_context;
  
  	down_read(&mm->mmap_sem);
-	vma = find_vma_prev(mm, address, &prev_vma);
+	vma = find_vma(mm, address);
  	if (!vma || address < vma->vm_start)
  		goto check_expansion;
  /*
@@ -222,7 +222,7 @@ good_area:
  	return;
  
  check_expansion:
-	vma = prev_vma;
+	vma = vma ? vma->vm_prev : find_last_vma(mm);
  	if (vma && (expand_stack(vma, address) == 0))
  		goto good_area;
  
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 17b27cd..d758818 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1463,6 +1463,9 @@ extern int expand_upwards(struct vm_area_struct *vma, unsigned long address);
  
  /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
  extern struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned long addr);
+#if defined(CONFIG_STACK_GROWSUP) || defined(CONFIG_IA64)
+extern struct vm_area_struct *find_last_vma(struct mm_struct *mm);
+#endif
  extern struct vm_area_struct * find_vma_prev(struct mm_struct * mm, unsigned long addr,
  					     struct vm_area_struct **pprev);
  
diff --git a/mm/mmap.c b/mm/mmap.c
index 22e1a0b..3e9c186 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1605,6 +1605,32 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
  
  EXPORT_SYMBOL(find_vma);
  
+#if defined(CONFIG_STACK_GROWSUP) || defined(CONFIG_IA64)
+/* Look up the last VMA,  NULL if mm have no vma. */
+struct vm_area_struct *find_last_vma(struct mm_struct *mm)
+{
+	struct vm_area_struct *vma = NULL;
+	struct rb_node *rb_node;
+
+	BUG_ON(!mm);
+
+	rb_node = mm->mm_rb.rb_node;
+	while (rb_node) {
+		vma = rb_entry(rb_node, struct vm_area_struct, vm_rb);
+		rb_node = rb_node->rb_right;
+	}
+
+	/*
+	 * This function is mainly used for a page fault. Thus recording vma
+	 * may improve the performance.
+	 */
+	if (vma)
+		mm->mmap_cache = vma;
+
+	return vma;
+}
+#endif
+
  /*
   * Same as find_vma, but also return a pointer to the previous VMA in *pprev.
   * Note: pprev is set to NULL when return value is NULL.
@@ -1789,9 +1815,13 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
  	struct vm_area_struct *vma, *prev;
  
  	addr &= PAGE_MASK;
-	vma = find_vma_prev(mm, addr, &prev);
+	vma = find_vma(mm, addr);
  	if (vma && (vma->vm_start <= addr))
  		return vma;
+
+	prev = vma->vm_prev;
+	if (!prev)
+		prev = find_last_vma(mm);
  	if (!prev || expand_stack(prev, addr))
  		return NULL;
  	if (prev->vm_flags & VM_LOCKED) {
-- 
1.7.1



  reply	other threads:[~2012-03-07  0:27 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-05  0:52 [PATCH] fix bug introduced in "mm: simplify find_vma_prev()" Mikulas Patocka
2012-03-05  1:22 ` Linus Torvalds
2012-03-06 18:57   ` Mikulas Patocka
2012-03-07  0:27     ` KOSAKI Motohiro [this message]
2012-03-07  0:30 ` KOSAKI Motohiro
2012-03-07  2:27   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F56AB74.2080301@gmail.com \
    --to=kosaki.motohiro@gmail.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.cz \
    --cc=mpatocka@redhat.com \
    --cc=shaohua.li@intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox