linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Shan Hai <haishan.bai@gmail.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: tony.luck@intel.com, Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, cmetcalf@tilera.com,
	dhowells@redhat.com, paulus@samba.org, tglx@linutronix.de,
	walken@google.com, linuxppc-dev@lists.ozlabs.org,
	akpm@linux-foundation.org
Subject: Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
Date: Tue, 19 Jul 2011 11:30:25 +0800	[thread overview]
Message-ID: <4E24FA51.70602@gmail.com> (raw)
In-Reply-To: <1310974591.25044.298.camel@pasglop>

On 07/18/2011 03:36 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2011-07-18 at 15:26 +0800, Shan Hai wrote:
>> I am sorry I hadn't tried your newer patch, I tried it but it still
>> could not work in my test environment, I will dig into and tell you
>> why that failed later.
> Ok, please let me know what you find !
>

Have not been finding out the reason why failed,
I tried the following based on your code,
(1)
diff --git a/kernel/futex.c b/kernel/futex.c
index fe28dc2..820556d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -353,10 +353,11 @@ static int fault_in_user_writeable(u32 __user *uaddr)
  {
         struct mm_struct *mm = current->mm;
         int ret;
+       int flags = FOLL_TOUCH | FOLL_GET | FOLL_WRITE | FOLL_FIXFAULT;

         down_read(&mm->mmap_sem);
-       ret = get_user_pages(current, mm, (unsigned long)uaddr,
-                            1, 1, 0, NULL, NULL);
+       ret = __get_user_pages(current, mm, (unsigned long)uaddr, 1,
+                              flags, NULL, NULL, NULL);
         up_read(&mm->mmap_sem);

         return ret < 0 ? ret : 0;

(2)
diff --git a/mm/memory.c b/mm/memory.c
index 9b8a01d..f7ba26e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
...
+
+       if ((flags & (FOLL_WRITE | FOLL_FIXFAULT)) && !pte_dirty(pte))
+               handle_pte_sw_young_dirty(vma, address, ptep,
+                                          FAULT_FLAG_WRITE);
...

And everything lookes good, but still couldn't work, need more 
investigation.

>> Yep, I know holding lots of ifdef's everywhere is not so good,
>> but if we have some other way(I don't know how till now) to
>> figure out the arch has the need to fixup up the write permission
>> we could eradicate the ugly ifdef's here.
>>
>> I think the handle_mm_fault could do all dirty/young tracking,
>> because the purpose of making follow_page return NULL to
>> its caller is that want to the handle_mm_fault to be called
>> on write permission protection fault.
> I see your point. Rather than factoring the fixup code out, we could
> force gup to call handle_mm_fault()... that makes sense.
>
> However, I don't think we should special case archs. There's plenty of
> cases where we don't care about this fixup even on archs that do SW
> tracking of dirty and young. For example when gup is using for
> subsequent DMA.
>
> Only the (rare ?) cases where it's used as a mean to fixup a failing
> "atomic" user access are relevant.
>
> So I believe we should still pass an explicit flag to __get_user_pages()
> as I propose to activate that behaviour.
>

How about the following one?
the write permission fixup behaviour is triggered explicitly by
the trouble making parts like futex as you suggested.

In this way, the follow_page() mimics exactly how the MMU
faults on atomic access to the user pages, and we could handle
the fault by already existing handle_mm_fault which also do
the dirty/young tracking properly.


diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9670f71..8a76694 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1546,6 +1546,7 @@ struct page *follow_page(struct vm_area_struct *, 
unsigned long address,
  #define FOLL_MLOCK    0x40    /* mark page as mlocked */
  #define FOLL_SPLIT    0x80    /* don't return transhuge pages, split 
them */
  #define FOLL_HWPOISON    0x100    /* check page is hwpoisoned */
+#define FOLL_FIXFAULT    0x200    /* fixup after a fault (PTE 
dirty/young upd) */

  typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
              void *data);
diff --git a/kernel/futex.c b/kernel/futex.c
index fe28dc2..820556d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -353,10 +353,11 @@ static int fault_in_user_writeable(u32 __user *uaddr)
  {
      struct mm_struct *mm = current->mm;
      int ret;
+    int flags = FOLL_TOUCH | FOLL_GET | FOLL_WRITE | FOLL_FIXFAULT;

      down_read(&mm->mmap_sem);
-    ret = get_user_pages(current, mm, (unsigned long)uaddr,
-                 1, 1, 0, NULL, NULL);
+    ret = __get_user_pages(current, mm, (unsigned long)uaddr, 1,
+                   flags, NULL, NULL, NULL);
      up_read(&mm->mmap_sem);

      return ret < 0 ? ret : 0;
diff --git a/mm/memory.c b/mm/memory.c
index 9b8a01d..5682501 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1442,6 +1442,7 @@ struct page *follow_page(struct vm_area_struct 
*vma, unsigned long address,
      spinlock_t *ptl;
      struct page *page;
      struct mm_struct *mm = vma->vm_mm;
+    int fix_write_permission = 0;

      page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
      if (!IS_ERR(page)) {
@@ -1519,6 +1520,9 @@ split_fallthrough:
          if ((flags & FOLL_WRITE) &&
              !pte_dirty(pte) && !PageDirty(page))
              set_page_dirty(page);
+
+        if ((flags & (FOLL_WRITE | FOLL_FIXFAULT)) && !pte_dirty(pte))
+            fix_write_permission = 1;
          /*
           * pte_mkyoung() would be more correct here, but atomic care
           * is needed to avoid losing the dirty bit: it is easier to use
@@ -1551,7 +1555,7 @@ split_fallthrough:
  unlock:
      pte_unmap_unlock(ptep, ptl);
  out:
-    return page;
+    return (fix_write_permission) ? NULL : page;

  bad_page:
      pte_unmap_unlock(ptep, ptl);

> At this point, since we have isolated the special case callers, I think
> we are pretty much in a situation where there's no point trying to
> optimize the x86 case more, it's a fairly slow path anyway, and so no
> ifdef should be needed (and x86 already #define out the TLB flush for
> spurious faults in handle_pte_fault today).
>
> We don't even need to change follow_page()... we just don't call it the
> first time around.
>
> I'll cook up another patch later but first we need to find out why the
> one you have doesn't work. There might be another problem lurking (or I
> just made a stupid mistake).
>
> BTW. Can you give me some details about how you reproduce the problem ?
> I should setup something on a booke machine here to verify things.
>
> Cheers,
> Ben.
>

  parent reply	other threads:[~2011-07-19  3:27 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-15  8:07 [PATCH 0/1] Fixup write permission of TLB on powerpc e500 core Shan Hai
2011-07-15  8:07 ` [PATCH 1/1] " Shan Hai
2011-07-15 10:23   ` Peter Zijlstra
2011-07-15 15:18     ` Shan Hai
2011-07-15 15:24       ` Peter Zijlstra
2011-07-16 15:36         ` Shan Hai
2011-07-16 14:50     ` Shan Hai
2011-07-16 23:49       ` Benjamin Herrenschmidt
2011-07-17  9:38         ` Peter Zijlstra
2011-07-17 14:29           ` Benjamin Herrenschmidt
2011-07-17 23:14             ` Benjamin Herrenschmidt
2011-07-18  3:53               ` Benjamin Herrenschmidt
2011-07-18  4:02                 ` Benjamin Herrenschmidt
2011-07-18  4:01               ` Benjamin Herrenschmidt
2011-07-18  6:48                 ` Shan Hai
2011-07-18  7:01                   ` Benjamin Herrenschmidt
2011-07-18  7:26                     ` Shan Hai
2011-07-18  7:36                       ` Benjamin Herrenschmidt
2011-07-18  7:50                         ` Shan Hai
2011-07-19  3:30                         ` Shan Hai [this message]
2011-07-19  4:20                           ` Benjamin Herrenschmidt
2011-07-19  4:29                           ` [RFC/PATCH] mm/futex: Fix futex writes on archs with SW tracking of dirty & young Benjamin Herrenschmidt
2011-07-19  4:55                             ` Shan Hai
2011-07-19  5:17                             ` Shan Hai
2011-07-19  5:24                               ` Benjamin Herrenschmidt
2011-07-19  5:38                                 ` Shan Hai
2011-07-19  7:46                                   ` Benjamin Herrenschmidt
2011-07-19  8:24                                     ` Shan Hai
2011-07-19  8:26                                       ` [RFC/PATCH] mm/futex: Fix futex writes on archs with SW trackingof " David Laight
2011-07-19  8:45                                         ` Benjamin Herrenschmidt
2011-07-19  8:45                                         ` Shan Hai
2011-07-19 11:10                             ` [RFC/PATCH] mm/futex: Fix futex writes on archs with SW tracking of " Peter Zijlstra
2011-07-20 14:39                             ` Darren Hart
2011-07-21 22:36                             ` Andrew Morton
2011-07-21 22:52                               ` Benjamin Herrenschmidt
2011-07-21 22:57                                 ` Benjamin Herrenschmidt
2011-07-21 22:59                                 ` Andrew Morton
2011-07-22  1:40                                   ` Benjamin Herrenschmidt
2011-07-22  1:54                                   ` Shan Hai
2011-07-27  6:50                             ` Mike Frysinger
2011-07-27  7:58                               ` Benjamin Herrenschmidt
2011-07-27  8:59                                 ` Peter Zijlstra
2011-07-27 10:09                                 ` David Howells
2011-07-27 10:17                                   ` Peter Zijlstra
2011-07-27 10:20                                     ` Benjamin Herrenschmidt
2011-07-28  0:12                                       ` Mike Frysinger
2011-08-08  2:31                                     ` Mike Frysinger
2011-07-28 10:55                                   ` David Howells
2011-07-17 11:02         ` [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core Peter Zijlstra
2011-07-17 13:33           ` Shan Hai
2011-07-17 14:48             ` Benjamin Herrenschmidt
2011-07-17 15:40               ` Shan Hai
2011-07-17 22:34                 ` Benjamin Herrenschmidt
2011-07-17 14:34           ` Benjamin Herrenschmidt
2011-07-15  8:20 ` [PATCH 0/1] " Peter Zijlstra
2011-07-15  8:38   ` MailingLists
2011-07-15  8:44     ` Peter Zijlstra
2011-07-15  9:08       ` Shan Hai
2011-07-15  9:12         ` Benjamin Herrenschmidt
2011-07-15  9:50         ` Peter Zijlstra
2011-07-15 10:06           ` Shan Hai
2011-07-15 10:32             ` David Laight
2011-07-15 10:39               ` Peter Zijlstra
2011-07-15 15:32               ` Shan Hai
2011-07-16  0:20                 ` Benjamin Herrenschmidt
2011-07-16 15:03                   ` Shan Hai
2011-07-15 23:47               ` Benjamin Herrenschmidt
2011-07-15  9:07     ` Benjamin Herrenschmidt
2011-07-15  9:05   ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E24FA51.70602@gmail.com \
    --to=haishan.bai@gmail.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=cmetcalf@tilera.com \
    --cc=dhowells@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).