All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shan Hai <haishan.bai@gmail.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: tony.luck@intel.com, Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, cmetcalf@tilera.com,
	dhowells@redhat.com, paulus@samba.org, tglx@linutronix.de,
	walken@google.com, linuxppc-dev@lists.ozlabs.org,
	akpm@linux-foundation.org
Subject: Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
Date: Tue, 19 Jul 2011 11:30:25 +0800	[thread overview]
Message-ID: <4E24FA51.70602@gmail.com> (raw)
In-Reply-To: <1310974591.25044.298.camel@pasglop>

On 07/18/2011 03:36 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2011-07-18 at 15:26 +0800, Shan Hai wrote:
>> I am sorry I hadn't tried your newer patch, I tried it but it still
>> could not work in my test environment, I will dig into and tell you
>> why that failed later.
> Ok, please let me know what you find !
>

Have not been finding out the reason why failed,
I tried the following based on your code,
(1)
diff --git a/kernel/futex.c b/kernel/futex.c
index fe28dc2..820556d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -353,10 +353,11 @@ static int fault_in_user_writeable(u32 __user *uaddr)
  {
         struct mm_struct *mm = current->mm;
         int ret;
+       int flags = FOLL_TOUCH | FOLL_GET | FOLL_WRITE | FOLL_FIXFAULT;

         down_read(&mm->mmap_sem);
-       ret = get_user_pages(current, mm, (unsigned long)uaddr,
-                            1, 1, 0, NULL, NULL);
+       ret = __get_user_pages(current, mm, (unsigned long)uaddr, 1,
+                              flags, NULL, NULL, NULL);
         up_read(&mm->mmap_sem);

         return ret < 0 ? ret : 0;

(2)
diff --git a/mm/memory.c b/mm/memory.c
index 9b8a01d..f7ba26e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
...
+
+       if ((flags & (FOLL_WRITE | FOLL_FIXFAULT)) && !pte_dirty(pte))
+               handle_pte_sw_young_dirty(vma, address, ptep,
+                                          FAULT_FLAG_WRITE);
...

And everything lookes good, but still couldn't work, need more 
investigation.

>> Yep, I know holding lots of ifdef's everywhere is not so good,
>> but if we have some other way(I don't know how till now) to
>> figure out the arch has the need to fixup up the write permission
>> we could eradicate the ugly ifdef's here.
>>
>> I think the handle_mm_fault could do all dirty/young tracking,
>> because the purpose of making follow_page return NULL to
>> its caller is that want to the handle_mm_fault to be called
>> on write permission protection fault.
> I see your point. Rather than factoring the fixup code out, we could
> force gup to call handle_mm_fault()... that makes sense.
>
> However, I don't think we should special case archs. There's plenty of
> cases where we don't care about this fixup even on archs that do SW
> tracking of dirty and young. For example when gup is using for
> subsequent DMA.
>
> Only the (rare ?) cases where it's used as a mean to fixup a failing
> "atomic" user access are relevant.
>
> So I believe we should still pass an explicit flag to __get_user_pages()
> as I propose to activate that behaviour.
>

How about the following one?
the write permission fixup behaviour is triggered explicitly by
the trouble making parts like futex as you suggested.

In this way, the follow_page() mimics exactly how the MMU
faults on atomic access to the user pages, and we could handle
the fault by already existing handle_mm_fault which also do
the dirty/young tracking properly.


diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9670f71..8a76694 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1546,6 +1546,7 @@ struct page *follow_page(struct vm_area_struct *, 
unsigned long address,
  #define FOLL_MLOCK    0x40    /* mark page as mlocked */
  #define FOLL_SPLIT    0x80    /* don't return transhuge pages, split 
them */
  #define FOLL_HWPOISON    0x100    /* check page is hwpoisoned */
+#define FOLL_FIXFAULT    0x200    /* fixup after a fault (PTE 
dirty/young upd) */

  typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
              void *data);
diff --git a/kernel/futex.c b/kernel/futex.c
index fe28dc2..820556d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -353,10 +353,11 @@ static int fault_in_user_writeable(u32 __user *uaddr)
  {
      struct mm_struct *mm = current->mm;
      int ret;
+    int flags = FOLL_TOUCH | FOLL_GET | FOLL_WRITE | FOLL_FIXFAULT;

      down_read(&mm->mmap_sem);
-    ret = get_user_pages(current, mm, (unsigned long)uaddr,
-                 1, 1, 0, NULL, NULL);
+    ret = __get_user_pages(current, mm, (unsigned long)uaddr, 1,
+                   flags, NULL, NULL, NULL);
      up_read(&mm->mmap_sem);

      return ret < 0 ? ret : 0;
diff --git a/mm/memory.c b/mm/memory.c
index 9b8a01d..5682501 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1442,6 +1442,7 @@ struct page *follow_page(struct vm_area_struct 
*vma, unsigned long address,
      spinlock_t *ptl;
      struct page *page;
      struct mm_struct *mm = vma->vm_mm;
+    int fix_write_permission = 0;

      page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
      if (!IS_ERR(page)) {
@@ -1519,6 +1520,9 @@ split_fallthrough:
          if ((flags & FOLL_WRITE) &&
              !pte_dirty(pte) && !PageDirty(page))
              set_page_dirty(page);
+
+        if ((flags & (FOLL_WRITE | FOLL_FIXFAULT)) && !pte_dirty(pte))
+            fix_write_permission = 1;
          /*
           * pte_mkyoung() would be more correct here, but atomic care
           * is needed to avoid losing the dirty bit: it is easier to use
@@ -1551,7 +1555,7 @@ split_fallthrough:
  unlock:
      pte_unmap_unlock(ptep, ptl);
  out:
-    return page;
+    return (fix_write_permission) ? NULL : page;

  bad_page:
      pte_unmap_unlock(ptep, ptl);

> At this point, since we have isolated the special case callers, I think
> we are pretty much in a situation where there's no point trying to
> optimize the x86 case more, it's a fairly slow path anyway, and so no
> ifdef should be needed (and x86 already #define out the TLB flush for
> spurious faults in handle_pte_fault today).
>
> We don't even need to change follow_page()... we just don't call it the
> first time around.
>
> I'll cook up another patch later but first we need to find out why the
> one you have doesn't work. There might be another problem lurking (or I
> just made a stupid mistake).
>
> BTW. Can you give me some details about how you reproduce the problem ?
> I should setup something on a booke machine here to verify things.
>
> Cheers,
> Ben.
>

WARNING: multiple messages have this Message-ID (diff)
From: Shan Hai <haishan.bai@gmail.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	paulus@samba.org, tglx@linutronix.de, walken@google.com,
	dhowells@redhat.com, cmetcalf@tilera.com, tony.luck@intel.com,
	akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
Date: Tue, 19 Jul 2011 11:30:25 +0800	[thread overview]
Message-ID: <4E24FA51.70602@gmail.com> (raw)
In-Reply-To: <1310974591.25044.298.camel@pasglop>

On 07/18/2011 03:36 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2011-07-18 at 15:26 +0800, Shan Hai wrote:
>> I am sorry I hadn't tried your newer patch, I tried it but it still
>> could not work in my test environment, I will dig into and tell you
>> why that failed later.
> Ok, please let me know what you find !
>

Have not been finding out the reason why failed,
I tried the following based on your code,
(1)
diff --git a/kernel/futex.c b/kernel/futex.c
index fe28dc2..820556d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -353,10 +353,11 @@ static int fault_in_user_writeable(u32 __user *uaddr)
  {
         struct mm_struct *mm = current->mm;
         int ret;
+       int flags = FOLL_TOUCH | FOLL_GET | FOLL_WRITE | FOLL_FIXFAULT;

         down_read(&mm->mmap_sem);
-       ret = get_user_pages(current, mm, (unsigned long)uaddr,
-                            1, 1, 0, NULL, NULL);
+       ret = __get_user_pages(current, mm, (unsigned long)uaddr, 1,
+                              flags, NULL, NULL, NULL);
         up_read(&mm->mmap_sem);

         return ret < 0 ? ret : 0;

(2)
diff --git a/mm/memory.c b/mm/memory.c
index 9b8a01d..f7ba26e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
...
+
+       if ((flags & (FOLL_WRITE | FOLL_FIXFAULT)) && !pte_dirty(pte))
+               handle_pte_sw_young_dirty(vma, address, ptep,
+                                          FAULT_FLAG_WRITE);
...

And everything lookes good, but still couldn't work, need more 
investigation.

>> Yep, I know holding lots of ifdef's everywhere is not so good,
>> but if we have some other way(I don't know how till now) to
>> figure out the arch has the need to fixup up the write permission
>> we could eradicate the ugly ifdef's here.
>>
>> I think the handle_mm_fault could do all dirty/young tracking,
>> because the purpose of making follow_page return NULL to
>> its caller is that want to the handle_mm_fault to be called
>> on write permission protection fault.
> I see your point. Rather than factoring the fixup code out, we could
> force gup to call handle_mm_fault()... that makes sense.
>
> However, I don't think we should special case archs. There's plenty of
> cases where we don't care about this fixup even on archs that do SW
> tracking of dirty and young. For example when gup is using for
> subsequent DMA.
>
> Only the (rare ?) cases where it's used as a mean to fixup a failing
> "atomic" user access are relevant.
>
> So I believe we should still pass an explicit flag to __get_user_pages()
> as I propose to activate that behaviour.
>

How about the following one?
the write permission fixup behaviour is triggered explicitly by
the trouble making parts like futex as you suggested.

In this way, the follow_page() mimics exactly how the MMU
faults on atomic access to the user pages, and we could handle
the fault by already existing handle_mm_fault which also do
the dirty/young tracking properly.


diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9670f71..8a76694 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1546,6 +1546,7 @@ struct page *follow_page(struct vm_area_struct *, 
unsigned long address,
  #define FOLL_MLOCK    0x40    /* mark page as mlocked */
  #define FOLL_SPLIT    0x80    /* don't return transhuge pages, split 
them */
  #define FOLL_HWPOISON    0x100    /* check page is hwpoisoned */
+#define FOLL_FIXFAULT    0x200    /* fixup after a fault (PTE 
dirty/young upd) */

  typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
              void *data);
diff --git a/kernel/futex.c b/kernel/futex.c
index fe28dc2..820556d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -353,10 +353,11 @@ static int fault_in_user_writeable(u32 __user *uaddr)
  {
      struct mm_struct *mm = current->mm;
      int ret;
+    int flags = FOLL_TOUCH | FOLL_GET | FOLL_WRITE | FOLL_FIXFAULT;

      down_read(&mm->mmap_sem);
-    ret = get_user_pages(current, mm, (unsigned long)uaddr,
-                 1, 1, 0, NULL, NULL);
+    ret = __get_user_pages(current, mm, (unsigned long)uaddr, 1,
+                   flags, NULL, NULL, NULL);
      up_read(&mm->mmap_sem);

      return ret < 0 ? ret : 0;
diff --git a/mm/memory.c b/mm/memory.c
index 9b8a01d..5682501 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1442,6 +1442,7 @@ struct page *follow_page(struct vm_area_struct 
*vma, unsigned long address,
      spinlock_t *ptl;
      struct page *page;
      struct mm_struct *mm = vma->vm_mm;
+    int fix_write_permission = 0;

      page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
      if (!IS_ERR(page)) {
@@ -1519,6 +1520,9 @@ split_fallthrough:
          if ((flags & FOLL_WRITE) &&
              !pte_dirty(pte) && !PageDirty(page))
              set_page_dirty(page);
+
+        if ((flags & (FOLL_WRITE | FOLL_FIXFAULT)) && !pte_dirty(pte))
+            fix_write_permission = 1;
          /*
           * pte_mkyoung() would be more correct here, but atomic care
           * is needed to avoid losing the dirty bit: it is easier to use
@@ -1551,7 +1555,7 @@ split_fallthrough:
  unlock:
      pte_unmap_unlock(ptep, ptl);
  out:
-    return page;
+    return (fix_write_permission) ? NULL : page;

  bad_page:
      pte_unmap_unlock(ptep, ptl);

> At this point, since we have isolated the special case callers, I think
> we are pretty much in a situation where there's no point trying to
> optimize the x86 case more, it's a fairly slow path anyway, and so no
> ifdef should be needed (and x86 already #define out the TLB flush for
> spurious faults in handle_pte_fault today).
>
> We don't even need to change follow_page()... we just don't call it the
> first time around.
>
> I'll cook up another patch later but first we need to find out why the
> one you have doesn't work. There might be another problem lurking (or I
> just made a stupid mistake).
>
> BTW. Can you give me some details about how you reproduce the problem ?
> I should setup something on a booke machine here to verify things.
>
> Cheers,
> Ben.
>


  parent reply	other threads:[~2011-07-19  3:27 UTC|newest]

Thread overview: 138+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-15  8:07 [PATCH 0/1] Fixup write permission of TLB on powerpc e500 core Shan Hai
2011-07-15  8:07 ` Shan Hai
2011-07-15  8:07 ` [PATCH 1/1] " Shan Hai
2011-07-15  8:07   ` Shan Hai
2011-07-15 10:23   ` Peter Zijlstra
2011-07-15 10:23     ` Peter Zijlstra
2011-07-15 15:18     ` Shan Hai
2011-07-15 15:18       ` Shan Hai
2011-07-15 15:24       ` Peter Zijlstra
2011-07-15 15:24         ` Peter Zijlstra
2011-07-16 15:36         ` Shan Hai
2011-07-16 15:36           ` Shan Hai
2011-07-16 14:50     ` Shan Hai
2011-07-16 14:50       ` Shan Hai
2011-07-16 23:49       ` Benjamin Herrenschmidt
2011-07-16 23:49         ` Benjamin Herrenschmidt
2011-07-17  9:38         ` Peter Zijlstra
2011-07-17  9:38           ` Peter Zijlstra
2011-07-17 14:29           ` Benjamin Herrenschmidt
2011-07-17 14:29             ` Benjamin Herrenschmidt
2011-07-17 23:14             ` Benjamin Herrenschmidt
2011-07-17 23:14               ` Benjamin Herrenschmidt
2011-07-18  3:53               ` Benjamin Herrenschmidt
2011-07-18  3:53                 ` Benjamin Herrenschmidt
2011-07-18  4:02                 ` Benjamin Herrenschmidt
2011-07-18  4:02                   ` Benjamin Herrenschmidt
2011-07-18  4:01               ` Benjamin Herrenschmidt
2011-07-18  4:01                 ` Benjamin Herrenschmidt
2011-07-18  6:48                 ` Shan Hai
2011-07-18  6:48                   ` Shan Hai
2011-07-18  7:01                   ` Benjamin Herrenschmidt
2011-07-18  7:01                     ` Benjamin Herrenschmidt
2011-07-18  7:26                     ` Shan Hai
2011-07-18  7:26                       ` Shan Hai
2011-07-18  7:36                       ` Benjamin Herrenschmidt
2011-07-18  7:36                         ` Benjamin Herrenschmidt
2011-07-18  7:50                         ` Shan Hai
2011-07-18  7:50                           ` Shan Hai
2011-07-19  3:30                         ` Shan Hai [this message]
2011-07-19  3:30                           ` Shan Hai
2011-07-19  4:20                           ` Benjamin Herrenschmidt
2011-07-19  4:20                             ` Benjamin Herrenschmidt
2011-07-19  4:29                           ` [RFC/PATCH] mm/futex: Fix futex writes on archs with SW tracking of dirty & young Benjamin Herrenschmidt
2011-07-19  4:29                             ` Benjamin Herrenschmidt
2011-07-19  4:55                             ` Shan Hai
2011-07-19  4:55                               ` Shan Hai
2011-07-19  5:17                             ` Shan Hai
2011-07-19  5:17                               ` Shan Hai
2011-07-19  5:24                               ` Benjamin Herrenschmidt
2011-07-19  5:24                                 ` Benjamin Herrenschmidt
2011-07-19  5:38                                 ` Shan Hai
2011-07-19  5:38                                   ` Shan Hai
2011-07-19  7:46                                   ` Benjamin Herrenschmidt
2011-07-19  7:46                                     ` Benjamin Herrenschmidt
2011-07-19  8:24                                     ` Shan Hai
2011-07-19  8:24                                       ` Shan Hai
2011-07-19  8:26                                       ` [RFC/PATCH] mm/futex: Fix futex writes on archs with SW trackingof " David Laight
2011-07-19  8:26                                         ` David Laight
2011-07-19  8:45                                         ` Benjamin Herrenschmidt
2011-07-19  8:45                                           ` Benjamin Herrenschmidt
2011-07-19  8:45                                         ` Shan Hai
2011-07-19  8:45                                           ` Shan Hai
2011-07-19 11:10                             ` [RFC/PATCH] mm/futex: Fix futex writes on archs with SW tracking of " Peter Zijlstra
2011-07-19 11:10                               ` Peter Zijlstra
2011-07-20 14:39                             ` Darren Hart
2011-07-20 14:39                               ` Darren Hart
2011-07-21 22:36                             ` Andrew Morton
2011-07-21 22:36                               ` Andrew Morton
2011-07-21 22:52                               ` Benjamin Herrenschmidt
2011-07-21 22:52                                 ` Benjamin Herrenschmidt
2011-07-21 22:57                                 ` Benjamin Herrenschmidt
2011-07-21 22:57                                   ` Benjamin Herrenschmidt
2011-07-21 22:59                                 ` Andrew Morton
2011-07-21 22:59                                   ` Andrew Morton
2011-07-22  1:40                                   ` Benjamin Herrenschmidt
2011-07-22  1:40                                     ` Benjamin Herrenschmidt
2011-07-22  1:54                                   ` Shan Hai
2011-07-22  1:54                                     ` Shan Hai
2011-07-27  6:50                             ` Mike Frysinger
2011-07-27  6:50                               ` Mike Frysinger
2011-07-27  7:58                               ` Benjamin Herrenschmidt
2011-07-27  7:58                                 ` Benjamin Herrenschmidt
2011-07-27  8:59                                 ` Peter Zijlstra
2011-07-27  8:59                                   ` Peter Zijlstra
2011-07-27 10:09                                   ` David Howells
2011-07-27 10:09                                     ` David Howells
2011-07-27 10:17                                     ` Peter Zijlstra
2011-07-27 10:17                                       ` Peter Zijlstra
2011-07-27 10:20                                       ` Benjamin Herrenschmidt
2011-07-27 10:20                                         ` Benjamin Herrenschmidt
2011-07-28  0:12                                         ` Mike Frysinger
2011-07-28  0:12                                           ` Mike Frysinger
2011-07-28 10:55                                       ` David Howells
2011-07-28 10:55                                         ` David Howells
2011-08-08  2:31                                       ` Mike Frysinger
2011-08-08  2:31                                         ` Mike Frysinger
2011-07-17 11:02         ` [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core Peter Zijlstra
2011-07-17 11:02           ` Peter Zijlstra
2011-07-17 13:33           ` Shan Hai
2011-07-17 13:33             ` Shan Hai
2011-07-17 14:48             ` Benjamin Herrenschmidt
2011-07-17 14:48               ` Benjamin Herrenschmidt
2011-07-17 15:40               ` Shan Hai
2011-07-17 15:40                 ` Shan Hai
2011-07-17 22:34                 ` Benjamin Herrenschmidt
2011-07-17 22:34                   ` Benjamin Herrenschmidt
2011-07-17 14:34           ` Benjamin Herrenschmidt
2011-07-17 14:34             ` Benjamin Herrenschmidt
2011-07-15  8:20 ` [PATCH 0/1] " Peter Zijlstra
2011-07-15  8:20   ` Peter Zijlstra
2011-07-15  8:38   ` MailingLists
2011-07-15  8:38     ` MailingLists
2011-07-15  8:44     ` Peter Zijlstra
2011-07-15  8:44       ` Peter Zijlstra
2011-07-15  9:08       ` Shan Hai
2011-07-15  9:08         ` Shan Hai
2011-07-15  9:12         ` Benjamin Herrenschmidt
2011-07-15  9:12           ` Benjamin Herrenschmidt
2011-07-15  9:50         ` Peter Zijlstra
2011-07-15  9:50           ` Peter Zijlstra
2011-07-15 10:06           ` Shan Hai
2011-07-15 10:06             ` Shan Hai
2011-07-15 10:32             ` David Laight
2011-07-15 10:32               ` David Laight
2011-07-15 10:39               ` Peter Zijlstra
2011-07-15 10:39                 ` Peter Zijlstra
2011-07-15 15:32               ` Shan Hai
2011-07-15 15:32                 ` Shan Hai
2011-07-16  0:20                 ` Benjamin Herrenschmidt
2011-07-16  0:20                   ` Benjamin Herrenschmidt
2011-07-16 15:03                   ` Shan Hai
2011-07-16 15:03                     ` Shan Hai
2011-07-15 23:47               ` Benjamin Herrenschmidt
2011-07-15 23:47                 ` Benjamin Herrenschmidt
2011-07-15  9:07     ` Benjamin Herrenschmidt
2011-07-15  9:07       ` Benjamin Herrenschmidt
2011-07-15  9:05   ` Benjamin Herrenschmidt
2011-07-15  9:05     ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E24FA51.70602@gmail.com \
    --to=haishan.bai@gmail.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=cmetcalf@tilera.com \
    --cc=dhowells@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.