From: Shan Hai <haishan.bai@gmail.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: tony.luck@intel.com, Peter Zijlstra <a.p.zijlstra@chello.nl>,
Peter Zijlstra <peterz@infradead.org>,
linux-kernel@vger.kernel.org, cmetcalf@tilera.com,
dhowells@redhat.com, paulus@samba.org, tglx@linutronix.de,
walken@google.com, linuxppc-dev@lists.ozlabs.org,
akpm@linux-foundation.org
Subject: Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
Date: Tue, 19 Jul 2011 11:30:25 +0800 [thread overview]
Message-ID: <4E24FA51.70602@gmail.com> (raw)
In-Reply-To: <1310974591.25044.298.camel@pasglop>
On 07/18/2011 03:36 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2011-07-18 at 15:26 +0800, Shan Hai wrote:
>> I am sorry I hadn't tried your newer patch, I tried it but it still
>> could not work in my test environment, I will dig into and tell you
>> why that failed later.
> Ok, please let me know what you find !
>
Have not been finding out the reason why failed,
I tried the following based on your code,
(1)
diff --git a/kernel/futex.c b/kernel/futex.c
index fe28dc2..820556d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -353,10 +353,11 @@ static int fault_in_user_writeable(u32 __user *uaddr)
{
struct mm_struct *mm = current->mm;
int ret;
+ int flags = FOLL_TOUCH | FOLL_GET | FOLL_WRITE | FOLL_FIXFAULT;
down_read(&mm->mmap_sem);
- ret = get_user_pages(current, mm, (unsigned long)uaddr,
- 1, 1, 0, NULL, NULL);
+ ret = __get_user_pages(current, mm, (unsigned long)uaddr, 1,
+ flags, NULL, NULL, NULL);
up_read(&mm->mmap_sem);
return ret < 0 ? ret : 0;
(2)
diff --git a/mm/memory.c b/mm/memory.c
index 9b8a01d..f7ba26e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
...
+
+ if ((flags & (FOLL_WRITE | FOLL_FIXFAULT)) && !pte_dirty(pte))
+ handle_pte_sw_young_dirty(vma, address, ptep,
+ FAULT_FLAG_WRITE);
...
And everything lookes good, but still couldn't work, need more
investigation.
>> Yep, I know holding lots of ifdef's everywhere is not so good,
>> but if we have some other way(I don't know how till now) to
>> figure out the arch has the need to fixup up the write permission
>> we could eradicate the ugly ifdef's here.
>>
>> I think the handle_mm_fault could do all dirty/young tracking,
>> because the purpose of making follow_page return NULL to
>> its caller is that want to the handle_mm_fault to be called
>> on write permission protection fault.
> I see your point. Rather than factoring the fixup code out, we could
> force gup to call handle_mm_fault()... that makes sense.
>
> However, I don't think we should special case archs. There's plenty of
> cases where we don't care about this fixup even on archs that do SW
> tracking of dirty and young. For example when gup is using for
> subsequent DMA.
>
> Only the (rare ?) cases where it's used as a mean to fixup a failing
> "atomic" user access are relevant.
>
> So I believe we should still pass an explicit flag to __get_user_pages()
> as I propose to activate that behaviour.
>
How about the following one?
the write permission fixup behaviour is triggered explicitly by
the trouble making parts like futex as you suggested.
In this way, the follow_page() mimics exactly how the MMU
faults on atomic access to the user pages, and we could handle
the fault by already existing handle_mm_fault which also do
the dirty/young tracking properly.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9670f71..8a76694 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1546,6 +1546,7 @@ struct page *follow_page(struct vm_area_struct *,
unsigned long address,
#define FOLL_MLOCK 0x40 /* mark page as mlocked */
#define FOLL_SPLIT 0x80 /* don't return transhuge pages, split
them */
#define FOLL_HWPOISON 0x100 /* check page is hwpoisoned */
+#define FOLL_FIXFAULT 0x200 /* fixup after a fault (PTE
dirty/young upd) */
typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
void *data);
diff --git a/kernel/futex.c b/kernel/futex.c
index fe28dc2..820556d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -353,10 +353,11 @@ static int fault_in_user_writeable(u32 __user *uaddr)
{
struct mm_struct *mm = current->mm;
int ret;
+ int flags = FOLL_TOUCH | FOLL_GET | FOLL_WRITE | FOLL_FIXFAULT;
down_read(&mm->mmap_sem);
- ret = get_user_pages(current, mm, (unsigned long)uaddr,
- 1, 1, 0, NULL, NULL);
+ ret = __get_user_pages(current, mm, (unsigned long)uaddr, 1,
+ flags, NULL, NULL, NULL);
up_read(&mm->mmap_sem);
return ret < 0 ? ret : 0;
diff --git a/mm/memory.c b/mm/memory.c
index 9b8a01d..5682501 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1442,6 +1442,7 @@ struct page *follow_page(struct vm_area_struct
*vma, unsigned long address,
spinlock_t *ptl;
struct page *page;
struct mm_struct *mm = vma->vm_mm;
+ int fix_write_permission = 0;
page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
if (!IS_ERR(page)) {
@@ -1519,6 +1520,9 @@ split_fallthrough:
if ((flags & FOLL_WRITE) &&
!pte_dirty(pte) && !PageDirty(page))
set_page_dirty(page);
+
+ if ((flags & (FOLL_WRITE | FOLL_FIXFAULT)) && !pte_dirty(pte))
+ fix_write_permission = 1;
/*
* pte_mkyoung() would be more correct here, but atomic care
* is needed to avoid losing the dirty bit: it is easier to use
@@ -1551,7 +1555,7 @@ split_fallthrough:
unlock:
pte_unmap_unlock(ptep, ptl);
out:
- return page;
+ return (fix_write_permission) ? NULL : page;
bad_page:
pte_unmap_unlock(ptep, ptl);
> At this point, since we have isolated the special case callers, I think
> we are pretty much in a situation where there's no point trying to
> optimize the x86 case more, it's a fairly slow path anyway, and so no
> ifdef should be needed (and x86 already #define out the TLB flush for
> spurious faults in handle_pte_fault today).
>
> We don't even need to change follow_page()... we just don't call it the
> first time around.
>
> I'll cook up another patch later but first we need to find out why the
> one you have doesn't work. There might be another problem lurking (or I
> just made a stupid mistake).
>
> BTW. Can you give me some details about how you reproduce the problem ?
> I should setup something on a booke machine here to verify things.
>
> Cheers,
> Ben.
>
next prev parent reply other threads:[~2011-07-19 3:27 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-15 8:07 [PATCH 0/1] Fixup write permission of TLB on powerpc e500 core Shan Hai
2011-07-15 8:07 ` [PATCH 1/1] " Shan Hai
2011-07-15 10:23 ` Peter Zijlstra
2011-07-15 15:18 ` Shan Hai
2011-07-15 15:24 ` Peter Zijlstra
2011-07-16 15:36 ` Shan Hai
2011-07-16 14:50 ` Shan Hai
2011-07-16 23:49 ` Benjamin Herrenschmidt
2011-07-17 9:38 ` Peter Zijlstra
2011-07-17 14:29 ` Benjamin Herrenschmidt
2011-07-17 23:14 ` Benjamin Herrenschmidt
2011-07-18 3:53 ` Benjamin Herrenschmidt
2011-07-18 4:02 ` Benjamin Herrenschmidt
2011-07-18 4:01 ` Benjamin Herrenschmidt
2011-07-18 6:48 ` Shan Hai
2011-07-18 7:01 ` Benjamin Herrenschmidt
2011-07-18 7:26 ` Shan Hai
2011-07-18 7:36 ` Benjamin Herrenschmidt
2011-07-18 7:50 ` Shan Hai
2011-07-19 3:30 ` Shan Hai [this message]
2011-07-19 4:20 ` Benjamin Herrenschmidt
2011-07-19 4:29 ` [RFC/PATCH] mm/futex: Fix futex writes on archs with SW tracking of dirty & young Benjamin Herrenschmidt
2011-07-19 4:55 ` Shan Hai
2011-07-19 5:17 ` Shan Hai
2011-07-19 5:24 ` Benjamin Herrenschmidt
2011-07-19 5:38 ` Shan Hai
2011-07-19 7:46 ` Benjamin Herrenschmidt
2011-07-19 8:24 ` Shan Hai
2011-07-19 8:26 ` [RFC/PATCH] mm/futex: Fix futex writes on archs with SW trackingof " David Laight
2011-07-19 8:45 ` Benjamin Herrenschmidt
2011-07-19 8:45 ` Shan Hai
2011-07-19 11:10 ` [RFC/PATCH] mm/futex: Fix futex writes on archs with SW tracking of " Peter Zijlstra
2011-07-20 14:39 ` Darren Hart
2011-07-21 22:36 ` Andrew Morton
2011-07-21 22:52 ` Benjamin Herrenschmidt
2011-07-21 22:57 ` Benjamin Herrenschmidt
2011-07-21 22:59 ` Andrew Morton
2011-07-22 1:40 ` Benjamin Herrenschmidt
2011-07-22 1:54 ` Shan Hai
2011-07-27 6:50 ` Mike Frysinger
2011-07-27 7:58 ` Benjamin Herrenschmidt
2011-07-27 8:59 ` Peter Zijlstra
2011-07-27 10:09 ` David Howells
2011-07-27 10:17 ` Peter Zijlstra
2011-07-27 10:20 ` Benjamin Herrenschmidt
2011-07-28 0:12 ` Mike Frysinger
2011-08-08 2:31 ` Mike Frysinger
2011-07-28 10:55 ` David Howells
2011-07-17 11:02 ` [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core Peter Zijlstra
2011-07-17 13:33 ` Shan Hai
2011-07-17 14:48 ` Benjamin Herrenschmidt
2011-07-17 15:40 ` Shan Hai
2011-07-17 22:34 ` Benjamin Herrenschmidt
2011-07-17 14:34 ` Benjamin Herrenschmidt
2011-07-15 8:20 ` [PATCH 0/1] " Peter Zijlstra
2011-07-15 8:38 ` MailingLists
2011-07-15 8:44 ` Peter Zijlstra
2011-07-15 9:08 ` Shan Hai
2011-07-15 9:12 ` Benjamin Herrenschmidt
2011-07-15 9:50 ` Peter Zijlstra
2011-07-15 10:06 ` Shan Hai
2011-07-15 10:32 ` David Laight
2011-07-15 10:39 ` Peter Zijlstra
2011-07-15 15:32 ` Shan Hai
2011-07-16 0:20 ` Benjamin Herrenschmidt
2011-07-16 15:03 ` Shan Hai
2011-07-15 23:47 ` Benjamin Herrenschmidt
2011-07-15 9:07 ` Benjamin Herrenschmidt
2011-07-15 9:05 ` Benjamin Herrenschmidt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4E24FA51.70602@gmail.com \
--to=haishan.bai@gmail.com \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=benh@kernel.crashing.org \
--cc=cmetcalf@tilera.com \
--cc=dhowells@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=paulus@samba.org \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=walken@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).