* Re: [PATCH v2] dtc: Remove unused variable in flat_read_mem_reserve
From: Jon Loeliger @ 2011-07-17 12:36 UTC (permalink / raw)
To: David Gibson; +Cc: linuxppc-dev, Josh Boyer
In-Reply-To: <20110715135755.GD4368@yookeroo.fritz.box>
> On Tue, Jun 28, 2011 at 09:47:11AM -0400, Josh Boyer wrote:
> > The *p variable is declared and used to save inb->ptr, however p is
> > later never used. This has been the case since commit 6c0f3676 and can
> > lead to build failures with -Werror=unused-but-set-variable:
> >
> > flattree.c: In function 'flat_read_mem_reserve':
> > flattree.c:700:14: error: variable 'p' set but not used [-Werror=unused-but-set-variable]
> > cc1: all warnings being treated as errors
> > make: *** [flattree.o] Error 1
> >
> > Remove the variable.
> >
> > Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
>
> Acked-by: David Gibson <david@gibson.dropbear.id.au>
Applied.
jdl
^ permalink raw reply
* Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
From: Peter Zijlstra @ 2011-07-17 11:02 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: tony.luck, Shan Hai, linux-kernel, cmetcalf, dhowells, paulus,
tglx, walken, linuxppc-dev, akpm
In-Reply-To: <1310860194.25044.17.camel@pasglop>
On Sun, 2011-07-17 at 09:49 +1000, Benjamin Herrenschmidt wrote:
> In the meantime, other than rewriting the futex code to not require
> those in-atomic accesses (can't it just access the pages via the linear
> mapping and/or kmap after the gup ?),
That'll wreck performance on things like ARM and SPARC that have to deal
with cache aliasing.
> all I see would be a way to force
> dirty and young after gup, with appropriate locks, or a variant of gup
> (via a flag ?) to require it to do so.=20
Again, _WHY_ isn't gup(.write=3D1) a complete write fault? Its supposed to
be, it needs to break COW, do dirty page tracking and call page_mkwrite.
I'm still thinking this e500 stuff is smoking crack.
ARM has no hardware dirty bit either, and yet it works for them. I can't
exactly tell how because I got lost in there, but it does, again,
suggest e500 is on crack.
^ permalink raw reply
* Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
From: Peter Zijlstra @ 2011-07-17 9:38 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Shan Hai
Cc: tony.luck, Peter Zijlstra, linux-kernel, cmetcalf, dhowells,
paulus, tglx, walken, linuxppc-dev, akpm
In-Reply-To: <1310860194.25044.17.camel@pasglop>
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>On Sat, 2011-07-16 at 10:50 -0400, Shan Hai wrote:
>>
>> That's right the gup(.write=1) want to do the same thing as the
>> above code snippet, but it failed for the following reason:
>> because the get_user_pages() would not dirty pte for the reason
>> the follow_page() returns not NULL on *present* and *writable*
>> page, the page which holds the lock is present because its a shared
>> page,
>> writable because demand paging set that up so for shared
>> writable page, so the handle_mm_fault() in the __get_user_page()
>> could not be called.
>>
>> Why the above code could do the same task, because by calling
>> handle_mm_fault() will set pte dirty by
>> [do_annonymous_page(), memory.c]
>> if (vma->vm_flags & VM_WRITE)
>> entry = pte_mkwrite(pte_mkdirty(entry));
>>
>
>Right. gup won't set page_dirty, it expects the caller to do so (in
>case
>it doesn't dirty all the gup'ed pages a suppose).
>
>You could probably fix the problem here by setting dirty after gup in
>the futex code if you know you're going to write. You must do that with
>the PTE lock held though and -not- at interrupt time.
>
>Note however that the exact same problem exist with normal "read"
>accesses and page_young (_PAGE_ACCESSED on powerpc). The page will not
>be accessible until that bit is set and it's set by SW.
>
>As I wrote earlier, fixing that by making "atomic" page faults perform
>the dirty/accessed tracking is not right, since such faults can happen
>at interrupt time and the PTE lock cannot be taken at interrupt time.
>
>IE. The implementation of those "SW" TLB archs heavily relies on the
>PTE
>lock to serialize write access to the PTE and writing it outside of
>that
>lock would do really bad things.
>
>So there's a deeper problem here. The whole user access "in atomic"
>concept is by itself a violation of some of the basic access rules of
>user memory that have existed from day 1 of the kernel. That we allow
>it
>for semi-harmless (and allowed to fail) things like snapshot of
>backtraces for perf is one thing, but relying on it for the futex case
>like that is not going to fly very well. I sincerely hope that this
>kind
>of usage is not going to become a habit.
>
>In the meantime, other than rewriting the futex code to not require
>those in-atomic accesses (can't it just access the pages via the linear
>mapping and/or kmap after the gup ?), all I see would be a way to force
>dirty and young after gup, with appropriate locks, or a variant of gup
>(via a flag ?) to require it to do so.
>
>Cheers,
>Ben.
Whats this talk about interrup context? There is non of that involved.
Furthermore, I still dont see the problem. The futex code is optimistically trying to poke at user memory while holding spinlocks.
We fully expect that to fail, hence the error path that drops all locks and does the gup(.write=1) to fix up the mapping after which we try again.
This has worked for years, its by no means new code. Nor do I see how its broken.
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
^ permalink raw reply
* Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
From: Benjamin Herrenschmidt @ 2011-07-16 23:49 UTC (permalink / raw)
To: Shan Hai
Cc: tony.luck, Peter Zijlstra, linux-kernel, cmetcalf, dhowells,
paulus, tglx, walken, linuxppc-dev, akpm
In-Reply-To: <4E21A526.8010904@gmail.com>
On Sat, 2011-07-16 at 10:50 -0400, Shan Hai wrote:
>
> That's right the gup(.write=1) want to do the same thing as the
> above code snippet, but it failed for the following reason:
> because the get_user_pages() would not dirty pte for the reason
> the follow_page() returns not NULL on *present* and *writable*
> page, the page which holds the lock is present because its a shared
> page,
> writable because demand paging set that up so for shared
> writable page, so the handle_mm_fault() in the __get_user_page()
> could not be called.
>
> Why the above code could do the same task, because by calling
> handle_mm_fault() will set pte dirty by
> [do_annonymous_page(), memory.c]
> if (vma->vm_flags & VM_WRITE)
> entry = pte_mkwrite(pte_mkdirty(entry));
>
Right. gup won't set page_dirty, it expects the caller to do so (in case
it doesn't dirty all the gup'ed pages a suppose).
You could probably fix the problem here by setting dirty after gup in
the futex code if you know you're going to write. You must do that with
the PTE lock held though and -not- at interrupt time.
Note however that the exact same problem exist with normal "read"
accesses and page_young (_PAGE_ACCESSED on powerpc). The page will not
be accessible until that bit is set and it's set by SW.
As I wrote earlier, fixing that by making "atomic" page faults perform
the dirty/accessed tracking is not right, since such faults can happen
at interrupt time and the PTE lock cannot be taken at interrupt time.
IE. The implementation of those "SW" TLB archs heavily relies on the PTE
lock to serialize write access to the PTE and writing it outside of that
lock would do really bad things.
So there's a deeper problem here. The whole user access "in atomic"
concept is by itself a violation of some of the basic access rules of
user memory that have existed from day 1 of the kernel. That we allow it
for semi-harmless (and allowed to fail) things like snapshot of
backtraces for perf is one thing, but relying on it for the futex case
like that is not going to fly very well. I sincerely hope that this kind
of usage is not going to become a habit.
In the meantime, other than rewriting the futex code to not require
those in-atomic accesses (can't it just access the pages via the linear
mapping and/or kmap after the gup ?), all I see would be a way to force
dirty and young after gup, with appropriate locks, or a variant of gup
(via a flag ?) to require it to do so.
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
From: Shan Hai @ 2011-07-16 15:36 UTC (permalink / raw)
To: Peter Zijlstra
Cc: tony.luck, linux-kernel, cmetcalf, dhowells, paulus, tglx, walken,
linuxppc-dev, akpm
In-Reply-To: <1310743477.2586.358.camel@twins>
On 07/15/2011 11:24 AM, Peter Zijlstra wrote:
> On Fri, 2011-07-15 at 11:18 -0400, Shan Hai wrote:
>
>>>> + vma = find_vma(mm, address);
>>> Uhm, find_vma() needs mmap_sem, and futex_atomic_cmpxchg_inatomic() is
>>> most certainly not called with that lock held.
>>>
>> My fault, that will be fixed in the V2 patch.
> But you cannot, the function isn't called _atomic_ just for kicks, its
> used while holding spinlocks.
>
Yes we can do that, _atomic_ here is just atomic for cmpxchg
implemented by the combination of 'lwarx' and 'stwcx.' instructions
as done in the spin lock implementation, so even we hold the
mmap_sem that has no impact on the _atomic_ feature of the
futex_atomic_cmpxchg_inatomic().
>>>> + if (likely(vma)) {
>>>> + /* only fixup present page */
>>>> + if (follow_page(vma, address, FOLL_WRITE)) {
>>>> + handle_mm_fault(mm, vma, address, FAULT_FLAG_WRITE);
>>> So how can this toggle your sw dirty/young tracking, that's pretty much
>>> what gup(.write=1) does too!
>>>
>> because of the kernel read only permission of the page is transparent
>> to the follow_page(), the handle_mm_fault() is not to be activated
>> in the __get_use_pages(), so the gup(.write=1) could not help to fixup
>> the write permission.
> So why do you need the vma? Is it like I wrote earlier that you don't
> have spare PTE bits and need the vma flags to see if it may become
> writable?
>
Need vma for the reason to call handle_mm_fault(), that's all.
> gup(.write=1) not triggering this is a serious problem though, not
> something you can just paper over. I wouldn't be at all surprised to
> find there's more things broken because of that.
In my opinion another solution might be check the read only for kernel
feature of a page in the follow_page() on gup(.write=1) to avoid this
problem on all architectures.
Thanks
Shan Hai
^ permalink raw reply
* Re: [PATCH 0/1] Fixup write permission of TLB on powerpc e500 core
From: Shan Hai @ 2011-07-16 15:03 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: tony.luck, Peter Zijlstra, linux-kernel, cmetcalf, dhowells,
David Laight, paulus, tglx, walken, linuxppc-dev, akpm
In-Reply-To: <1310775649.25044.5.camel@pasglop>
On 07/15/2011 08:20 PM, Benjamin Herrenschmidt wrote:
> On Fri, 2011-07-15 at 11:32 -0400, Shan Hai wrote:
>> I agree with you, the problem could be triggered by accessing
>> any user space page which has kernel read only permission
>> in the page fault disabled context, the problem also affects
>> architectures which depend on SW dirty/young tracking as
>> stated by Benjamin in this thread.
>>
>> In the e500 case, the commit 6cfd8990e27d3a491c1c605d6cbc18a46ae51fef
>> removed the write permission fixup from TLB miss handlers and left it to
>> generic code, so it might be right time to fixup the write permission here
>> in the generic code.
> But we can't. The must not modify the PTE from an interrupt context and
> the "atomic" variants of user accesses can be called in such contexts.
>
> I think the problem is that we try to actually do things other than just
> "peek" at user memory (for backtraces etc...) but actually useful things
> in page fault disabled contexts. That's bad and various archs mm were
> designed with the assumption that this never happens.
>
Yes I understood, the *here* above means 'generic code' like futex code,
I am sorry for my ambiguous description.
> If the futex case is seldom here, we could probably find a way to work
> around in that specific case.
>
That's what my patch wants to do.
> However, I -still- don't understand why gup didn't fixup the write
> permission. gup doesn't set dirty ?
>
Yep, gup doesn't set dirty, because when the page fault
occurs on the kernel accessing a user page which is
read only to the kernel the following conditions hold,
- the page is present, because its a shared page
- the page is writable, because demand paging
sets up the pte for the current process to so
The follow_page() called in the __get_user_page()
returns non NULL to its caller on the above mentioned
present and writable page, so the gup(.write=1) has no
chance to set pte dirty by calling handle_mm_fault
Thanks
Shan Hai
s
> Cheers,
> Ben.
>
>
^ permalink raw reply
* Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
From: Shan Hai @ 2011-07-16 14:50 UTC (permalink / raw)
To: Peter Zijlstra
Cc: tony.luck, linux-kernel, cmetcalf, dhowells, paulus, tglx, walken,
linuxppc-dev, akpm
In-Reply-To: <1310725418.2586.309.camel@twins>
On 07/15/2011 06:23 AM, Peter Zijlstra wrote:
> On Fri, 2011-07-15 at 16:07 +0800, Shan Hai wrote:
>> The kernel has no write permission on COW pages by default on e500 core, this
>> will cause endless loop in futex_lock_pi, because futex code assumes the kernel
>> has write permission on COW pages. Grant write permission to the kernel on COW
>> pages when access violation page fault occurs.
>>
>> Signed-off-by: Shan Hai<haishan.bai@gmail.com>
>> ---
>> arch/powerpc/include/asm/futex.h | 11 ++++++++++-
>> arch/powerpc/include/asm/tlb.h | 25 +++++++++++++++++++++++++
>> 2 files changed, 35 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/futex.h
>> index c94e4a3..54c3e74 100644
>> --- a/arch/powerpc/include/asm/futex.h
>> +++ b/arch/powerpc/include/asm/futex.h
>> @@ -8,6 +8,7 @@
>> #include<asm/errno.h>
>> #include<asm/synch.h>
>> #include<asm/asm-compat.h>
>> +#include<asm/tlb.h>
>>
>> #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \
>> __asm__ __volatile ( \
>> @@ -113,7 +114,15 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
>> : "cc", "memory");
>>
>> *uval = prev;
>> - return ret;
>> +
>> + /* Futex assumes the kernel has permission to write to
>> + * COW pages, grant the kernel write permission on COW
>> + * pages because it has none by default.
>> + */
>> + if (ret == -EFAULT)
>> + __tlb_fixup_write_permission(current->mm, (unsigned long)uaddr);
>> +
>> + return ret;
>> }
>>
>> #endif /* __KERNEL__ */
>> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
>> index e2b428b..3863c6a 100644
>> --- a/arch/powerpc/include/asm/tlb.h
>> +++ b/arch/powerpc/include/asm/tlb.h
>> @@ -45,5 +45,30 @@ static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
>> #endif
>> }
>>
>> +/* Grant write permission to the kernel on a page. */
>> +static inline void __tlb_fixup_write_permission(struct mm_struct *mm,
>> + unsigned long address)
>> +{
>> +#if defined(CONFIG_FSL_BOOKE)
>> + /* Grant write permission to the kernel on a page by setting TLB.SW
>> + * bit, the bit setting operation is tricky here, calling
>> + * handle_mm_fault with FAULT_FLAG_WRITE causes _PAGE_DIRTY bit of
>> + * the pte to be set, the _PAGE_DIRTY of the pte is translated into
>> + * TLB.SW on Powerpc e500 core.
>> + */
>> +
>> + struct vm_area_struct *vma;
>> +
>> + vma = find_vma(mm, address);
> Uhm, find_vma() needs mmap_sem, and futex_atomic_cmpxchg_inatomic() is
> most certainly not called with that lock held.
>
>> + if (likely(vma)) {
>> + /* only fixup present page */
>> + if (follow_page(vma, address, FOLL_WRITE)) {
>> + handle_mm_fault(mm, vma, address, FAULT_FLAG_WRITE);
> So how can this toggle your sw dirty/young tracking, that's pretty much
> what gup(.write=1) does too!
>
That's right the gup(.write=1) want to do the same thing as the
above code snippet, but it failed for the following reason:
because the get_user_pages() would not dirty pte for the reason
the follow_page() returns not NULL on *present* and *writable*
page, the page which holds the lock is present because its a shared page,
writable because demand paging set that up so for shared
writable page, so the handle_mm_fault() in the __get_user_page()
could not be called.
Why the above code could do the same task, because by calling
handle_mm_fault() will set pte dirty by
[do_annonymous_page(), memory.c]
if (vma->vm_flags & VM_WRITE)
entry = pte_mkwrite(pte_mkdirty(entry));
Thanks
Shan Hai
>> + flush_tlb_page(vma, address);
>> + }
>> + }
>> +#endif
>> +}
>> +
>> #endif /* __KERNEL__ */
>> #endif /* __ASM_POWERPC_TLB_H */
^ permalink raw reply
* [PATCH] powerpc: Exporting boot_cpuid_phys
From: Andrew Gabbasov @ 2011-07-16 13:22 UTC (permalink / raw)
To: benh; +Cc: openmcapi-dev, linuxppc-dev
Kernel loadable module can use hard_smp_processor_id() if building with SMP
kernel. In order to make it work for UP kernels too, boot_cpuid_phys
symbol (which is what hard_smp_processor_id() macro resolves to
in non-SMP configuration) must be exported.
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
---
arch/powerpc/kernel/setup_32.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 1d2fbc9..3dffce6 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -49,6 +49,7 @@ extern void bootx_init(unsigned long r4, unsigned long phys);
int boot_cpuid = -1;
EXPORT_SYMBOL_GPL(boot_cpuid);
int boot_cpuid_phys;
+EXPORT_SYMBOL_GPL(boot_cpuid_phys);
int smp_hw_index[NR_CPUS];
^ permalink raw reply related
* RE: [v3 PATCH 1/1] booke/kprobe: make program exception to use one dedicated exception stack
From: Chen, Tiejun @ 2011-07-16 3:25 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev@ozlabs.org
In-Reply-To: <20110715134232.56373e03@schlenkerla.am.freescale.net>
> -----Original Message-----
> From: Scott Wood [mailto:scottwood@freescale.com]=20
> Sent: Saturday, July 16, 2011 2:43 AM
> To: Chen, Tiejun
> Cc: Kumar Gala; linuxppc-dev@ozlabs.org
> Subject: Re: [v3 PATCH 1/1] booke/kprobe: make program=20
> exception to use one dedicated exception stack
>=20
> On Fri, 15 Jul 2011 13:28:15 +0800
> tiejun.chen <tiejun.chen@windriver.com> wrote:
>=20
> > Kumar Gala wrote:
> > > I'm still very confused why we need a unique stack frame=20
> for kprobe/program exceptions on book-e devices.
> >=20
> > Its a bug at least for Book-E.
>=20
> But why only booke? There's nothing booke-specific about the=20
I don't mean this is reproduced only on booke, so I use 'at least' carefull=
y to notice we really see this problem on booke.
> stwu instruction.
Please note this root cause to this bug is not related to how to emulate st=
wu instruction. That should be issued from the overlap between an exception=
frame and the kprobed function stack frame on booke. Would you like to see=
that example I showed?
>=20
> > And if you'd like to check another topic thread,
> > "[BUG?]3.0-rc4+ftrace+kprobe: set kprobe at instruction=20
> 'stwu' lead to=20
> > system crash/freeze", you can know this story completely :)
> >=20
> > This bug should not be reproduced on PPC64 with the exception=20
> > prolog/endlog dedicated to PPC64.
>=20
> What if the function you're stepping through is creating a=20
> stack frame that is larger than the 288 bytes that get skipped?
>From that bug email looks PPC64 should be good from some feedbacks. So here=
I think guys who are very familiar with PPC64 can reply this rather than m=
e :)
>=20
> > But I have no enough time to check other PPC32 & !BOOKE so I'm not=20
> > sure if we should extend this modification.
>=20
> Someone should.
This is fine to me.
But currently we have to fix this based on booke firstly.
Tiejun=20
>=20
> -Scott
>=20
> =
^ permalink raw reply
* Re: [PATCH 0/1] Fixup write permission of TLB on powerpc e500 core
From: Benjamin Herrenschmidt @ 2011-07-16 0:20 UTC (permalink / raw)
To: Shan Hai
Cc: tony.luck, Peter Zijlstra, linux-kernel, cmetcalf, dhowells,
David Laight, paulus, tglx, walken, linuxppc-dev, akpm
In-Reply-To: <4E205D96.7010109@gmail.com>
On Fri, 2011-07-15 at 11:32 -0400, Shan Hai wrote:
>
> I agree with you, the problem could be triggered by accessing
> any user space page which has kernel read only permission
> in the page fault disabled context, the problem also affects
> architectures which depend on SW dirty/young tracking as
> stated by Benjamin in this thread.
>
> In the e500 case, the commit 6cfd8990e27d3a491c1c605d6cbc18a46ae51fef
> removed the write permission fixup from TLB miss handlers and left it to
> generic code, so it might be right time to fixup the write permission here
> in the generic code.
But we can't. The must not modify the PTE from an interrupt context and
the "atomic" variants of user accesses can be called in such contexts.
I think the problem is that we try to actually do things other than just
"peek" at user memory (for backtraces etc...) but actually useful things
in page fault disabled contexts. That's bad and various archs mm were
designed with the assumption that this never happens.
If the futex case is seldom here, we could probably find a way to work
around in that specific case.
However, I -still- don't understand why gup didn't fixup the write
permission. gup doesn't set dirty ?
Cheers,
Ben.
^ permalink raw reply
* RE: [PATCH 0/1] Fixup write permission of TLB on powerpc e500 core
From: Benjamin Herrenschmidt @ 2011-07-15 23:47 UTC (permalink / raw)
To: David Laight
Cc: tony.luck, Peter Zijlstra, Shan Hai, linux-kernel, cmetcalf,
dhowells, paulus, tglx, walken, linuxppc-dev, akpm
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6D8ADFC@saturn3.aculab.com>
On Fri, 2011-07-15 at 11:32 +0100, David Laight wrote:
> > The fault causing futex_atomic_cmpxchg_inatomic() is
> > protected by pagefault_disable(), so the page fault handler has
> > no chance to toggle the SW dirty/young tracking.
>
> Perhaps that is the bug!
> Whatever pagefault_disable() does, it shouldn't disable the
> SW dirty/young tracking - which should only needs bits moving
> in the page table itself (and TLB update??) rather than any
> operations on the rest of the data areas.
>
> It looks to me as though this could happen any time a page
> is marked inaccessible by the dirty/young tracking.
> Not just as a result of COW.
Except that for many architectures, there's a hard wired assumption that
the state of the PTEs won't change at interrupt time.
If we allow the "atomic" user accesses, we'll break that rule (think
about perf backtraces for example), and so would have to at -least-
disable interrupts around all the PTE accessors, or use atomic ops,
which will slow things down all over the place.
Cheers,
Ben.
^ permalink raw reply
* Powermac11,2
From: kevin diggs @ 2011-07-15 22:29 UTC (permalink / raw)
To: linuxppc-dev
Hi,
Anyone know what it means when a dual core dual cpu (total 4 cores)
beeps three times on startup? The white power led also blinks 3 times
every few seconds.
The machine in question is a recent acquisition. I have got it to boot once.
Thanks!
kevin
^ permalink raw reply
* Re: [PATCH] powerpc/5200: add GPIO functions for simple interrupt GPIOs
From: Grant Likely @ 2011-07-15 20:08 UTC (permalink / raw)
To: Anatolij Gustschin; +Cc: linuxppc-dev
In-Reply-To: <20110709121409.4ee9249e@wker>
On Sat, Jul 09, 2011 at 12:14:09PM +0200, Anatolij Gustschin wrote:
> On Wed, 6 Jul 2011 11:52:45 -0600
> Grant Likely <grant.likely@secretlab.ca> wrote:
>
> > On Mon, May 23, 2011 at 11:25:30AM +0200, Anatolij Gustschin wrote:
> > > The mpc52xx_gpio driver currently supports 8 wakeup GPIOs and 32
> > > simple GPIOs. Extend it to also support GPIO function of 8 simple
> > > interrupt GPIOs controlled in the standard GPIO register module.
> > >
> > > Signed-off-by: Anatolij Gustschin <agust@denx.de>
> > > ---
> > > arch/powerpc/platforms/52xx/mpc52xx_gpio.c | 117 ++++++++++++++++++++++++++++
> >
> > I don't want to merge more open coded MMIO gpio driver code. This whole driver really needs to be converted to use GENERIC_GPIO.
>
> I'm not sure I understand what you mean. Do you mean
> the conversion to drop of_mm_* stuff?
No, I mean conversion to use the generic gpio register access
functions instead of creating new ones.
g.
^ permalink raw reply
* Re: [v3 PATCH 1/1] booke/kprobe: make program exception to use one dedicated exception stack
From: Scott Wood @ 2011-07-15 18:42 UTC (permalink / raw)
To: tiejun.chen; +Cc: linuxppc-dev
In-Reply-To: <4E1FCFEF.7070702@windriver.com>
On Fri, 15 Jul 2011 13:28:15 +0800
tiejun.chen <tiejun.chen@windriver.com> wrote:
> Kumar Gala wrote:
> > I'm still very confused why we need a unique stack frame for kprobe/program exceptions on book-e devices.
>
> Its a bug at least for Book-E.
But why only booke? There's nothing booke-specific about the stwu
instruction.
> And if you'd like to check another topic thread,
> "[BUG?]3.0-rc4+ftrace+kprobe: set kprobe at instruction 'stwu' lead to system
> crash/freeze", you can know this story completely :)
>
> This bug should not be reproduced on PPC64 with the exception prolog/endlog
> dedicated to PPC64.
What if the function you're stepping through is creating a stack frame that
is larger than the 288 bytes that get skipped?
> But I have no enough time to check other PPC32 & !BOOKE so
> I'm not sure if we should extend this modification.
Someone should.
-Scott
^ permalink raw reply
* RE: [PATCH 0/1] Fixup write permission of TLB on powerpc e500 core
From: Peter Zijlstra @ 2011-07-15 10:39 UTC (permalink / raw)
To: David Laight
Cc: tony.luck, Shan Hai, linux-kernel, cmetcalf, dhowells, paulus,
tglx, walken, linuxppc-dev, akpm
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6D8ADFC@saturn3.aculab.com>
On Fri, 2011-07-15 at 11:32 +0100, David Laight wrote:
> > The fault causing futex_atomic_cmpxchg_inatomic() is
> > protected by pagefault_disable(), so the page fault handler has
> > no chance to toggle the SW dirty/young tracking.
>=20
> Perhaps that is the bug!
> Whatever pagefault_disable() does, it shouldn't disable the
> SW dirty/young tracking - which should only needs bits moving
> in the page table itself (and TLB update??) rather than any
> operations on the rest of the data areas.
>=20
> It looks to me as though this could happen any time a page
> is marked inaccessible by the dirty/young tracking.
> Not just as a result of COW.
I've thought of that as well, but couldn't find the actual code in the
ppc fault bits. It could be it relies on vma information to know what it
should allow due to lack of bits in the pte.
If it requires the vma, it requires mmap_sem and it thus has to avoid
the whole thing when pagefault_disabled().
^ permalink raw reply
* Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
From: Peter Zijlstra @ 2011-07-15 15:24 UTC (permalink / raw)
To: Shan Hai
Cc: tony.luck, linux-kernel, cmetcalf, dhowells, paulus, tglx, walken,
linuxppc-dev, akpm
In-Reply-To: <4E205A63.90401@gmail.com>
On Fri, 2011-07-15 at 11:18 -0400, Shan Hai wrote:
> >> + vma =3D find_vma(mm, address);
> > Uhm, find_vma() needs mmap_sem, and futex_atomic_cmpxchg_inatomic() is
> > most certainly not called with that lock held.
> >
>=20
> My fault, that will be fixed in the V2 patch.
But you cannot, the function isn't called _atomic_ just for kicks, its
used while holding spinlocks.
> >> + if (likely(vma)) {
> >> + /* only fixup present page */
> >> + if (follow_page(vma, address, FOLL_WRITE)) {
> >> + handle_mm_fault(mm, vma, address, FAULT_FLAG_WRITE);
> > So how can this toggle your sw dirty/young tracking, that's pretty much
> > what gup(.write=3D1) does too!
> >
>=20
> because of the kernel read only permission of the page is transparent
> to the follow_page(), the handle_mm_fault() is not to be activated
> in the __get_use_pages(), so the gup(.write=3D1) could not help to fixup
> the write permission.
So why do you need the vma? Is it like I wrote earlier that you don't
have spare PTE bits and need the vma flags to see if it may become
writable?
gup(.write=3D1) not triggering this is a serious problem though, not
something you can just paper over. I wouldn't be at all surprised to
find there's more things broken because of that.
^ permalink raw reply
* [v2 PATCH 1/1] powerpc/4xx: enable and fix pcie gen1/gen2 on the 460sx
From: Ayman Elkhashab @ 2011-07-15 16:40 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Tony Breeds, linuxppc-dev,
linux-kernel
Cc: Ayman El-Khashab
In-Reply-To: <1310603611-8960-2-git-send-email-ayman@elkhashab.com>
From: Ayman El-Khashab <ayman@elkhashab.com>
Adds a register to the config space for the 460sx. Changes the vc0
detect to a pll detect. maps configuration space to test the link
status. changes the setup to enable gen2 devices to operate at gen2
speeds. fixes mapping that was not correct for the 460sx. added
bit definitions for the OMRxMSKL registers
tested on the 460sx eiger and custom board
Signed-off-by: Ayman El-Khashab <ayman@elkhashab.com>
---
arch/powerpc/sysdev/ppc4xx_pci.c | 89 ++++++++++++++++++++++++++++++-------
arch/powerpc/sysdev/ppc4xx_pci.h | 12 +++++
2 files changed, 84 insertions(+), 17 deletions(-)
diff --git a/arch/powerpc/sysdev/ppc4xx_pci.c b/arch/powerpc/sysdev/ppc4xx_pci.c
index ad330fe..eeeeb12 100644
--- a/arch/powerpc/sysdev/ppc4xx_pci.c
+++ b/arch/powerpc/sysdev/ppc4xx_pci.c
@@ -1092,6 +1092,10 @@ static int __init ppc460sx_pciex_core_init(struct device_node *np)
mtdcri(SDR0, PESDR1_460SX_HSSSLEW, 0xFFFF0000);
mtdcri(SDR0, PESDR2_460SX_HSSSLEW, 0xFFFF0000);
+ /* Set HSS PRBS enabled */
+ mtdcri(SDR0, PESDR0_460SX_HSSCTLSET, 0x00001130);
+ mtdcri(SDR0, PESDR2_460SX_HSSCTLSET, 0x00001130);
+
udelay(100);
/* De-assert PLLRESET */
@@ -1132,9 +1136,6 @@ static int ppc460sx_pciex_init_port_hw(struct ppc4xx_pciex_port *port)
dcri_clrset(SDR0, port->sdr_base + PESDRn_UTLSET2,
0, 0x01000000);
- /*Gen-1*/
- mtdcri(SDR0, port->sdr_base + PESDRn_460SX_RCEI, 0x08000000);
-
dcri_clrset(SDR0, port->sdr_base + PESDRn_RCSSET,
(PESDRx_RCSSET_RSTGU | PESDRx_RCSSET_RSTDL),
PESDRx_RCSSET_RSTPYN);
@@ -1148,14 +1149,42 @@ static int ppc460sx_pciex_init_utl(struct ppc4xx_pciex_port *port)
{
/* Max 128 Bytes */
out_be32 (port->utl_base + PEUTL_PBBSZ, 0x00000000);
+ /* Assert VRB and TXE - per datasheet turn off addr validation */
+ out_be32(port->utl_base + PEUTL_PCTL, 0x80800000);
return 0;
}
+static void __init ppc460sx_pciex_check_link(struct ppc4xx_pciex_port *port)
+{
+ void __iomem *mbase;
+ int attempt = 50;
+
+ port->link = 0;
+
+ mbase = ioremap(port->cfg_space.start + 0x10000000, 0x1000);
+ if (mbase == NULL) {
+ printk(KERN_ERR "%s: Can't map internal config space !",
+ port->node->full_name);
+ goto done;
+ }
+
+ while (attempt && (0 == (in_le32(mbase + PECFG_460SX_DLLSTA)
+ & PECFG_460SX_DLLSTA_LINKUP))) {
+ attempt--;
+ mdelay(10);
+ }
+ if (attempt)
+ port->link = 1;
+done:
+ iounmap(mbase);
+
+}
+
static struct ppc4xx_pciex_hwops ppc460sx_pcie_hwops __initdata = {
.core_init = ppc460sx_pciex_core_init,
.port_init_hw = ppc460sx_pciex_init_port_hw,
.setup_utl = ppc460sx_pciex_init_utl,
- .check_link = ppc4xx_pciex_check_link_sdr,
+ .check_link = ppc460sx_pciex_check_link,
};
#endif /* CONFIG_44x */
@@ -1338,15 +1367,15 @@ static int __init ppc4xx_pciex_port_init(struct ppc4xx_pciex_port *port)
if (rc != 0)
return rc;
- if (ppc4xx_pciex_hwops->check_link)
- ppc4xx_pciex_hwops->check_link(port);
-
/*
* Initialize mapping: disable all regions and configure
* CFG and REG regions based on resources in the device tree
*/
ppc4xx_pciex_port_init_mapping(port);
+ if (ppc4xx_pciex_hwops->check_link)
+ ppc4xx_pciex_hwops->check_link(port);
+
/*
* Map UTL
*/
@@ -1360,13 +1389,23 @@ static int __init ppc4xx_pciex_port_init(struct ppc4xx_pciex_port *port)
ppc4xx_pciex_hwops->setup_utl(port);
/*
- * Check for VC0 active and assert RDY.
+ * Check for VC0 active or PLL Locked and assert RDY.
*/
if (port->sdr_base) {
- if (port->link &&
- ppc4xx_pciex_wait_on_sdr(port, PESDRn_RCSSTS,
- 1 << 16, 1 << 16, 5000)) {
- printk(KERN_INFO "PCIE%d: VC0 not active\n", port->index);
+ if (of_device_is_compatible(port->node,
+ "ibm,plb-pciex-460sx")){
+ if (port->link && ppc4xx_pciex_wait_on_sdr(port,
+ PESDRn_RCSSTS,
+ 1 << 12, 1 << 12, 5000)) {
+ printk(KERN_INFO "PCIE%d: PLL not locked\n",
+ port->index);
+ port->link = 0;
+ }
+ } else if (port->link &&
+ ppc4xx_pciex_wait_on_sdr(port, PESDRn_RCSSTS,
+ 1 << 16, 1 << 16, 5000)) {
+ printk(KERN_INFO "PCIE%d: VC0 not active\n",
+ port->index);
port->link = 0;
}
@@ -1573,8 +1612,15 @@ static int __init ppc4xx_setup_one_pciex_POM(struct ppc4xx_pciex_port *port,
dcr_write(port->dcrs, DCRO_PEGPL_OMR1BAH, lah);
dcr_write(port->dcrs, DCRO_PEGPL_OMR1BAL, lal);
dcr_write(port->dcrs, DCRO_PEGPL_OMR1MSKH, 0x7fffffff);
- /* Note that 3 here means enabled | single region */
- dcr_write(port->dcrs, DCRO_PEGPL_OMR1MSKL, sa | 3);
+ /*Enabled and single region */
+ if (of_device_is_compatible(port->node, "ibm,plb-pciex-460sx"))
+ dcr_write(port->dcrs, DCRO_PEGPL_OMR1MSKL,
+ sa | DCRO_PEGPL_460SX_OMR1MSKL_UOT
+ | DCRO_PEGPL_OMRxMSKL_VAL);
+ else
+ dcr_write(port->dcrs, DCRO_PEGPL_OMR1MSKL,
+ sa | DCRO_PEGPL_OMR1MSKL_UOT
+ | DCRO_PEGPL_OMRxMSKL_VAL);
break;
case 1:
out_le32(mbase + PECFG_POM1LAH, pciah);
@@ -1582,8 +1628,8 @@ static int __init ppc4xx_setup_one_pciex_POM(struct ppc4xx_pciex_port *port,
dcr_write(port->dcrs, DCRO_PEGPL_OMR2BAH, lah);
dcr_write(port->dcrs, DCRO_PEGPL_OMR2BAL, lal);
dcr_write(port->dcrs, DCRO_PEGPL_OMR2MSKH, 0x7fffffff);
- /* Note that 3 here means enabled | single region */
- dcr_write(port->dcrs, DCRO_PEGPL_OMR2MSKL, sa | 3);
+ dcr_write(port->dcrs, DCRO_PEGPL_OMR2MSKL,
+ sa | DCRO_PEGPL_OMRxMSKL_VAL);
break;
case 2:
out_le32(mbase + PECFG_POM2LAH, pciah);
@@ -1592,7 +1638,9 @@ static int __init ppc4xx_setup_one_pciex_POM(struct ppc4xx_pciex_port *port,
dcr_write(port->dcrs, DCRO_PEGPL_OMR3BAL, lal);
dcr_write(port->dcrs, DCRO_PEGPL_OMR3MSKH, 0x7fffffff);
/* Note that 3 here means enabled | IO space !!! */
- dcr_write(port->dcrs, DCRO_PEGPL_OMR3MSKL, sa | 3);
+ dcr_write(port->dcrs, DCRO_PEGPL_OMR3MSKL,
+ sa | DCRO_PEGPL_OMR3MSKL_IO
+ | DCRO_PEGPL_OMRxMSKL_VAL);
break;
}
@@ -1693,6 +1741,9 @@ static void __init ppc4xx_configure_pciex_PIMs(struct ppc4xx_pciex_port *port,
if (res->flags & IORESOURCE_PREFETCH)
sa |= 0x8;
+ if (of_device_is_compatible(port->node, "ibm,plb-pciex-460sx"))
+ sa |= PCI_BASE_ADDRESS_MEM_TYPE_64;
+
out_le32(mbase + PECFG_BAR0HMPA, RES_TO_U32_HIGH(sa));
out_le32(mbase + PECFG_BAR0LMPA, RES_TO_U32_LOW(sa));
@@ -1854,6 +1905,10 @@ static void __init ppc4xx_pciex_port_setup_hose(struct ppc4xx_pciex_port *port)
}
out_le16(mbase + 0x202, val);
+ /* Enable Bus master, memory, and io space */
+ if (of_device_is_compatible(port->node, "ibm,plb-pciex-460sx"))
+ out_le16(mbase + 0x204, 0x7);
+
if (!port->endpoint) {
/* Set Class Code to PCI-PCI bridge and Revision Id to 1 */
out_le32(mbase + 0x208, 0x06040001);
diff --git a/arch/powerpc/sysdev/ppc4xx_pci.h b/arch/powerpc/sysdev/ppc4xx_pci.h
index 56d9e5d..61b3659 100644
--- a/arch/powerpc/sysdev/ppc4xx_pci.h
+++ b/arch/powerpc/sysdev/ppc4xx_pci.h
@@ -464,6 +464,18 @@
#define PECFG_POM2LAL 0x390
#define PECFG_POM2LAH 0x394
+/* 460sx only */
+#define PECFG_460SX_DLLSTA 0x3f8
+
+/* 460sx Bit Mappings */
+#define PECFG_460SX_DLLSTA_LINKUP 0x00000010
+#define DCRO_PEGPL_460SX_OMR1MSKL_UOT 0x00000004
+
+/* PEGPL Bit Mappings */
+#define DCRO_PEGPL_OMRxMSKL_VAL 0x00000001
+#define DCRO_PEGPL_OMR1MSKL_UOT 0x00000002
+#define DCRO_PEGPL_OMR3MSKL_IO 0x00000002
+
/* SDR Bit Mappings */
#define PESDRx_RCSSET_HLDPLB 0x10000000
#define PESDRx_RCSSET_RSTGU 0x01000000
--
1.7.4.3
^ permalink raw reply related
* Re: [PATCH 0/1] Fixup write permission of TLB on powerpc e500 core
From: Shan Hai @ 2011-07-15 15:32 UTC (permalink / raw)
To: David Laight
Cc: tony.luck, Peter Zijlstra, linux-kernel, cmetcalf, dhowells,
paulus, tglx, walken, linuxppc-dev, akpm
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6D8ADFC@saturn3.aculab.com>
On 07/15/2011 06:32 AM, David Laight wrote:
>
>> The fault causing futex_atomic_cmpxchg_inatomic() is
>> protected by pagefault_disable(), so the page fault handler has
>> no chance to toggle the SW dirty/young tracking.
> Perhaps that is the bug!
> Whatever pagefault_disable() does, it shouldn't disable the
> SW dirty/young tracking - which should only needs bits moving
> in the page table itself (and TLB update??) rather than any
> operations on the rest of the data areas.
>
> It looks to me as though this could happen any time a page
> is marked inaccessible by the dirty/young tracking.
> Not just as a result of COW.
>
I agree with you, the problem could be triggered by accessing
any user space page which has kernel read only permission
in the page fault disabled context, the problem also affects
architectures which depend on SW dirty/young tracking as
stated by Benjamin in this thread.
In the e500 case, the commit 6cfd8990e27d3a491c1c605d6cbc18a46ae51fef
removed the write permission fixup from TLB miss handlers and left it to
generic code, so it might be right time to fixup the write permission here
in the generic code.
Thanks
Shan Hai
> David
>
>
^ permalink raw reply
* Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
From: Shan Hai @ 2011-07-15 15:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: tony.luck, linux-kernel, cmetcalf, dhowells, paulus, tglx, walken,
linuxppc-dev, akpm
In-Reply-To: <1310725418.2586.309.camel@twins>
On 07/15/2011 06:23 AM, Peter Zijlstra wrote:
> On Fri, 2011-07-15 at 16:07 +0800, Shan Hai wrote:
>> The kernel has no write permission on COW pages by default on e500 core, this
>> will cause endless loop in futex_lock_pi, because futex code assumes the kernel
>> has write permission on COW pages. Grant write permission to the kernel on COW
>> pages when access violation page fault occurs.
>>
>> Signed-off-by: Shan Hai<haishan.bai@gmail.com>
>> ---
>> arch/powerpc/include/asm/futex.h | 11 ++++++++++-
>> arch/powerpc/include/asm/tlb.h | 25 +++++++++++++++++++++++++
>> 2 files changed, 35 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/futex.h
>> index c94e4a3..54c3e74 100644
>> --- a/arch/powerpc/include/asm/futex.h
>> +++ b/arch/powerpc/include/asm/futex.h
>> @@ -8,6 +8,7 @@
>> #include<asm/errno.h>
>> #include<asm/synch.h>
>> #include<asm/asm-compat.h>
>> +#include<asm/tlb.h>
>>
>> #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \
>> __asm__ __volatile ( \
>> @@ -113,7 +114,15 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
>> : "cc", "memory");
>>
>> *uval = prev;
>> - return ret;
>> +
>> + /* Futex assumes the kernel has permission to write to
>> + * COW pages, grant the kernel write permission on COW
>> + * pages because it has none by default.
>> + */
>> + if (ret == -EFAULT)
>> + __tlb_fixup_write_permission(current->mm, (unsigned long)uaddr);
>> +
>> + return ret;
>> }
>>
>> #endif /* __KERNEL__ */
>> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
>> index e2b428b..3863c6a 100644
>> --- a/arch/powerpc/include/asm/tlb.h
>> +++ b/arch/powerpc/include/asm/tlb.h
>> @@ -45,5 +45,30 @@ static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
>> #endif
>> }
>>
>> +/* Grant write permission to the kernel on a page. */
>> +static inline void __tlb_fixup_write_permission(struct mm_struct *mm,
>> + unsigned long address)
>> +{
>> +#if defined(CONFIG_FSL_BOOKE)
>> + /* Grant write permission to the kernel on a page by setting TLB.SW
>> + * bit, the bit setting operation is tricky here, calling
>> + * handle_mm_fault with FAULT_FLAG_WRITE causes _PAGE_DIRTY bit of
>> + * the pte to be set, the _PAGE_DIRTY of the pte is translated into
>> + * TLB.SW on Powerpc e500 core.
>> + */
>> +
>> + struct vm_area_struct *vma;
>> +
>> + vma = find_vma(mm, address);
> Uhm, find_vma() needs mmap_sem, and futex_atomic_cmpxchg_inatomic() is
> most certainly not called with that lock held.
>
My fault, that will be fixed in the V2 patch.
>> + if (likely(vma)) {
>> + /* only fixup present page */
>> + if (follow_page(vma, address, FOLL_WRITE)) {
>> + handle_mm_fault(mm, vma, address, FAULT_FLAG_WRITE);
> So how can this toggle your sw dirty/young tracking, that's pretty much
> what gup(.write=1) does too!
>
because of the kernel read only permission of the page is transparent
to the follow_page(), the handle_mm_fault() is not to be activated
in the __get_use_pages(), so the gup(.write=1) could not help to fixup
the write permission.
Thanks
Shan Hai
>> + flush_tlb_page(vma, address);
>> + }
>> + }
>> +#endif
>> +}
>> +
>> #endif /* __KERNEL__ */
>> #endif /* __ASM_POWERPC_TLB_H */
^ permalink raw reply
* Re: [PATCH v2] dtc: Remove unused variable in flat_read_mem_reserve
From: David Gibson @ 2011-07-15 13:57 UTC (permalink / raw)
To: Josh Boyer; +Cc: jdl, linuxppc-dev
In-Reply-To: <20110628134711.GF10237@zod.rchland.ibm.com>
On Tue, Jun 28, 2011 at 09:47:11AM -0400, Josh Boyer wrote:
> The *p variable is declared and used to save inb->ptr, however p is
> later never used. This has been the case since commit 6c0f3676 and can
> lead to build failures with -Werror=unused-but-set-variable:
>
> flattree.c: In function 'flat_read_mem_reserve':
> flattree.c:700:14: error: variable 'p' set but not used [-Werror=unused-but-set-variable]
> cc1: all warnings being treated as errors
> make: *** [flattree.o] Error 1
>
> Remove the variable.
>
> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Jon, please apply.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
^ permalink raw reply
* Re: [PATCH] dtc: Remove unused check variable
From: David Gibson @ 2011-07-15 13:57 UTC (permalink / raw)
To: Josh Boyer; +Cc: Jon Loeliger, linuxppc-dev
In-Reply-To: <20110628124709.GC10237@zod.rchland.ibm.com>
On Tue, Jun 28, 2011 at 08:47:09AM -0400, Josh Boyer wrote:
> Commit 376ab6f2 removed the old style check functionality from DTC,
> however the check option and variable were not removed. This leads to
> build failures when -Werror=unused-but-set-variable is specified:
>
> dtc.c: In function 'main':
> dtc.c:102:17: error: variable 'check' set but not used [-Werror=unused-but-set-variable]
> cc1: all warnings being treated as errors
> make: *** [dtc.o] Error 1
> make: *** Waiting for unfinished jobs....
>
> Remove the check variable.
>
> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Jon, please apply.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
^ permalink raw reply
* RE: [PATCH 0/1] Fixup write permission of TLB on powerpc e500 core
From: David Laight @ 2011-07-15 10:32 UTC (permalink / raw)
To: Shan Hai, Peter Zijlstra
Cc: tony.luck, linux-kernel, cmetcalf, dhowells, paulus, tglx, walken,
linuxppc-dev, akpm
In-Reply-To: <4E20112C.6040307@gmail.com>
=20
> The fault causing futex_atomic_cmpxchg_inatomic() is
> protected by pagefault_disable(), so the page fault handler has
> no chance to toggle the SW dirty/young tracking.
Perhaps that is the bug!
Whatever pagefault_disable() does, it shouldn't disable the
SW dirty/young tracking - which should only needs bits moving
in the page table itself (and TLB update??) rather than any
operations on the rest of the data areas.
It looks to me as though this could happen any time a page
is marked inaccessible by the dirty/young tracking.
Not just as a result of COW.
David
^ permalink raw reply
* Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
From: Peter Zijlstra @ 2011-07-15 10:23 UTC (permalink / raw)
To: Shan Hai
Cc: tony.luck, linux-kernel, cmetcalf, dhowells, paulus, tglx, walken,
linuxppc-dev, akpm
In-Reply-To: <1310717238-13857-2-git-send-email-haishan.bai@gmail.com>
On Fri, 2011-07-15 at 16:07 +0800, Shan Hai wrote:
> The kernel has no write permission on COW pages by default on e500 core, =
this
> will cause endless loop in futex_lock_pi, because futex code assumes the =
kernel
> has write permission on COW pages. Grant write permission to the kernel o=
n COW
> pages when access violation page fault occurs.
>=20
> Signed-off-by: Shan Hai <haishan.bai@gmail.com>
> ---
> arch/powerpc/include/asm/futex.h | 11 ++++++++++-
> arch/powerpc/include/asm/tlb.h | 25 +++++++++++++++++++++++++
> 2 files changed, 35 insertions(+), 1 deletions(-)
>=20
> diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/=
futex.h
> index c94e4a3..54c3e74 100644
> --- a/arch/powerpc/include/asm/futex.h
> +++ b/arch/powerpc/include/asm/futex.h
> @@ -8,6 +8,7 @@
> #include <asm/errno.h>
> #include <asm/synch.h>
> #include <asm/asm-compat.h>
> +#include <asm/tlb.h>
> =20
> #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \
> __asm__ __volatile ( \
> @@ -113,7 +114,15 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user =
*uaddr,
> : "cc", "memory");
> =20
> *uval =3D prev;
> - return ret;
> +
> + /* Futex assumes the kernel has permission to write to
> + * COW pages, grant the kernel write permission on COW
> + * pages because it has none by default.
> + */
> + if (ret =3D=3D -EFAULT)
> + __tlb_fixup_write_permission(current->mm, (unsigned long)uaddr);
> +
> + return ret;
> }
> =20
> #endif /* __KERNEL__ */
> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tl=
b.h
> index e2b428b..3863c6a 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -45,5 +45,30 @@ static inline void __tlb_remove_tlb_entry(struct mmu_g=
ather *tlb, pte_t *ptep,
> #endif
> }
> =20
> +/* Grant write permission to the kernel on a page. */
> +static inline void __tlb_fixup_write_permission(struct mm_struct *mm,
> + unsigned long address)
> +{
> +#if defined(CONFIG_FSL_BOOKE)
> + /* Grant write permission to the kernel on a page by setting TLB.SW
> + * bit, the bit setting operation is tricky here, calling
> + * handle_mm_fault with FAULT_FLAG_WRITE causes _PAGE_DIRTY bit of
> + * the pte to be set, the _PAGE_DIRTY of the pte is translated into
> + * TLB.SW on Powerpc e500 core.
> + */
> +
> + struct vm_area_struct *vma;
> +
> + vma =3D find_vma(mm, address);
Uhm, find_vma() needs mmap_sem, and futex_atomic_cmpxchg_inatomic() is
most certainly not called with that lock held.
> + if (likely(vma)) {
> + /* only fixup present page */
> + if (follow_page(vma, address, FOLL_WRITE)) {
> + handle_mm_fault(mm, vma, address, FAULT_FLAG_WRITE);
So how can this toggle your sw dirty/young tracking, that's pretty much
what gup(.write=3D1) does too!
> + flush_tlb_page(vma, address);
> + }
> + }
> +#endif
> +}
> +
> #endif /* __KERNEL__ */
> #endif /* __ASM_POWERPC_TLB_H */
^ permalink raw reply
* Re: [PATCH 0/1] Fixup write permission of TLB on powerpc e500 core
From: Shan Hai @ 2011-07-15 10:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: tony.luck, linux-kernel, cmetcalf, dhowells, paulus, tglx, walken,
linuxppc-dev, akpm
In-Reply-To: <1310723441.2586.291.camel@twins>
On 07/15/2011 05:50 PM, Peter Zijlstra wrote:
> On Fri, 2011-07-15 at 17:08 +0800, Shan Hai wrote:
>> The whole scenario should be,
>> - the child process triggers a page fault at the first time access to
>> the lock, and it got its own writable page, but its *clean* for
>> the reason just for checking the status of the lock.
>> I am sorry for above "unbreakable COW".
>> - the futex_lock_pi() is invoked because of the lock contention,
>> and the futex_atomic_cmpxchg_inatomic() tries to get the lock,
>> it found out the lock is free so tries to write to the lock for
>> reservation, a page fault occurs, because the page is read only
>> for kernel(e500 specific), and returns -EFAULT to the caller
>> - the fault_in_user_writeable() tries to fix the fault,
>> but from the get_user_pages() view everything is ok, because
>> the COW was already broken, retry futex_lock_pi_atomic()
> but that's a bug right there, gup(.write=1) _should_ be a complete write
> fault, and as such toggle your sw dirty/young tracking.
>
The fault causing futex_atomic_cmpxchg_inatomic() is
protected by pagefault_disable(), so the page fault handler has
no chance to toggle the SW dirty/young tracking.
Thanks
Shan Hai
>> - futex_lock_pi_atomic() --> futex_atomic_cmpxchg_inatomic(),
>> another write protection page fault
>> - infinite loop
^ permalink raw reply
* Re: [PATCH 0/1] Fixup write permission of TLB on powerpc e500 core
From: Peter Zijlstra @ 2011-07-15 9:50 UTC (permalink / raw)
To: Shan Hai
Cc: tony.luck, linux-kernel, cmetcalf, dhowells, paulus, tglx, walken,
linuxppc-dev, akpm
In-Reply-To: <4E20037C.5070506@gmail.com>
On Fri, 2011-07-15 at 17:08 +0800, Shan Hai wrote:
> The whole scenario should be,
> - the child process triggers a page fault at the first time access to
> the lock, and it got its own writable page, but its *clean* for
> the reason just for checking the status of the lock.
> I am sorry for above "unbreakable COW".
> - the futex_lock_pi() is invoked because of the lock contention,
> and the futex_atomic_cmpxchg_inatomic() tries to get the lock,
> it found out the lock is free so tries to write to the lock for
> reservation, a page fault occurs, because the page is read only
> for kernel(e500 specific), and returns -EFAULT to the caller
> - the fault_in_user_writeable() tries to fix the fault,
> but from the get_user_pages() view everything is ok, because
> the COW was already broken, retry futex_lock_pi_atomic()
but that's a bug right there, gup(.write=3D1) _should_ be a complete write
fault, and as such toggle your sw dirty/young tracking.
> - futex_lock_pi_atomic() --> futex_atomic_cmpxchg_inatomic(),
> another write protection page fault
> - infinite loop
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox