* Re: [PATCH 3/3] mm: Use ptep/pmdp_set_numa for updating _PAGE_NUMA bit
From: Mel Gorman @ 2014-02-11 17:07 UTC
To: Aneesh Kumar K.V; +Cc: riel, linux-mm, paulus, linuxppc-dev
In-Reply-To: <1392114895-14997-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On Tue, Feb 11, 2014 at 04:04:55PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> Archs like ppc64 don't do a tlb flush in the set_pte/pmd functions. ppc64 also doesn't implement
> flush_tlb_range. ppc64 requires the tlb flushing to be batched within ptl locks. The reason
> to do that is to ensure that the hash page table is in sync with the linux page table.
> We track the hpte index in the linux pte, and if we clear them without flushing the hash and drop the
> ptl lock, another cpu can update the pte and we can end up with a double hash. We also want
> to keep set_pte_at simpler by not requiring it to do a hash flush, for performance reasons.
> Hence we cannot use it while updating the _PAGE_NUMA bit. Add new functions for marking a pte/pmd numa.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
--
Mel Gorman
SUSE Labs
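A rough user-space model of the constraint described in the patch: on a hash-MMU system the PTE update and the invalidation of the hash entry that shadows it have to happen as one step while the PTE lock is held, which a plain set_pte_at() does not do on ppc64. The flag values and the helpers pte_update_model() and ptep_set_numa_model() are hypothetical stand-ins, not the real ppc64 code or bit layout.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values only; NOT the real ppc64 PTE bit layout. */
#define PTE_PRESENT UINT64_C(0x1)
#define PTE_NUMA    UINT64_C(0x2)
#define PTE_HASHPTE UINT64_C(0x4) /* a hash page table entry shadows this PTE */

typedef struct { uint64_t val; } pte_t;

static int hash_flushes; /* counts simulated hash invalidations */

/* Hypothetical stand-in for invalidating the hash entry behind *ptep. */
static void flush_hash_entry(pte_t *ptep)
{
	(void)ptep;
	hash_flushes++;
}

/*
 * Clear/set bits and, if a hash entry existed, invalidate it in the same
 * step. The caller is assumed to hold the page table lock, so no other
 * CPU can install a second hash entry for this PTE in between.
 */
static uint64_t pte_update_model(pte_t *ptep, uint64_t clr, uint64_t set)
{
	uint64_t old = ptep->val;

	ptep->val = (old & ~(clr | PTE_HASHPTE)) | set;
	if (old & PTE_HASHPTE)
		flush_hash_entry(ptep);
	return old;
}

/* Hypothetical model of ptep_set_numa(): drop PRESENT, set NUMA. */
static void ptep_set_numa_model(pte_t *ptep)
{
	assert(ptep->val & PTE_PRESENT);
	(void)pte_update_model(ptep, PTE_PRESENT, PTE_NUMA);
}
```

Calling ptep_set_numa_model() on a PTE with the hash bit set clears PRESENT, sets NUMA and performs exactly one simulated flush; a PTE without a hash entry is updated with no flush at all.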
* Re: [PATCH V2] powerpc: thp: Fix crash on mremap
From: Greg KH @ 2014-02-11 17:31 UTC
To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, stable
In-Reply-To: <1391781117-19045-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On Fri, Feb 07, 2014 at 07:21:57PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> This patch fixes the crash below
>
> NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> ...
> Call Trace:
> [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
>
> On ppc64 we use the pgtable for storing the hpte slot information and
> store the address of the pgtable at a constant offset (PTRS_PER_PMD) from
> the pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> the pgtable again, so that we find the pgtable at the PTRS_PER_PMD offset
> from the new pmd.
>
> We also want to move the withdraw and deposit before the set_pmd so
> that, when a page fault finds the pmd trans huge, we can be sure that
> the pgtable can be located at the offset.
>
> variant of upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
> for 3.12 stable series
This doesn't look like a "variant", it looks totally different. Why
can't I just take the b3084f4db3aeb991c507ca774337c7e7893ed04f patch
(and follow-on fix) for 3.12?
I _REALLY_ dislike patches that are totally different from Linus's tree
in stable trees, it has caused nothing but problems in the past.
greg k-h
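To make the PTRS_PER_PMD-offset scheme from the patch description concrete, here is a small user-space model: each PMD page keeps the deposited pgtable pointer PTRS_PER_PMD slots after the corresponding entry, so moving a huge PMD must move that pointer too, and must do so before the new entry becomes visible. The struct, the helper names and the PTRS_PER_PMD value are illustrative assumptions only, loosely modelled on the kernel's pgtable_trans_huge_deposit()/pgtable_trans_huge_withdraw().

```c
#include <assert.h>
#include <stddef.h>

#define PTRS_PER_PMD 8 /* small illustrative value, not the ppc64 one */

/* Hypothetical PMD page: slots [0, PTRS_PER_PMD) are the PMD entries,
 * slot idx + PTRS_PER_PMD holds the pgtable deposited for entry idx. */
struct pmd_page {
	void *slot[2 * PTRS_PER_PMD];
};

static void deposit(struct pmd_page *p, size_t idx, void *pgtable)
{
	p->slot[idx + PTRS_PER_PMD] = pgtable;
}

static void *withdraw(struct pmd_page *p, size_t idx)
{
	void *pgtable = p->slot[idx + PTRS_PER_PMD];

	p->slot[idx + PTRS_PER_PMD] = NULL;
	return pgtable;
}

/* Move a huge PMD. The pgtable is moved *before* the new entry is
 * published, so a fault that sees the new entry can find its pgtable. */
static void move_huge_pmd_model(struct pmd_page *old, size_t old_idx,
				struct pmd_page *new, size_t new_idx)
{
	void *huge = old->slot[old_idx];

	assert(huge != NULL);
	assert(old_idx < PTRS_PER_PMD && new_idx < PTRS_PER_PMD);
	old->slot[old_idx] = NULL;                 /* clear the old entry */
	deposit(new, new_idx, withdraw(old, old_idx));
	new->slot[new_idx] = huge;                 /* plays the role of set_pmd_at() */
}
```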
* Re: [PATCH v2] powerpc ticket locks
From: Scott Wood @ 2014-02-11 18:30 UTC
To: Torsten Duwe
Cc: Tom Musta, Peter Zijlstra, Raghavendra KT, Raghavendra KT,
Linux Kernel Mailing List, Paul Mackerras, Anton Blanchard,
Paul E. McKenney, linuxppc-dev, Ingo Molnar
In-Reply-To: <20140211104030.GG2107@lst.de>
On Tue, 2014-02-11 at 11:40 +0100, Torsten Duwe wrote:
> On Tue, Feb 11, 2014 at 03:23:51PM +0530, Raghavendra KT wrote:
> > How important is it to have holder information for PPC? From my
> > previous experiments on x86, it was lock-waiter preemption, rather
> > than lock-holder preemption, that was problematic.
>
> It's something very special to IBM pSeries: the hypervisor can assign
> fractions of physical CPUs to guests. Sometimes a guest with 4 quarter
> CPUs will be faster than 1 monoprocessor. (correct me if I'm wrong).
>
> The directed yield resolves the silly situation when holder and waiter
> reside on the same physical CPU, as I understand it.
>
> x86 has nothing comparable.
How is this different from the very ordinary case of an SMP KVM guest
whose vcpus are not bound to host cpus, and thus you could have multiple
vcpus running on the same host cpu?
-Scott
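For readers following the holder/waiter discussion, a minimal C11 sketch of a ticket lock that also records its holder's CPU, which is the kind of information a directed yield needs. It is not the powerpc patch under review; the struct layout, the field names and the cpu argument are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical ticket lock that remembers which CPU holds it. */
struct ticket_lock {
	atomic_uint next;   /* next ticket to hand out */
	atomic_uint owner;  /* ticket currently being served */
	atomic_int holder;  /* CPU id of the current holder, -1 if free */
};

#define TICKET_LOCK_INIT { 0, 0, -1 }

static void ticket_lock(struct ticket_lock *l, int cpu)
{
	unsigned int me = atomic_fetch_add(&l->next, 1);

	assert(cpu >= 0);
	while (atomic_load(&l->owner) != me) {
		/* A paravirt variant would read l->holder here and confer
		 * its time slice to that vCPU instead of just spinning. */
	}
	atomic_store(&l->holder, cpu);
}

static void ticket_unlock(struct ticket_lock *l)
{
	atomic_store(&l->holder, -1);
	atomic_fetch_add(&l->owner, 1); /* serve the next ticket in FIFO order */
}
```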
* Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node
From: Christoph Lameter @ 2014-02-11 18:45 UTC
To: Joonsoo Kim
Cc: Han Pingtian, Nishanth Aravamudan, Matt Mackall, Pekka Enberg,
Linux Memory Management List, Paul Mackerras, Anton Blanchard,
David Rientjes, linuxppc-dev, Wanpeng Li
In-Reply-To: <20140210012918.GD12574@lge.com>
On Mon, 10 Feb 2014, Joonsoo Kim wrote:
> On Fri, Feb 07, 2014 at 12:51:07PM -0600, Christoph Lameter wrote:
> > Here is a draft of a patch to make this work with memoryless nodes.
> >
> > The first thing is that we modify node_match to also match if we hit an
> > empty node. In that case we simply take the current slab if it's there.
>
> Why not check whether we can get the page on the best node, such as the
> numa_mem_id() node?
It's expensive to do so.
> empty_node cannot be set on a memoryless node, since page allocation would
> succeed on a different node.
Ok, then we need to add a check for being on the right node there too.
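One way to read the node_match() change being discussed, sketched in user space: instead of treating every memoryless node as a match, compare the slab page's node with the node that a memoryless node's allocations actually fall back to, which is what the node_numa_mem() helper proposed in this RFC is meant to provide. The numa_mem_of[] table and node_match_sketch() are hypothetical; NUMA_NO_NODE is defined locally with the kernel's value of -1.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1) /* same value as the kernel constant */
#define MAX_NODES 4

/* Hypothetical fallback table: node 1 is memoryless and allocates from
 * node 0; nodes with memory map to themselves. */
static const int numa_mem_of[MAX_NODES] = { 0, 0, 2, 3 };

/*
 * Sketch: the cached slab page on page_nid is acceptable if no node was
 * requested, or if it sits on the node the request would really be
 * served from (the node itself when it has memory, else its fallback).
 */
static bool node_match_sketch(int page_nid, int node)
{
	if (node == NUMA_NO_NODE)
		return true;
	assert(node >= 0 && node < MAX_NODES);
	return page_nid == numa_mem_of[node];
}
```

For nodes with memory this reduces to the usual "page is on the requested node" test; only memoryless nodes change behaviour.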
* Re: [PATCH 3/3] mm: Use ptep/pmdp_set_numa for updating _PAGE_NUMA bit
From: Benjamin Herrenschmidt @ 2014-02-11 18:49 UTC
To: Mel Gorman
Cc: riel, linux-mm, paulus, Aneesh Kumar K.V, Andrew Morton,
linuxppc-dev
In-Reply-To: <20140211170724.GM6732@suse.de>
On Tue, 2014-02-11 at 17:07 +0000, Mel Gorman wrote:
> On Tue, Feb 11, 2014 at 04:04:55PM +0530, Aneesh Kumar K.V wrote:
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >
> > Archs like ppc64 don't do a tlb flush in the set_pte/pmd functions. ppc64 also doesn't implement
> > flush_tlb_range. ppc64 requires the tlb flushing to be batched within ptl locks. The reason
> > to do that is to ensure that the hash page table is in sync with the linux page table.
> > We track the hpte index in the linux pte, and if we clear them without flushing the hash and drop the
> > ptl lock, another cpu can update the pte and we can end up with a double hash. We also want
> > to keep set_pte_at simpler by not requiring it to do a hash flush, for performance reasons.
> > Hence we cannot use it while updating the _PAGE_NUMA bit. Add new functions for marking a pte/pmd numa.
> >
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> Acked-by: Mel Gorman <mgorman@suse.de>
>
How do you guys want me to proceed? Will you (or Andrew) send these to
Linus or should I do it myself?
Cheers,
Ben.
* Re: [PATCH V2] powerpc: thp: Fix crash on mremap
From: Benjamin Herrenschmidt @ 2014-02-11 18:52 UTC
To: Greg KH; +Cc: linuxppc-dev, paulus, Aneesh Kumar K.V, stable
In-Reply-To: <20140211173129.GB30336@kroah.com>
On Tue, 2014-02-11 at 09:31 -0800, Greg KH wrote:
> On Fri, Feb 07, 2014 at 07:21:57PM +0530, Aneesh Kumar K.V wrote:
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >
> > This patch fixes the crash below
> >
> > NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> > LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> > ...
> > Call Trace:
> > [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> > [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> > [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
> >
> > On ppc64 we use the pgtable for storing the hpte slot information and
> > store the address of the pgtable at a constant offset (PTRS_PER_PMD) from
> > the pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> > the pgtable again, so that we find the pgtable at the PTRS_PER_PMD offset
> > from the new pmd.
> >
> > We also want to move the withdraw and deposit before the set_pmd so
> > that, when a page fault finds the pmd trans huge, we can be sure that
> > the pgtable can be located at the offset.
> >
> > variant of upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
> > for 3.12 stable series
>
> This doesn't look like a "variant", it looks totally different. Why
> can't I just take the b3084f4db3aeb991c507ca774337c7e7893ed04f patch
> (and follow-on fix) for 3.12?
>
> I _REALLY_ dislike patches that are totally different from Linus's tree
> in stable trees, it has caused nothing but problems in the past.
I don't think it applies... (I tried on an internal tree) but the
affected function changed in 3.13 in various ways. Aneesh, please
provide a more detailed explanation and say whether we should backport those
other changes too or whether this is not necessary.
BTW, Aneesh, we need a 3.11.x one too.
Cheers,
Ben.
* Re: [PATCH v2] powerpc ticket locks
From: Benjamin Herrenschmidt @ 2014-02-11 19:34 UTC
To: Scott Wood
Cc: Tom Musta, Peter Zijlstra, Raghavendra KT, Raghavendra KT,
Linux Kernel Mailing List, Torsten Duwe, Anton Blanchard,
Paul Mackerras, Paul E. McKenney, linuxppc-dev, Ingo Molnar
In-Reply-To: <1392143455.6733.386.camel@snotra.buserror.net>
On Tue, 2014-02-11 at 12:30 -0600, Scott Wood wrote:
> > It's something very special to IBM pSeries: the hypervisor can assign
> > fractions of physical CPUs to guests. Sometimes a guest with 4 quarter
> > CPUs will be faster than 1 monoprocessor. (correct me if I'm wrong).
> >
> > The directed yield resolves the silly situation when holder and waiter
> > reside on the same physical CPU, as I understand it.
> >
> > x86 has nothing comparable.
>
> How is this different from the very ordinary case of an SMP KVM guest
> whose vcpus are not bound to host cpus, and thus you could have multiple
> vcpus running on the same host cpu?
It's not really ... though I can see drawbacks with the scheme as well,
and I think in KVM we should be careful to only confer if the owner
vcpu last scheduled on the same physical cpu where the waiter is;
otherwise, there's too much chance of us bouncing things around the machine
for minor contention cases.
Paul, what's your policy today?
Cheers,
Ben.
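Ben's suggested policy can be written as a small predicate. This is only a sketch under assumed inputs: struct vcpu_info, its fields and should_confer() are hypothetical, and a real hypervisor interface would report the holder's run state and last physical CPU in its own way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the lock holder's vCPU, as seen by a waiter. */
struct vcpu_info {
	bool running;   /* currently scheduled on a physical CPU */
	int last_pcpu;  /* physical CPU it last ran on */
};

/*
 * Confer (directed yield) only if the holder is preempted AND last ran
 * on the waiter's own physical CPU; otherwise keep spinning instead of
 * bouncing work around the machine for minor contention.
 */
static bool should_confer(const struct vcpu_info *holder, int waiter_pcpu)
{
	assert(holder != NULL);
	if (holder->running)
		return false; /* holder is making progress; yielding won't help */
	return holder->last_pcpu == waiter_pcpu;
}
```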
* Linux-3.14-rc2: Order of serial node compatibles in DTS files.
From: Stephen N Chivers @ 2014-02-11 20:57 UTC
To: linuxppc-dev; +Cc: Chris Proctor
In-Reply-To: <20140211072606.GA26514@visitor2.iram.es>
I have been trial booting a 3.14-rc2 kernel for an 85xx platform (dtbImage).
After mounting the root filesystem there are no messages from the init
scripts and the serial console is not available for login.
In the kernel log messages there is:
of_serial f1004500.serial: Unknown serial port found, ignored.
The serial nodes in the board's dts file are specified as:
serial0: serial@4500 {
	cell-index = <0>;
	device_type = "serial";
	compatible = "fsl,ns16550", "ns16550";
	reg = <0x4500 0x100>;
	clock-frequency = <0>;
	interrupts = <0x2a 0x2>;
	interrupt-parent = <&mpic>;
};
Reversing the order of the compatible:
compatible = "ns16550", "fsl,ns16550";
restores the serial console.
Linux-3.13 does not have this behaviour.
There are 49 dts files in Linux-3.14-rc2 that have the fsl,ns16550
compatible first.
Stephen Chivers,
CSC Australia Pty. Ltd.
* Re: [PATCH V4 2/3] tick/cpuidle: Initialize hrtimer mode of broadcast
From: Daniel Lezcano @ 2014-02-11 22:01 UTC
To: Thomas Gleixner
Cc: deepthi, linux-pm, peterz, rafael.j.wysocki, linux-kernel, paulus,
srivatsa.bhat, fweisbec, Preeti U Murthy, paulmck, linuxppc-dev,
mingo
In-Reply-To: <alpine.DEB.2.02.1402111654190.21991@ionos.tec.linutronix.de>
On 02/11/2014 04:58 PM, Thomas Gleixner wrote:
> On Tue, 11 Feb 2014, Daniel Lezcano wrote:
>> On 02/07/2014 09:06 AM, Preeti U Murthy wrote:
>> Setting the smp affinity on the earliest timer should be handled automatically
>> with the CLOCK_EVT_FEAT_DYNIRQ flag. Did you look at using this flag ?
>
> How should this flag help? Not at all, because the hrtimer based
> broadcast device cannot assign affinities.
>
>> Another comment is the overall approach. We enter the cpuidle idle framework
>> with a specific state to go to and it is the tick framework telling us we
>> mustn't go to this state. IMO the logic is wrong, the decision to not enter
>> this state should be moved somewhere else.
>>
>> Why don't you create a cpuidle driver with the shallow idle states assigned to
>> a cpu (let's say cpu0) and another one with all the deeper idle states for the
>> rest of the cpus ? Using the multiple cpuidle driver support makes it
>> possible. The timer won't be moving around and a cpu will be dedicated to act
>> as the broadcast timer.
>>
>> Wouldn't make sense and be less intrusive than the patchset you proposed ?
>
> How do you arm the broadcast timer on CPU0 from CPU1? You can't!
>
> You cannot access the cpu local timer on a different cpu. So you would
> have to send an IPI over to CPU0 so that it can reevaluate and
> schedule the broadcast. That's even more backwards than telling the
> cpuidle code that the CPU is not in a state to go deep.
Indeed :)
Thanks for the clarification.
-- Daniel
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
* Re: [PATCH V4 2/3] tick/cpuidle: Initialize hrtimer mode of broadcast
From: Daniel Lezcano @ 2014-02-11 22:05 UTC
To: Preeti U Murthy
Cc: deepthi, linux-pm, peterz, rafael.j.wysocki, linux-kernel, paulus,
srivatsa.bhat, fweisbec, tglx, paulmck, linuxppc-dev, mingo
In-Reply-To: <52FA4B2E.4070709@linux.vnet.ibm.com>
On 02/11/2014 05:09 PM, Preeti U Murthy wrote:
> Hi Daniel,
>
> Thank you very much for the review.
>
> On 02/11/2014 03:46 PM, Daniel Lezcano wrote:
>> On 02/07/2014 09:06 AM, Preeti U Murthy wrote:
>>> From: Thomas Gleixner <tglx@linutronix.de>
>>>
>>> On some architectures, in certain CPU deep idle states the local
>>> timers stop. An external clock device is used to wake up these CPUs.
>>> The kernel support for the wakeup of these CPUs is provided by the
>>> tick broadcast framework by using the external clock device as the
>>> wakeup source.
>>>
>>> However, not all implementations of architectures provide such an
>>> external clock device. This patch includes support in the broadcast
>>> framework to handle the wakeup of the CPUs in deep idle states on such
>>> systems by queuing a hrtimer on one of the CPUs, which is meant to
>>> handle the wakeup of CPUs in deep idle states.
>>>
>>> This patchset introduces a pseudo clock device which can be registered
>>> by the archs as tick_broadcast_device in the absence of a real external
>>> clock device. Once registered, the broadcast framework will work as is
>>> for these architectures as long as the archs take care of the
>>> BROADCAST_ENTER notification failing for one of the CPUs. This CPU is
>>> made the stand by CPU to handle wakeup of the CPUs in deep idle and it
>>> *must not enter deep idle states*.
>>>
>>> The CPU with the earliest wakeup is chosen to be this CPU. This way
>>> the stand by CPU dynamically moves around, and so does the hrtimer,
>>> which is queued to trigger at the next earliest wakeup time. This is
>>> consistent with the case where an external clock device is present:
>>> the smp affinity of that clock device is set to the CPU with the
>>> earliest wakeup.
>>
>> Hi Preeti,
>>
>> jumping a bit late in the thread...
>>
>> Setting the smp affinity on the earliest timer should be handled
>> automatically with the CLOCK_EVT_FEAT_DYNIRQ flag. Did you look at using
>> this flag ?
>
> This patch is not setting the smp affinity of the pseudo clock device at
> all. It's not required, because the device does not exist.
>
> I mentioned this point because we assign a CPU with the earliest wakeup
> as standby. I compared this logic to the one used by the tick broadcast
> framework for archs which have an external clock device to set the smp
> affinity of the device.
>
> If these archs do not have the flag CLOCK_EVT_FEAT_DYNIRQ set for the
> external clock device, the tick broadcast framework sets the smp
> affinity of this device to the CPU with the earliest wakeup. We are
> using the same logic in this patchset as well to assign the stand by CPU.
>
>>
>> Another comment is the overall approach. We enter the cpuidle idle
>> framework with a specific state to go to and it is the tick framework
>> telling us we mustn't go to this state. IMO the logic is wrong, the
>> decision to not enter this state should be moved somewhere else.
>
> It's not the tick framework which tells us that we cannot enter deep idle
> state, its the *tick broadcast* framework specifically. The tick
> broadcast framework was introduced with the primary intention of
> handling wakeup of CPUs in deep idle states when the local timers become
> non-functional. Therefore there is a co-operation between this tick
> broadcast framework and cpuidle. This has always been the case.
>
> That is why just before cpus go into deep idle, they call into the
> broadcast framework. Till now it was assumed that the tick broadcast
> framework would find no problems with the cpus entering deep idle.
> Therefore cpuidle would simply assume that all is well and go ahead and
> enter deep idle state.
> But today there is a scenario when there could be problems if all cpus
> enter deep idle states and the tick broadcast framework now notifies the
> cpuidle framework to hold back one cpu. This is just a simple extension
> of the current interaction between cpuidle and tick broadcast framework.
>
>>
>> Why don't you create a cpuidle driver with the shallow idle states
>> assigned to a cpu (let's say cpu0) and another one with all the deeper
>> idle states for the rest of the cpus ? Using the multiple cpuidle driver
>> support makes it possible. The timer won't be moving around and a cpu
>> will be dedicated to act as the broadcast timer.
>>
>
> Having a dedicated stand by cpu for broadcasting has some issues which
> were pointed to when I posted the initial versions of this patchset.
> https://lkml.org/lkml/2013/7/27/14
>
> 1. This could create a power/thermal imbalance on the chip, since the
> standby cpu alone could never enter a deep idle state.
>
> 2. If it is cpu0 it is fine, else with the logic that you suggest,
> hot-plugging out the dedicated stand by cpu would mean moving the work
> of broadcasting to another cpu and modifying the cpuidle state table for
> it. Even with cpu0, if support to hotplug it out is enabled (maybe it is
> already), we will face the same issue and this gets very messy.
>
>> Wouldn't make sense and be less intrusive than the patchset you proposed ?
>
> Actually this patchset brings in a solution that is as non-intrusive as
> possible. It makes the problem nearly invisible except for a failed
> return from a call into the broadcast framework. It simply asks the
> archs which do not have an external clock device to register a pseudo
> device with the broadcast framework and from then on everything just
> falls in place to enable deep idle states for such archs.
Hi Preeti,
thanks for the detailed explanation. I understand better now why you did
it this way. Furthermore, as Thomas pointed out to me, what I missed is that
one cpu can't set up a local timer for another cpu, so what I was talking
about does not make sense.
-- Daniel
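The standby-CPU rule from the patch description, modelled in a few lines: every CPU may enter deep idle except the one with the earliest pending wakeup, which keeps the broadcast hrtimer queued. The array, the helper names and the use of -EBUSY as the failure code are assumptions for illustration, not the tick broadcast code itself; the model also ignores that only CPUs actually in broadcast mode take part.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4

/* Hypothetical next wakeup time (ns) of each CPU. */
static uint64_t next_event[NR_CPUS_MODEL];

/* The CPU whose wakeup is due first; ties go to the lowest CPU number. */
static int earliest_wakeup_cpu(void)
{
	int best = 0;

	for (int cpu = 1; cpu < NR_CPUS_MODEL; cpu++)
		if (next_event[cpu] < next_event[best])
			best = cpu;
	return best;
}

/*
 * Model of the BROADCAST_ENTER step with a pseudo (hrtimer based)
 * broadcast device: the earliest-wakeup CPU becomes the standby CPU,
 * is refused (-EBUSY, an assumed error code) and must pick a shallow
 * idle state; every other CPU may go deep.
 */
static int broadcast_enter_model(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return cpu == earliest_wakeup_cpu() ? -EBUSY : 0;
}
```

Because the choice is recomputed from the current wakeup times, the standby role (and the hrtimer) moves as soon as another CPU has an earlier event.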
* Re: Linux-3.14-rc2: Order of serial node compatibles in DTS files.
From: Kumar Gala @ 2014-02-11 22:33 UTC
To: Stephen N Chivers
Cc: Chris Proctor, sebastian.hesselbarth, linuxppc-dev, devicetree
In-Reply-To: <OF6F6A0029.3B20EF5B-ONCA257C7C.0071BFDA-CA257C7C.00732AD3@csc.com>
On Feb 11, 2014, at 2:57 PM, Stephen N Chivers <schivers@csc.com.au> wrote:
> I have been trial booting a 3.14-rc2 kernel for a 85xx platform
> (dtbImage).
>
> After mounting the root filesystem there are no messages from the init
> scripts
> and the serial console is not available for login.
>
> In the kernel log messages there is:
>
> of_serial f1004500.serial: Unknown serial port found, ignored.
>
> The serial nodes in boards dts file are specified as:
>
> serial0: serial@4500 {
> cell-index = <0>;
> device_type = "serial";
> compatible = "fsl,ns16550", "ns16550";
> reg = <0x4500 0x100>;
> clock-frequency = <0>;
> interrupts = <0x2a 0x2>;
> interrupt-parent = <&mpic>;
> };
>
> Reversing the order of the compatible:
>
> compatible = "ns16550", "fsl,ns16550";
>
> restores the serial console.
>
> Linux-3.13 does not have this behaviour.
>
> There are 49 dts files in Linux-3.14-rc2 that have the fsl,ns16550
> compatible first.
Hmm,
Wondering if this caused the issue:
commit 105353145eafb3ea919f5cdeb652a9d8f270228e
Author: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Date: Tue Dec 3 14:52:00 2013 +0100
OF: base: match each node compatible against all given matches first
- k
* Re: Linux-3.14-rc2: Order of serial node compatibles in DTS files.
From: Sebastian Hesselbarth @ 2014-02-11 22:51 UTC
To: Kumar Gala, Stephen N Chivers
Cc: Chris Proctor, linuxppc-dev, Arnd Bergmann, devicetree
In-Reply-To: <63AEBD99-AA87-4FD7-BBDA-0CE419959F14@kernel.crashing.org>
On 02/11/2014 11:33 PM, Kumar Gala wrote:
>
> On Feb 11, 2014, at 2:57 PM, Stephen N Chivers <schivers@csc.com.au> wrote:
>
>> I have been trial booting a 3.14-rc2 kernel for a 85xx platform
>> (dtbImage).
>>
>> After mounting the root filesystem there are no messages from the init
>> scripts
>> and the serial console is not available for login.
>>
>> In the kernel log messages there is:
>>
>> of_serial f1004500.serial: Unknown serial port found, ignored.
>>
>> The serial nodes in boards dts file are specified as:
>>
>> serial0: serial@4500 {
>> cell-index = <0>;
>> device_type = "serial";
>> compatible = "fsl,ns16550", "ns16550";
>> reg = <0x4500 0x100>;
>> clock-frequency = <0>;
>> interrupts = <0x2a 0x2>;
>> interrupt-parent = <&mpic>;
>> };
>>
>> Reversing the order of the compatible:
>>
>> compatible = "ns16550", "fsl,ns16550";
>>
>> restores the serial console.
>>
>> Linux-3.13 does not have this behaviour.
>>
>> There are 49 dts files in Linux-3.14-rc2 that have the fsl,ns16550
>> compatible first.
>
> Hmm,
>
> Wondering if this caused the issue:
>
> commit 105353145eafb3ea919f5cdeb652a9d8f270228e
> Author: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> Date: Tue Dec 3 14:52:00 2013 +0100
>
> OF: base: match each node compatible against all given matches first
[adding Arnd on Cc]
Could be. I checked tty/serial/of_serial.c and it does not provide a
compatible for "fsl,ns16550". Does reverting the patch fix the issue
observed?
I don't think the missing compatible is causing it, but of_serial
provides a DT match for .type = "serial" just to fail later on
with the error seen above.
The commit in question reorders of_match_device in a way that match
table order is not relevant anymore. This can cause it to match
.type = "serial" first here.
Rather than touching the commit, I suggest removing the problematic
.type = "serial" entry from the match table. It is of no use anyway.
Sebastian
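Sebastian's explanation can be reproduced with a simplified matcher that follows the order introduced by the commit: the outer loop walks the node's compatible strings and the inner loop walks the whole match table, where an entry with no .compatible is checked on .type alone. struct match_entry, the two-entry table and exact strcmp() in place of of_compat_cmp() are simplifications assumed for this sketch; .name matching is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct of_device_id ("" = field not checked). */
struct match_entry {
	const char *type;
	const char *compatible;
	const char *label;       /* NULL terminates the table */
};

/* Two entries mirroring of_serial: one compatible, one type-only. */
static const struct match_entry serial_tbl[] = {
	{ "",       "ns16550", "PORT_16550" },
	{ "serial", "",        "PORT_UNKNOWN" },
	{ NULL, NULL, NULL },
};

/* Order after the commit: per node compatible, scan the whole table. */
static const struct match_entry *
match_per_compatible(const struct match_entry *tbl, const char *node_type,
		     const char *const *compat, size_t ncompat)
{
	for (size_t i = 0; i < ncompat; i++) {
		for (const struct match_entry *m = tbl; m->label; m++) {
			int match = 1;

			if (m->type[0])
				match &= strcmp(m->type, node_type) == 0;
			if (m->compatible[0])
				match &= strcmp(m->compatible, compat[i]) == 0;
			if (match)
				return m; /* type-only entries match on any pass */
		}
	}
	return NULL;
}
```

With "fsl,ns16550" listed first, the .type = "serial" entry wins before "ns16550" is ever tried, which corresponds to the PORT_UNKNOWN failure in the log; with the order reversed, the ns16550 entry is found first.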
* Re: Linux-3.14-rc2: Order of serial node compatibles in DTS files.
From: Stephen N Chivers @ 2014-02-11 23:38 UTC
To: Sebastian Hesselbarth
Cc: Chris Proctor, Arnd Bergmann, devicetree, Stephen N Chivers,
linuxppc-dev
In-Reply-To: <52FAA97F.4060600@gmail.com>
Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> wrote on
02/12/2014 09:51:43 AM:
> From: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> To: Kumar Gala <galak@kernel.crashing.org>, Stephen N Chivers
> <schivers@csc.com.au>
> Cc: linuxppc-dev@lists.ozlabs.org, Chris Proctor
> <cproctor@csc.com.au>, devicetree <devicetree@vger.kernel.org>, Arnd
> Bergmann <arnd@arndb.de>
> Date: 02/12/2014 09:51 AM
> Subject: Re: Linux-3.14-rc2: Order of serial node compatibles in DTS files.
>
> On 02/11/2014 11:33 PM, Kumar Gala wrote:
> >
> > On Feb 11, 2014, at 2:57 PM, Stephen N Chivers <schivers@csc.com.au> wrote:
> >
> >> I have been trial booting a 3.14-rc2 kernel for a 85xx platform
> >> (dtbImage).
> >>
> >> After mounting the root filesystem there are no messages from the init
> >> scripts
> >> and the serial console is not available for login.
> >>
> >> In the kernel log messages there is:
> >>
> >> of_serial f1004500.serial: Unknown serial port found, ignored.
> >>
> >> The serial nodes in boards dts file are specified as:
> >>
> >> serial0: serial@4500 {
> >> cell-index = <0>;
> >> device_type = "serial";
> >> compatible = "fsl,ns16550", "ns16550";
> >> reg = <0x4500 0x100>;
> >> clock-frequency = <0>;
> >> interrupts = <0x2a 0x2>;
> >> interrupt-parent = <&mpic>;
> >> };
> >>
> >> Reversing the order of the compatible:
> >>
> >> compatible = "ns16550", "fsl,ns16550";
> >>
> >> restores the serial console.
> >>
> >> Linux-3.13 does not have this behaviour.
> >>
> >> There are 49 dts files in Linux-3.14-rc2 that have the fsl,ns16550
> >> compatible first.
> >
> > Hmm,
> >
> > Wondering if this caused the issue:
> >
> > commit 105353145eafb3ea919f5cdeb652a9d8f270228e
> > Author: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> > Date: Tue Dec 3 14:52:00 2013 +0100
> >
> > OF: base: match each node compatible against all given matches first
>
> [adding Arnd on Cc]
>
> Could be. I checked tty/serial/of_serial.c and it does not provide a
> compatible for "fsl,ns16550". Does reverting the patch fix the issue
> observed?
>
> I don't think the missing compatible is causing it, but of_serial
> provides a DT match for .type = "serial" just to fail later on
> with the error seen above.
>
> The commit in question reorders of_match_device in a way that match
> table order is not relevant anymore. This can cause it to match
> .type = "serial" first here.
>
> Rather than touching the commit, I suggest to remove the problematic
> .type = "serial" from the match table. It is of no use anyway.
Deleting the "serial" line from the match table fixes the problem.
I tested it for both orderings of compatible.
>
> Sebastian
Thanks,
Stephen Chivers,
CSC Australia Pty. Ltd.
* Re: Linux-3.14-rc2: Order of serial node compatibles in DTS files.
From: Scott Wood @ 2014-02-11 23:41 UTC
To: Sebastian Hesselbarth
Cc: Chris Proctor, Arnd Bergmann, devicetree, Stephen N Chivers,
linuxppc-dev
In-Reply-To: <52FAA97F.4060600@gmail.com>
On Tue, 2014-02-11 at 23:51 +0100, Sebastian Hesselbarth wrote:
> On 02/11/2014 11:33 PM, Kumar Gala wrote:
> > Hmm,
> >
> > Wondering if this caused the issue:
> >
> > commit 105353145eafb3ea919f5cdeb652a9d8f270228e
> > Author: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> > Date: Tue Dec 3 14:52:00 2013 +0100
> >
> > OF: base: match each node compatible against all given matches first
>
> [adding Arnd on Cc]
>
> Could be. I checked tty/serial/of_serial.c and it does not provide a
> compatible for "fsl,ns16550". Does reverting the patch fix the issue
> observed?
>
> I don't think the missing compatible is causing it, but of_serial
> provides a DT match for .type = "serial" just to fail later on
> with the error seen above.
>
> The commit in question reorders of_match_device in a way that match
> table order is not relevant anymore. This can cause it to match
> .type = "serial" first here.
>
> Rather than touching the commit, I suggest to remove the problematic
> .type = "serial" from the match table. It is of no use anyway.
Regardless of whether .type = "serial" gets removed, it seems wrong for
of_match_node() to accept a .type-only match (or .name, or anything else
that doesn't involve .compatible) before it accepts a compatible match
other than the first in the compatible property.
-Scott
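Scott's rule, sketched as two passes over the same kind of simplified table: the first pass considers only entries that carry a .compatible, in the order of the node's compatible strings, and the second pass falls back to .type-only entries. Skipping compatible-less entries outright in the first pass is a design choice of this sketch, so that an entry like { .type = "serial" } can only be chosen after every compatible string has failed. The struct, table and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct match_entry {
	const char *type;        /* "" = not checked */
	const char *compatible;  /* "" = not checked */
	const char *label;       /* NULL terminates the table */
};

static const struct match_entry serial_tbl[] = {
	{ "",       "ns16550", "PORT_16550" },
	{ "serial", "",        "PORT_UNKNOWN" },
	{ NULL, NULL, NULL },
};

static const struct match_entry *
match_prefer_compatible(const struct match_entry *tbl, const char *node_type,
			const char *const *compat, size_t ncompat)
{
	/* Pass 1: only entries naming a compatible, in compatible order. */
	for (size_t i = 0; i < ncompat; i++) {
		for (const struct match_entry *m = tbl; m->label; m++) {
			if (!m->compatible[0])
				continue; /* type-only entries wait for pass 2 */
			if (strcmp(m->compatible, compat[i]) != 0)
				continue;
			if (m->type[0] && strcmp(m->type, node_type) != 0)
				continue;
			return m;
		}
	}

	/* Pass 2: fall back to entries without a compatible. */
	for (const struct match_entry *m = tbl; m->label; m++)
		if (!m->compatible[0] && m->type[0] &&
		    strcmp(m->type, node_type) == 0)
			return m;

	return NULL;
}
```

Both orderings of the compatible property now resolve to the ns16550 entry, and a node with an unknown compatible still falls back to the .type entry.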
* Re: Linux-3.14-rc2: Order of serial node compatibles in DTS files.
From: Sebastian Hesselbarth @ 2014-02-11 23:43 UTC
To: Stephen N Chivers; +Cc: Chris Proctor, linuxppc-dev, Arnd Bergmann, devicetree
In-Reply-To: <OFB203CA90.B048F8AA-ONCA257C7C.0081A816-CA257C7C.0081DCE3@csc.com>
On 02/12/2014 12:38 AM, Stephen N Chivers wrote:
> Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> wrote on
>> On 02/11/2014 11:33 PM, Kumar Gala wrote:
>>> On Feb 11, 2014, at 2:57 PM, Stephen N Chivers <schivers@csc.com.au> wrote:
>>>> I have been trial booting a 3.14-rc2 kernel for a 85xx platform
>>>> (dtbImage).
[...]
>>>>
>>>> of_serial f1004500.serial: Unknown serial port found, ignored.
>>>>
>>>> The serial nodes in boards dts file are specified as:
>>>>
>>>> serial0: serial@4500 {
>>>> cell-index = <0>;
>>>> device_type = "serial";
>>>> compatible = "fsl,ns16550", "ns16550";
>>>> reg = <0x4500 0x100>;
>>>> clock-frequency = <0>;
>>>> interrupts = <0x2a 0x2>;
>>>> interrupt-parent = <&mpic>;
>>>> };
>>>
>>> Wondering if this caused the issue:
>>>
>>> commit 105353145eafb3ea919f5cdeb652a9d8f270228e
>>> Author: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
>>> Date: Tue Dec 3 14:52:00 2013 +0100
>>>
>>> OF: base: match each node compatible against all given matches first
>>
[...]
>>
>> I don't think the missing compatible is causing it, but of_serial
>> provides a DT match for .type = "serial" just to fail later on
>> with the error seen above.
>>
>> The commit in question reorders of_match_device in a way that match
>> table order is not relevant anymore. This can cause it to match
>> .type = "serial" first here.
>>
>> Rather than touching the commit, I suggest to remove the problematic
>> .type = "serial" from the match table. It is of no use anyway.
> Deleting the "serial" line from the match table fixes the problem.
> I tested it for both orderings of compatible.
I take back my suggestion to remove anything from of_serial.c. Instead,
we should prefer matches with compatibles over type/name matches without
compatibles. Something like the patch below (compile tested only).
diff --git a/drivers/of/base.c b/drivers/of/base.c
index ff85450d5683..60da53b385ff 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -734,6 +734,7 @@ static
 const struct of_device_id *__of_match_node(const struct of_device_id *matches,
 					   const struct device_node *node)
 {
+	const struct of_device_id *m;
 	const char *cp;
 	int cplen, l;
 
@@ -742,15 +743,15 @@ const struct of_device_id *__of_match_node(const struct of_device_id *matches,
 
 	cp = __of_get_property(node, "compatible", &cplen);
 	do {
-		const struct of_device_id *m = matches;
+		m = matches;
 
 		/* Check against matches with current compatible string */
 		while (m->name[0] || m->type[0] || m->compatible[0]) {
 			int match = 1;
-			if (m->name[0])
+			if (m->name[0] && m->compatible[0])
 				match &= node->name
 					&& !strcmp(m->name, node->name);
-			if (m->type[0])
+			if (m->type[0] && m->compatible[0])
 				match &= node->type
 					&& !strcmp(m->type, node->type);
 			if (m->compatible[0])
@@ -770,6 +771,21 @@ const struct of_device_id *__of_match_node(const struct of_device_id *matches,
 		}
 	} while (cp && (cplen > 0));
 
+	/* Check against matches without compatible string */
+	m = matches;
+	while (m->name[0] || m->type[0]) {
+		int match = 1;
+		if (m->name[0])
+			match &= node->name
+				&& !strcmp(m->name, node->name);
+		if (m->type[0])
+			match &= node->type
+				&& !strcmp(m->type, node->type);
+		if (match)
+			return m;
+		m++;
+	}
+
 	return NULL;
 }
 
* Re: Linux-3.14-rc2: Order of serial node compatibles in DTS files.
From: Sebastian Hesselbarth @ 2014-02-11 23:46 UTC
To: Scott Wood
Cc: Chris Proctor, Arnd Bergmann, devicetree, Stephen N Chivers,
linuxppc-dev
In-Reply-To: <1392162080.6733.404.camel@snotra.buserror.net>
On 02/12/2014 12:41 AM, Scott Wood wrote:
> On Tue, 2014-02-11 at 23:51 +0100, Sebastian Hesselbarth wrote:
>> On 02/11/2014 11:33 PM, Kumar Gala wrote:
>>> Hmm,
>>>
>>> Wondering if this caused the issue:
>>>
>>> commit 105353145eafb3ea919f5cdeb652a9d8f270228e
>>> Author: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
>>> Date: Tue Dec 3 14:52:00 2013 +0100
>>>
>>> OF: base: match each node compatible against all given matches first
>>
>> [adding Arnd on Cc]
>>
>> Could be. I checked tty/serial/of_serial.c and it does not provide a
>> compatible for "fsl,ns16550". Does reverting the patch fix the issue
>> observed?
>>
>> I don't think the missing compatible is causing it, but of_serial
>> provides a DT match for .type = "serial" just to fail later on
>> with the error seen above.
>>
>> The commit in question reorders of_match_device in a way that match
>> table order is not relevant anymore. This can cause it to match
>> .type = "serial" first here.
>>
>> Rather than touching the commit, I suggest to remove the problematic
>> .type = "serial" from the match table. It is of no use anyway.
>
> Regardless of whether .type = "serial" gets removed, it seems wrong for
> of_match_node() to accept a .type-only match (or .name, or anything else
> that doesn't involve .compatible) before it accepts a compatible match
> other than the first in the compatible property.
Right, I thought about it and came to the same conclusion. I sent a
patch a second ago to prefer .compatible != NULL matches over those
with .compatible == NULL.
It would be great if Stephen could re-test that. If it solves the issue, I
can send a patch tomorrow.
Sebastian
* Re: Linux-3.14-rc2: Order of serial node compatibles in DTS files.
From: Stephen N Chivers @ 2014-02-12 0:21 UTC
To: Sebastian Hesselbarth
Cc: Chris Proctor, Arnd Bergmann, devicetree, Stephen N Chivers,
Scott Wood, linuxppc-dev
In-Reply-To: <52FAB65C.4090201@gmail.com>
Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> wrote on
02/12/2014 10:46:36 AM:
> From: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> To: Scott Wood <scottwood@freescale.com>
> Cc: Kumar Gala <galak@kernel.crashing.org>, Stephen N Chivers
> <schivers@csc.com.au>, Chris Proctor <cproctor@csc.com.au>,
> linuxppc-dev@lists.ozlabs.org, Arnd Bergmann <arnd@arndb.de>,
> devicetree <devicetree@vger.kernel.org>
> Date: 02/12/2014 11:04 AM
> Subject: Re: Linux-3.14-rc2: Order of serial node compatibles in DTS files.
>
> On 02/12/2014 12:41 AM, Scott Wood wrote:
> > On Tue, 2014-02-11 at 23:51 +0100, Sebastian Hesselbarth wrote:
> >> On 02/11/2014 11:33 PM, Kumar Gala wrote:
> >>> Hmm,
> >>>
> >>> Wondering if this caused the issue:
> >>>
> >>> commit 105353145eafb3ea919f5cdeb652a9d8f270228e
> >>> Author: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> >>> Date: Tue Dec 3 14:52:00 2013 +0100
> >>>
> >>> OF: base: match each node compatible against all given matches first
> >>
> >> [adding Arnd on Cc]
> >>
> >> Could be. I checked tty/serial/of_serial.c and it does not provide a
> >> compatible for "fsl,ns16550". Does reverting the patch fix the issue
> >> observed?
> >>
> >> I don't think the missing compatible is causing it, but of_serial
> >> provides a DT match for .type = "serial" just to fail later on
> >> with the error seen above.
> >>
> >> The commit in question reorders of_match_device in a way that match
> >> table order is not relevant anymore. This can cause it to match
> >> .type = "serial" first here.
> >>
> >> Rather than touching the commit, I suggest to remove the problematic
> >> .type = "serial" from the match table. It is of no use anyway.
> >
> > Regardless of whether .type = "serial" gets removed, it seems wrong for
> > of_match_node() to accept a .type-only match (or .name, or anything else
> > that doesn't involve .compatible) before it accepts a compatible match
> > other than the first in the compatible property.
>
> Right, I thought about it and came to the same conclusion. I sent a
> patch a second ago to prefer .compatible != NULL matches over those
> with .compatible == NULL.
>
> Would be great if Stephen can re-test that. If it solves the issue, I
> can send a patch tomorrow.
Done.
But the interrupt controller (MPIC)
goes AWOL and it is downhill from there.
The MPIC is specified in the DTS as:
mpic: pic@40000 {
	interrupt-controller;
	#address-cells = <0>;
	#interrupt-cells = <2>;
	reg = <0x40000 0x40000>;
	compatible = "chrp,open-pic";
	device_type = "open-pic";
	big-endian;
};
The board support file has the standard mechanism for allocating
the PIC:
struct mpic *mpic;
mpic = mpic_alloc(NULL, 0, 0, 0, 256, " OpenPIC ");
BUG_ON(mpic == NULL);
mpic_init(mpic);
I checked for damage from applying the patch; it applied
correctly.
Stephen Chivers,
CSC Australia Pty. Ltd.
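A minimal model of the per-entry test in the first loop of the attached patch, assuming that loop still ends with `if (match) return m;` as in the surrounding kernel function. Because the name and type tests are gated on m->compatible[0], an entry with no compatible string keeps match == 1 and would be accepted for any node. Whether this is what hides the MPIC here is an assumption: it would follow if mpic_alloc() with a NULL node finds the controller through a match table containing a compatible-less entry such as { .type = "open-pic" }. The types and function names are hypothetical, and strcmp() stands in for of_compat_cmp().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct of_device_id ("" = don't care). */
struct id_entry {
	char name[32];
	char type[32];
	char compatible[128];
};

/* Hypothetical stand-in for struct device_node; one compatible only. */
struct dt_node {
	const char *name;
	const char *type;
	const char *compat;
};

/* The per-entry test from the first loop of the posted patch: name and
 * type are only checked when the entry also has a compatible string. */
static int first_pass_as_posted(const struct id_entry *m,
				const struct dt_node *n)
{
	int match = 1;

	assert(m != NULL && n != NULL);
	if (m->name[0] && m->compatible[0])
		match &= n->name && !strcmp(m->name, n->name);
	if (m->type[0] && m->compatible[0])
		match &= n->type && !strcmp(m->type, n->type);
	if (m->compatible[0])
		match &= n->compat && !strcmp(m->compatible, n->compat);
	return match;	/* still 1 for an entry without a compatible */
}

/* Variant that leaves compatible-less entries for a later pass. */
static int first_pass_skipping(const struct id_entry *m,
			       const struct dt_node *n)
{
	if (!m->compatible[0])
		return 0;
	return first_pass_as_posted(m, n);
}
```

Here an { .type = "open-pic" } entry "matches" a serial node under first_pass_as_posted(), while first_pass_skipping() rejects it and still accepts a genuine compatible match.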
* Re: [PATCH V2] powerpc: thp: Fix crash on mremap
From: Aneesh Kumar K.V @ 2014-02-12 2:52 UTC
To: Greg KH, Kirill A. Shutemov; +Cc: paulus, linuxppc-dev, stable
In-Reply-To: <20140211173129.GB30336@kroah.com>
Greg KH <gregkh@linuxfoundation.org> writes:
> On Fri, Feb 07, 2014 at 07:21:57PM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>
>> This patch fix the below crash
>>
>> NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
>> LR [c0000000000439ac] .hash_page+0x18c/0x5e0
>> ...
>> Call Trace:
>> [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
>> [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
>> [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
>>
>> On ppc64 we use the pgtable for storing the hpte slot information and
>> store address to the pgtable at a constant offset (PTRS_PER_PMD) from
>> pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
>> the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
>> from new pmd.
>>
>> We also want to move the withdraw and deposit before the set_pmd so
>> that, when page fault find the pmd as trans huge we can be sure that
>> pgtable can be located at the offset.
>>
>> variant of upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
>> for 3.12 stable series
>
> This doesn't look like a "variant", it looks totally different. Why
> can't I just take the b3084f4db3aeb991c507ca774337c7e7893ed04f patch
> (and follow-on fix) for 3.12?
Because the code in that function changed in 3.13. Kirill added split
ptl locks for huge ptes, and in 3.13 we decide whether to withdraw and
deposit again based on the ptl locks. In 3.12 we do that only for
ppc64, using an #ifdef.
>
> I _REALLY_ dislike patches that are totally different from Linus's tree
> in stable trees, it has caused nothing but problems in the past.
>
-aneesh
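To illustrate the constraint described in the quoted changelog, here is a toy model of the ppc64 layout in which the pgtable deposited for a huge pmd is stored PTRS_PER_PMD slots after the pmd itself. MODEL_PTRS_PER_PMD, struct pmd_page, deposit(), withdraw() and move_huge_pmd_model() are hypothetical simplifications, not the kernel's pgtable_trans_huge_deposit()/pgtable_trans_huge_withdraw() or move_huge_pmd(), and there is no locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical small table size; the real PTRS_PER_PMD is config-dependent. */
#define MODEL_PTRS_PER_PMD 8

/* One pmd page in the model: pmd entries in the first half, and the
 * pgtable deposited for entry i kept in slot i + MODEL_PTRS_PER_PMD. */
struct pmd_page {
	void *slot[2 * MODEL_PTRS_PER_PMD];
};

static void deposit(void **pmdp, void *pgtable)
{
	pmdp[MODEL_PTRS_PER_PMD] = pgtable;
}

static void *withdraw(void **pmdp)
{
	void *pgtable = pmdp[MODEL_PTRS_PER_PMD];

	pmdp[MODEL_PTRS_PER_PMD] = NULL;
	return pgtable;
}

/* Move a huge pmd from old_pmdp to new_pmdp. With move_deposit set, the
 * pgtable is withdrawn from the old offset and deposited at the new one
 * before the new pmd is written, so a lookup via the new pmd finds it. */
static void move_huge_pmd_model(void **old_pmdp, void **new_pmdp,
				int move_deposit)
{
	void *pmd = *old_pmdp;

	assert(pmd != NULL);
	*old_pmdp = NULL;		/* stands in for pmdp_get_and_clear() */
	if (move_deposit)
		deposit(new_pmdp, withdraw(old_pmdp));
	*new_pmdp = pmd;		/* stands in for set_pmd_at() */
}
```

Moving with move_deposit == 0 leaves the slot at new_pmdp + MODEL_PTRS_PER_PMD empty, which is exactly the slot the changelog says the hash fault path looks at; with move_deposit == 1 the pgtable is already in place when the new pmd is written.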
* Re: [PATCH V2] powerpc: thp: Fix crash on mremap
From: Aneesh Kumar K.V @ 2014-02-12 2:54 UTC
To: Benjamin Herrenschmidt, Greg KH; +Cc: paulus, linuxppc-dev, stable
In-Reply-To: <1392144772.23418.11.camel@pasglop>
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> On Tue, 2014-02-11 at 09:31 -0800, Greg KH wrote:
>> On Fri, Feb 07, 2014 at 07:21:57PM +0530, Aneesh Kumar K.V wrote:
>> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >
>> > This patch fix the below crash
>> >
>> > NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
>> > LR [c0000000000439ac] .hash_page+0x18c/0x5e0
>> > ...
>> > Call Trace:
>> > [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
>> > [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
>> > [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
>> >
>> > On ppc64 we use the pgtable for storing the hpte slot information and
>> > store address to the pgtable at a constant offset (PTRS_PER_PMD) from
>> > pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
>> > the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
>> > from new pmd.
>> >
>> > We also want to move the withdraw and deposit before the set_pmd so
>> > that, when page fault find the pmd as trans huge we can be sure that
>> > pgtable can be located at the offset.
>> >
>> > variant of upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
>> > for 3.12 stable series
>>
>> This doesn't look like a "variant", it looks totally different. Why
>> can't I just take the b3084f4db3aeb991c507ca774337c7e7893ed04f patch
>> (and follow-on fix) for 3.12?
>>
>> I _REALLY_ dislike patches that are totally different from Linus's tree
>> in stable trees, it has caused nothing but problems in the past.
>
> I don't think it applies... (I tried on an internal tree) but the
> affected function changed in 3.13 in various ways. Aneesh, please
> provide a more details explanation and whether we should backport those
> other changes too or whether this is not necessary
Yes, the affected function gained support for split ptl locks for huge
ptes. I don't think that is stable material.
>
> BTW. Aneesh, we need a 3.11.x one too
>
For 3.11.x, it is already applied.
-aneesh
* [PATCH V2 1/3] powerpc: mm: Add new set flag argument to pte/pmd update function
From: Aneesh Kumar K.V @ 2014-02-12 3:43 UTC
To: benh, paulus, riel, mgorman, mpe; +Cc: linux-mm, linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1392176618-23667-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
We will use this later to set the _PAGE_NUMA bit.
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/hugetlb.h | 2 +-
arch/powerpc/include/asm/pgtable-ppc64.h | 26 +++++++++++++++-----------
arch/powerpc/mm/pgtable_64.c | 12 +++++++-----
arch/powerpc/mm/subpage-prot.c | 2 +-
4 files changed, 24 insertions(+), 18 deletions(-)
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index d750336b171d..623f2971ce0e 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -127,7 +127,7 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
#ifdef CONFIG_PPC64
- return __pte(pte_update(mm, addr, ptep, ~0UL, 1));
+ return __pte(pte_update(mm, addr, ptep, ~0UL, 0, 1));
#else
return __pte(pte_update(ptep, ~0UL, 0));
#endif
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index bc141c950b1e..eb9261024f51 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -195,6 +195,7 @@ extern void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
static inline unsigned long pte_update(struct mm_struct *mm,
unsigned long addr,
pte_t *ptep, unsigned long clr,
+ unsigned long set,
int huge)
{
#ifdef PTE_ATOMIC_UPDATES
@@ -205,14 +206,15 @@ static inline unsigned long pte_update(struct mm_struct *mm,
andi. %1,%0,%6\n\
bne- 1b \n\
andc %1,%0,%4 \n\
+ or %1,%1,%7\n\
stdcx. %1,0,%3 \n\
bne- 1b"
: "=&r" (old), "=&r" (tmp), "=m" (*ptep)
- : "r" (ptep), "r" (clr), "m" (*ptep), "i" (_PAGE_BUSY)
+ : "r" (ptep), "r" (clr), "m" (*ptep), "i" (_PAGE_BUSY), "r" (set)
: "cc" );
#else
unsigned long old = pte_val(*ptep);
- *ptep = __pte(old & ~clr);
+ *ptep = __pte((old & ~clr) | set);
#endif
/* huge pages use the old page table lock */
if (!huge)
@@ -231,9 +233,9 @@ static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
{
unsigned long old;
- if ((pte_val(*ptep) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0)
+ if ((pte_val(*ptep) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0)
return 0;
- old = pte_update(mm, addr, ptep, _PAGE_ACCESSED, 0);
+ old = pte_update(mm, addr, ptep, _PAGE_ACCESSED, 0, 0);
return (old & _PAGE_ACCESSED) != 0;
}
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
@@ -252,7 +254,7 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
if ((pte_val(*ptep) & _PAGE_RW) == 0)
return;
- pte_update(mm, addr, ptep, _PAGE_RW, 0);
+ pte_update(mm, addr, ptep, _PAGE_RW, 0, 0);
}
static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
@@ -261,7 +263,7 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
if ((pte_val(*ptep) & _PAGE_RW) == 0)
return;
- pte_update(mm, addr, ptep, _PAGE_RW, 1);
+ pte_update(mm, addr, ptep, _PAGE_RW, 0, 1);
}
/*
@@ -284,14 +286,14 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
- unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0);
+ unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
return __pte(old);
}
static inline void pte_clear(struct mm_struct *mm, unsigned long addr,
pte_t * ptep)
{
- pte_update(mm, addr, ptep, ~0UL, 0);
+ pte_update(mm, addr, ptep, ~0UL, 0, 0);
}
@@ -506,7 +508,9 @@ extern int pmdp_set_access_flags(struct vm_area_struct *vma,
extern unsigned long pmd_hugepage_update(struct mm_struct *mm,
unsigned long addr,
- pmd_t *pmdp, unsigned long clr);
+ pmd_t *pmdp,
+ unsigned long clr,
+ unsigned long set);
static inline int __pmdp_test_and_clear_young(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp)
@@ -515,7 +519,7 @@ static inline int __pmdp_test_and_clear_young(struct mm_struct *mm,
if ((pmd_val(*pmdp) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0)
return 0;
- old = pmd_hugepage_update(mm, addr, pmdp, _PAGE_ACCESSED);
+ old = pmd_hugepage_update(mm, addr, pmdp, _PAGE_ACCESSED, 0);
return ((old & _PAGE_ACCESSED) != 0);
}
@@ -542,7 +546,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr,
if ((pmd_val(*pmdp) & _PAGE_RW) == 0)
return;
- pmd_hugepage_update(mm, addr, pmdp, _PAGE_RW);
+ pmd_hugepage_update(mm, addr, pmdp, _PAGE_RW, 0);
}
#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 65b7b65e8708..62bf5e8e78da 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -510,7 +510,8 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
}
unsigned long pmd_hugepage_update(struct mm_struct *mm, unsigned long addr,
- pmd_t *pmdp, unsigned long clr)
+ pmd_t *pmdp, unsigned long clr,
+ unsigned long set)
{
unsigned long old, tmp;
@@ -526,14 +527,15 @@ unsigned long pmd_hugepage_update(struct mm_struct *mm, unsigned long addr,
andi. %1,%0,%6\n\
bne- 1b \n\
andc %1,%0,%4 \n\
+ or %1,%1,%7\n\
stdcx. %1,0,%3 \n\
bne- 1b"
: "=&r" (old), "=&r" (tmp), "=m" (*pmdp)
- : "r" (pmdp), "r" (clr), "m" (*pmdp), "i" (_PAGE_BUSY)
+ : "r" (pmdp), "r" (clr), "m" (*pmdp), "i" (_PAGE_BUSY), "r" (set)
: "cc" );
#else
old = pmd_val(*pmdp);
- *pmdp = __pmd(old & ~clr);
+ *pmdp = __pmd((old & ~clr) | set);
#endif
if (old & _PAGE_HASHPTE)
hpte_do_hugepage_flush(mm, addr, pmdp);
@@ -708,7 +710,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
- pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT);
+ pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
}
/*
@@ -835,7 +837,7 @@ pmd_t pmdp_get_and_clear(struct mm_struct *mm,
unsigned long old;
pgtable_t *pgtable_slot;
- old = pmd_hugepage_update(mm, addr, pmdp, ~0UL);
+ old = pmd_hugepage_update(mm, addr, pmdp, ~0UL, 0);
old_pmd = __pmd(old);
/*
* We have pmd == none and we are holding page_table_lock.
diff --git a/arch/powerpc/mm/subpage-prot.c b/arch/powerpc/mm/subpage-prot.c
index a770df2dae70..6c0b1f5f8d2c 100644
--- a/arch/powerpc/mm/subpage-prot.c
+++ b/arch/powerpc/mm/subpage-prot.c
@@ -78,7 +78,7 @@ static void hpte_flush_range(struct mm_struct *mm, unsigned long addr,
pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
arch_enter_lazy_mmu_mode();
for (; npages > 0; --npages) {
- pte_update(mm, addr, pte, 0, 0);
+ pte_update(mm, addr, pte, 0, 0, 0);
addr += PAGE_SIZE;
++pte;
}
--
1.8.3.2
* [PATCH V2 0/3] powerpc: Fix random application crashes with NUMA_BALANCING enabled
From: Aneesh Kumar K.V @ 2014-02-12 3:43 UTC
To: benh, paulus, riel, mgorman, mpe; +Cc: linux-mm, linuxppc-dev
Hello,
This patch series fixes random application crashes observed on ppc64 with numa
balancing enabled. Without the patches, we see crashes like:
anacron[14551]: unhandled signal 11 at 0000000000000041 nip 000000003cfd54b4 lr 000000003cfd5464 code 30001
anacron[14599]: unhandled signal 11 at 0000000000000041 nip 000000003efc54b4 lr 000000003efc5464 code 30001
Changes from V1:
* Build fix for CONFIG_NUMA_BALANCING disabled
-aneesh
* [PATCH V2 2/3] mm: dirty accountable change only apply to non prot numa case
From: Aneesh Kumar K.V @ 2014-02-12 3:43 UTC
To: benh, paulus, riel, mgorman, mpe; +Cc: linux-mm, linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1392176618-23667-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
So move it within the if block.
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
mm/mprotect.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 7332c1785744..33eab902f10e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -58,6 +58,13 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
if (pte_numa(ptent))
ptent = pte_mknonnuma(ptent);
ptent = pte_modify(ptent, newprot);
+ /*
+ * Avoid taking write faults for pages we
+ * know to be dirty.
+ */
+ if (dirty_accountable && pte_dirty(ptent))
+ ptent = pte_mkwrite(ptent);
+ ptep_modify_prot_commit(mm, addr, pte, ptent);
updated = true;
} else {
struct page *page;
@@ -72,22 +79,8 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
}
}
}
-
- /*
- * Avoid taking write faults for pages we know to be
- * dirty.
- */
- if (dirty_accountable && pte_dirty(ptent)) {
- ptent = pte_mkwrite(ptent);
- updated = true;
- }
-
if (updated)
pages++;
-
- /* Only !prot_numa always clears the pte */
- if (!prot_numa)
- ptep_modify_prot_commit(mm, addr, pte, ptent);
} else if (IS_ENABLED(CONFIG_MIGRATION) && !pte_file(oldpte)) {
swp_entry_t entry = pte_to_swp_entry(oldpte);
--
1.8.3.2
* [PATCH V2 3/3] mm: Use ptep/pmdp_set_numa for updating _PAGE_NUMA bit
From: Aneesh Kumar K.V @ 2014-02-12 3:43 UTC
To: benh, paulus, riel, mgorman, mpe; +Cc: linux-mm, linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1392176618-23667-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Archs like ppc64 don't do a tlb flush in the set_pte/pmd functions. ppc64 also doesn't implement
flush_tlb_range. ppc64 requires the tlb flushing to be batched within ptl locks, to ensure that
the hash page table stays in sync with the linux page table. We track the hpte index in the linux
pte, and if we clear it without flushing the hash and drop the ptl lock, another cpu can update
the pte and we can end up with a double hash. We also want to keep set_pte_at simple by not
requiring it to do a hash flush, for performance reasons. Hence we cannot use it while updating
the _PAGE_NUMA bit. Add new functions for marking a pte/pmd numa.
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
Changes from V1:
* Build fix for non numa balancing config
arch/powerpc/include/asm/pgtable.h | 22 +++++++++++++++++++++
include/asm-generic/pgtable.h | 39 ++++++++++++++++++++++++++++++++++++++
mm/huge_memory.c | 9 ++-------
mm/mprotect.c | 4 +---
4 files changed, 64 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index f83b6f3e1b39..3ebb188c3ff5 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -75,12 +75,34 @@ static inline pte_t pte_mknuma(pte_t pte)
return pte;
}
+#define ptep_set_numa ptep_set_numa
+static inline void ptep_set_numa(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep)
+{
+ if ((pte_val(*ptep) & _PAGE_PRESENT) == 0)
+ VM_BUG_ON(1);
+
+ pte_update(mm, addr, ptep, _PAGE_PRESENT, _PAGE_NUMA, 0);
+ return;
+}
+
#define pmd_numa pmd_numa
static inline int pmd_numa(pmd_t pmd)
{
return pte_numa(pmd_pte(pmd));
}
+#define pmdp_set_numa pmdp_set_numa
+static inline void pmdp_set_numa(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp)
+{
+ if ((pmd_val(*pmdp) & _PAGE_PRESENT) == 0)
+ VM_BUG_ON(1);
+
+ pmd_hugepage_update(mm, addr, pmdp, _PAGE_PRESENT, _PAGE_NUMA);
+ return;
+}
+
#define pmd_mknonnuma pmd_mknonnuma
static inline pmd_t pmd_mknonnuma(pmd_t pmd)
{
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 8e4f41d9af4d..34c7bdc06014 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -701,6 +701,18 @@ static inline pte_t pte_mknuma(pte_t pte)
}
#endif
+#ifndef ptep_set_numa
+static inline void ptep_set_numa(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep)
+{
+ pte_t ptent = *ptep;
+
+ ptent = pte_mknuma(ptent);
+ set_pte_at(mm, addr, ptep, ptent);
+ return;
+}
+#endif
+
#ifndef pmd_mknuma
static inline pmd_t pmd_mknuma(pmd_t pmd)
{
@@ -708,6 +720,18 @@ static inline pmd_t pmd_mknuma(pmd_t pmd)
return pmd_clear_flags(pmd, _PAGE_PRESENT);
}
#endif
+
+#ifndef pmdp_set_numa
+static inline void pmdp_set_numa(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp)
+{
+ pmd_t pmd = *pmdp;
+
+ pmd = pmd_mknuma(pmd);
+ set_pmd_at(mm, addr, pmdp, pmd);
+ return;
+}
+#endif
#else
extern int pte_numa(pte_t pte);
extern int pmd_numa(pmd_t pmd);
@@ -715,6 +739,8 @@ extern pte_t pte_mknonnuma(pte_t pte);
extern pmd_t pmd_mknonnuma(pmd_t pmd);
extern pte_t pte_mknuma(pte_t pte);
extern pmd_t pmd_mknuma(pmd_t pmd);
+extern void ptep_set_numa(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
+extern void pmdp_set_numa(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp);
#endif /* CONFIG_ARCH_USES_NUMA_PROT_NONE */
#else
static inline int pmd_numa(pmd_t pmd)
@@ -742,10 +768,23 @@ static inline pte_t pte_mknuma(pte_t pte)
return pte;
}
+static inline void ptep_set_numa(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep)
+{
+ return;
+}
+
+
static inline pmd_t pmd_mknuma(pmd_t pmd)
{
return pmd;
}
+
+static inline void pmdp_set_numa(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp)
+{
+ return ;
+}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_MMU */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 82166bf974e1..da23eb96779f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1545,6 +1545,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
entry = pmd_mknonnuma(entry);
entry = pmd_modify(entry, newprot);
ret = HPAGE_PMD_NR;
+ set_pmd_at(mm, addr, pmd, entry);
BUG_ON(pmd_write(entry));
} else {
struct page *page = pmd_page(*pmd);
@@ -1557,16 +1558,10 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
*/
if (!is_huge_zero_page(page) &&
!pmd_numa(*pmd)) {
- entry = *pmd;
- entry = pmd_mknuma(entry);
+ pmdp_set_numa(mm, addr, pmd);
ret = HPAGE_PMD_NR;
}
}
-
- /* Set PMD if cleared earlier */
- if (ret == HPAGE_PMD_NR)
- set_pmd_at(mm, addr, pmd, entry);
-
spin_unlock(ptl);
}
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 33eab902f10e..769a67a15803 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -69,12 +69,10 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
} else {
struct page *page;
- ptent = *pte;
page = vm_normal_page(vma, addr, oldpte);
if (page && !PageKsm(page)) {
if (!pte_numa(oldpte)) {
- ptent = pte_mknuma(ptent);
- set_pte_at(mm, addr, pte, ptent);
+ ptep_set_numa(mm, addr, pte);
updated = true;
}
}
--
1.8.3.2
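The changelog's argument can be modeled in a few lines: both paths below produce the same pte bits, but only the pte_update-style path notices that the old pte had a hash entry and records a flush, which a set_pte_at-style write never does. The bit values, the hash_flushes counter and both functions are hypothetical simplifications; the atomic update loop and the batching under the ptl are not modeled.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. */
#define M_PAGE_PRESENT	0x001UL
#define M_PAGE_NUMA	0x004UL
#define M_PAGE_HASHPTE	0x400UL

static int hash_flushes;	/* counts hash invalidations in the model */

/* Generic-style path: read, build the NUMA pte, write it back. Nothing
 * here looks at HASHPTE, so a stale hash entry would be left behind. */
static void set_numa_generic(unsigned long *ptep)
{
	unsigned long pte = *ptep;

	pte = (pte & ~M_PAGE_PRESENT) | M_PAGE_NUMA;	/* like pte_mknuma() */
	*ptep = pte;					/* like set_pte_at() */
}

/* pte_update-style path: change the bits, then queue a hash flush if the
 * old pte had a hash entry. */
static void set_numa_ppc_style(unsigned long *ptep)
{
	unsigned long old = *ptep;

	assert(old & M_PAGE_PRESENT);	/* mirrors the VM_BUG_ON() */
	*ptep = (old & ~M_PAGE_PRESENT) | M_PAGE_NUMA;
	if (old & M_PAGE_HASHPTE)
		hash_flushes++;		/* stands in for hpte_need_flush() */
}
```

Starting from M_PAGE_PRESENT | M_PAGE_HASHPTE, both functions leave M_PAGE_HASHPTE | M_PAGE_NUMA, but only set_numa_ppc_style() increments hash_flushes.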
* [git pull] Please pull powerpc.git merge branch
From: Benjamin Herrenschmidt @ 2014-02-12 4:34 UTC
To: Linus Torvalds; +Cc: linuxppc-dev, Linux Kernel list
Hi Linus !
Here is some powerpc goodness for -rc2. Arguably it is more -rc1 material
than -rc2, but I was travelling (again!).
It's mostly bug fixes, including regressions, but there are a couple of
new things that I decided to drop in.
One is a straightforward patch from Michael to add a bunch of P8 cache
events to perf.
The other one is a patch by myself to add the direct DMA (iommu bypass)
for PCIe on Power8 for 64-bit capable devices. This has been around for
a while, I had lost track of it. However it's been in our internal
kernels we use for testing P8 already and it affects only P8 related
code. Since P8 is still unreleased the risk is pretty much nil at this
point.
Cheers,
Ben.
The following changes since commit b28a960c42fcd9cfc987441fa6d1c1a471f0f9ed:
Linux 3.14-rc2 (2014-02-09 18:15:47 -0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge
for you to fetch changes up to cd15b048445d0a54f7147c35a86c5a16ef231554:
powerpc/powernv: Add iommu DMA bypass support for IODA2 (2014-02-11 16:07:37 +1100)
----------------------------------------------------------------
Anshuman Khandual (1):
powerpc/perf: Configure BHRB filter before enabling PMU interrupts
Anton Blanchard (1):
powerpc: Fix endian issues in kexec and crash dump code
Benjamin Herrenschmidt (1):
powerpc/powernv: Add iommu DMA bypass support for IODA2
Kevin Hao (1):
powerpc/ppc32: Fix the bug in the init of non-base exception stack for UP
Laurent Dufour (1):
powerpc/relocate fix relocate processing in LE mode
Mahesh Salgaonkar (2):
powerpc/pseries: Disable relocation on exception while going down during crash.
powerpc: Fix kdump hang issue on p8 with relocation on exception enabled.
Michael Ellerman (5):
powerpc/perf: Add Power8 cache & TLB events
powerpc/pseries: Select ARCH_RANDOM on pseries
powerpc/xmon: Don't loop forever in get_output_lock()
powerpc/xmon: Fix timeout loop in get_output_lock()
powerpc/xmon: Don't signal we've entered until we're finished printing
Nathan Fontenot (1):
crypto/nx/nx-842: Fix handling of vmalloc addresses
Paul Gortmaker (1):
powerpc: Fix build failure in sysdev/mpic.c for MPIC_WEIRD=y
Thadeu Lima de Souza Cascardo (1):
powerpc/eeh: Drop taken reference to driver on eeh_rmv_device
arch/powerpc/include/asm/dma-mapping.h | 1 +
arch/powerpc/include/asm/iommu.h | 1 +
arch/powerpc/include/asm/sections.h | 12 +++
arch/powerpc/kernel/dma.c | 10 ++-
arch/powerpc/kernel/eeh_driver.c | 8 +-
arch/powerpc/kernel/iommu.c | 12 +++
arch/powerpc/kernel/irq.c | 5 ++
arch/powerpc/kernel/machine_kexec.c | 14 ++-
arch/powerpc/kernel/machine_kexec_64.c | 6 +-
arch/powerpc/kernel/reloc_64.S | 4 +-
arch/powerpc/kernel/setup_32.c | 5 ++
arch/powerpc/mm/hash_utils_64.c | 14 +++
arch/powerpc/perf/core-book3s.c | 5 +-
arch/powerpc/perf/power8-pmu.c | 144 ++++++++++++++++++++++++++++++
arch/powerpc/platforms/powernv/pci-ioda.c | 84 +++++++++++++++++
arch/powerpc/platforms/powernv/pci.c | 10 +++
arch/powerpc/platforms/powernv/pci.h | 6 +-
arch/powerpc/platforms/powernv/powernv.h | 8 ++
arch/powerpc/platforms/powernv/setup.c | 9 ++
arch/powerpc/platforms/pseries/Kconfig | 1 +
arch/powerpc/platforms/pseries/setup.c | 3 +-
arch/powerpc/sysdev/mpic.c | 38 ++++----
arch/powerpc/xmon/xmon.c | 24 +++--
drivers/crypto/nx/nx-842.c | 29 +++---
24 files changed, 398 insertions(+), 55 deletions(-)
* Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error
From: Stephen Rothwell @ 2014-02-12 5:22 UTC
To: Benjamin Herrenschmidt
Cc: Mahesh J Salgaonkar, linux-next, Paul Mackerras, Linux Kernel,
linuxppc-dev
In-Reply-To: <1386631570.32037.40.camel@pasglop>
Hi all,
On Tue, 10 Dec 2013 10:26:10 +1100 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> On Tue, 2013-12-10 at 10:10 +1100, Stephen Rothwell wrote:
> > Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> > Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
> >
> > Works for me. Thanks. I will add this to linux-next today if Ben
> > doesn't add it to his tree.
>
> I will but probably not soon enough for your cut today
As noted elsewhere, this did not completely fix the problem and I have
been still getting this error from my allyesconfig builds for some time:
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:1312: Error: attempt to move .org backwards
Could someone please fix this?
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au