LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
From: Ricardo Neri @ 2018-06-15  2:21 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Don Zickus, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu
In-Reply-To: <20180614123250.230667f6@roar.ozlabs.ibm.com>

On Thu, Jun 14, 2018 at 12:32:50PM +1000, Nicholas Piggin wrote:
> On Wed, 13 Jun 2018 18:31:17 -0700
> Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
> 
> > On Wed, Jun 13, 2018 at 09:52:25PM +1000, Nicholas Piggin wrote:
> > > On Wed, 13 Jun 2018 11:26:49 +0200 (CEST)
> > > Thomas Gleixner <tglx@linutronix.de> wrote:
> > >   
> > > > On Wed, 13 Jun 2018, Peter Zijlstra wrote:  
> > > > > On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:    
> > > > > > On Tue, 12 Jun 2018 17:57:32 -0700
> > > > > > Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
> > > > > >     
> > > > > > > Instead of exposing individual functions for the operations of the NMI
> > > > > > > watchdog, define a common interface that can be used across multiple
> > > > > > > implementations.
> > > > > > > 
> > > > > > > The struct nmi_watchdog_ops is defined for such operations. These initial
> > > > > > > definitions include the enable, disable, start, stop, and cleanup
> > > > > > > operations.
> > > > > > > 
> > > > > > > Only a single NMI watchdog can be used in the system. The operations of
> > > > > > > this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> > > > > > > variable is set to point the operations of the first NMI watchdog that
> > > > > > > initializes successfully. Even though at this moment, the only available
> > > > > > > NMI watchdog is the perf-based hardlockup detector. More implementations
> > > > > > > can be added in the future.    
> > > > > > 
> > > > > > Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> > > > > > least have their own NMI watchdogs, it would be good to have those
> > > > > > converted as well.    
> > > > > 
> > > > > Yeah, agreed, this looks like half a patch.    
> > > > 
> > > > Though I'm not seeing the advantage of it. That kind of NMI watchdogs are
> > > > low level architecture details so having yet another 'ops' data structure
> > > > with a gazillion of callbacks, checks and indirections does not provide
> > > > value over the currently available weak stubs.  
> > > 
> > > The other way to go of course is librify the perf watchdog and make an
> > > x86 watchdog that selects between perf and hpet... I also probably
> > > prefer that for code such as this, but I wouldn't strongly object to
> > > ops struct if I'm not writing the code. It's not that bad is it?  
> > 
> > My motivation to add the ops was that the hpet and perf watchdog share
> > significant portions of code.
> 
> Right, a good motivation.
> 
> > I could look into creating the library for
> > common code and relocate the hpet watchdog into arch/x86 for the hpet-
> > specific parts.
> 
> If you can investigate that approach, that would be appreciated. I hope
> I did not misunderstand you there, Thomas.
> 
> Basically you would have perf infrastructure and hpet infrastructure,
> and then the x86 watchdog driver will use one or the other of those. The
> generic watchdog driver will be just a simple shim that uses the perf
> infrastructure. Then hopefully the powerpc driver would require almost
> no change.

Sure, I will try to structure code to minimize the changes to the powerpc
watchdog... without breaking the sparc one.

Thanks and BR,
Ricardo

^ permalink raw reply

* Re: [RFC PATCH 20/23] watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs
From: Ricardo Neri @ 2018-06-15  2:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu
In-Reply-To: <alpine.DEB.2.21.1806131140560.2280@nanos.tec.linutronix.de>

On Wed, Jun 13, 2018 at 11:48:09AM +0200, Thomas Gleixner wrote:
> On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > +	/* There are no CPUs to monitor. */
> > +	if (!cpumask_weight(&hdata->monitored_mask))
> > +		return NMI_HANDLED;
> > +
> >  	inspect_for_hardlockups(regs);
> >  
> > +	/*
> > +	 * Target a new CPU. Keep trying until we find a monitored CPU. CPUs
> > +	 * are addded and removed to this mask at cpu_up() and cpu_down(),
> > +	 * respectively. Thus, the interrupt should be able to be moved to
> > +	 * the next monitored CPU.
> > +	 */
> > +	spin_lock(&hld_data->lock);
> 
> Yuck. Taking a spinlock from NMI ...

I am sorry. I will look into other options for locking. Do you think rcu_lock
would help in this case? I need this locking because the CPUs being monitored
changes as CPUs come online and offline.

> 
> > +	for_each_cpu_wrap(cpu, &hdata->monitored_mask, smp_processor_id() + 1) {
> > +		if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
> > +			break;
> 
> ... and then calling into generic interrupt code which will take even more
> locks is completely broken.


I will into reworking how the destination of the interrupt is set.

Thanks and BR,

Ricardo

^ permalink raw reply

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
From: Ricardo Neri @ 2018-06-15  2:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, Julien Thierry, Peter Zijlstra, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Daniel Lezcano, Andrew Morton,
	Levin, Alexander (Sasha Levin), Randy Dunlap, Masami Hiramatsu,
	Bartosz Golaszewski, Doug Berger, Palmer Dabbelt, iommu
In-Reply-To: <c000699c-7859-74ec-7944-ec484e887f60@arm.com>

On Wed, Jun 13, 2018 at 11:06:25AM +0100, Marc Zyngier wrote:
> On 13/06/18 10:20, Thomas Gleixner wrote:
> > On Wed, 13 Jun 2018, Julien Thierry wrote:
> >> On 13/06/18 09:34, Peter Zijlstra wrote:
> >>> On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
> >>>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> >>>> index 5426627..dbc5e02 100644
> >>>> --- a/include/linux/interrupt.h
> >>>> +++ b/include/linux/interrupt.h
> >>>> @@ -61,6 +61,8 @@
> >>>>    *                interrupt handler after suspending interrupts. For
> >>>> system
> >>>>    *                wakeup devices users need to implement wakeup
> >>>> detection in
> >>>>    *                their interrupt handlers.
> >>>> + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as
> >>>> non-maskable, if
> >>>> + *                supported by the chip.
> >>>>    */
> >>>
> >>> NAK on the first 6 patches. You really _REALLY_ don't want to expose
> >>> NMIs to this level.
> >>>
> >>
> >> I've been working on something similar on arm64 side, and effectively the one
> >> thing that might be common to arm64 and intel is the interface to set an
> >> interrupt as NMI. So I guess it would be nice to agree on the right approach
> >> for this.
> >>
> >> The way I did it was by introducing a new irq_state and let the irqchip driver
> >> handle most of the work (if it supports that state):
> >>
> >> https://lkml.org/lkml/2018/5/25/181
> >>
> >> This has not been ACKed nor NAKed. So I am just asking whether this is a more
> >> suitable approach, and if not, is there any suggestions on how to do this?
> > 
> > I really didn't pay attention to that as it's burried in the GIC/ARM series
> > which is usually Marc's playground.
> 
> I'm working my way through it ATM now that I have some brain cycles back.
> 
> > Adding NMI delivery support at low level architecture irq chip level is
> > perfectly fine, but the exposure of that needs to be restricted very
> > much. Adding it to the generic interrupt control interfaces is not going to
> > happen. That's doomed to begin with and a complete abuse of the interface
> > as the handler can not ever be used for that.
> 
> I can only agree with that. Allowing random driver to use request_irq()
> to make anything an NMI ultimately turns it into a complete mess ("hey,
> NMI is *faster*, let's use that"), and a potential source of horrible
> deadlocks.
> 
> What I'd find more palatable is a way for an irqchip to be able to
> prioritize some interrupts based on a set of architecturally-defined
> requirements, and a separate NMI requesting/handling framework that is
> separate from the IRQ API, as the overall requirements are likely to
> completely different.
> 
> It shouldn't have to be nearly as complex as the IRQ API, and require
> much stricter requirements in terms of what you can do there (flow
> handling should definitely be different).

Marc, Julien, do you plan to actively work on this? Would you mind keeping
me in the loop? I also need this work for this watchdog. In the meantime,
I will go through Julien's patches and try to adapt it to my work.

Thanks and BR,
Ricardo

^ permalink raw reply

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
From: Ricardo Neri @ 2018-06-15  2:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Ashok Raj, Borislav Petkov, Tony Luck, Ravi V. Shankar, x86,
	sparclinux, linuxppc-dev, linux-kernel, Jacob Pan,
	Rafael J. Wysocki, Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu
In-Reply-To: <20180613090720.GV12258@hirez.programming.kicks-ass.net>

On Wed, Jun 13, 2018 at 11:07:20AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 12, 2018 at 05:57:37PM -0700, Ricardo Neri wrote:
> 
> +static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
> +{
> +       unsigned long this_isr;
> +       unsigned int lvl_trig;
> +
> +       this_isr = hpet_readl(HPET_STATUS) & BIT(hdata->num);
> +
> +       lvl_trig = hpet_readl(HPET_Tn_CFG(hdata->num)) & HPET_TN_LEVEL;
> +
> +       if (lvl_trig && this_isr)
> +               return true;
> +
> +       return false;
> +}
> 
> > +static int hardlockup_detector_nmi_handler(unsigned int val,
> > +					   struct pt_regs *regs)
> > +{
> > +	struct hpet_hld_data *hdata = hld_data;
> > +	unsigned int use_fsb;
> > +
> > +	/*
> > +	 * If FSB delivery mode is used, the timer interrupt is programmed as
> > +	 * edge-triggered and there is no need to check the ISR register.
> > +	 */
> > +	use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
> 
> Please do explain.. That FSB thing basically means MSI. But there's only
> a single NMI vector. How do we know this NMI came from the HPET?

Indeed, I see now that this is wrong. There is no way to know. The only way
is to use an IO APIC interrupt and read the HPET status register.

> 
> > +
> > +	if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
> 
> So you add _2_ HPET reads for every single NMI that gets triggered...
> and IIRC HPET reads are _sllooooowwwwww_.


Since the trigger mode of the HPET timer is not expected to change, 
perhaps is_hpet_wdt_interrupt() can only need the interrupt status
register. This would reduce the reads to one. Furthermore, the hardlockup
detector can skip an X number of NMIs and reduce further the frequency
of reads. Does this make sense?

> 
> > +		return NMI_DONE;
> > +
> > +	inspect_for_hardlockups(regs);
> > +
> > +	if (!(hdata->flags & HPET_DEV_PERI_CAP))
> > +		kick_timer(hdata);
> > +
> > +	/* Acknowledge interrupt if in level-triggered mode */
> > +	if (!use_fsb)
> > +		hpet_writel(BIT(hdata->num), HPET_STATUS);
> > +
> > +	return NMI_HANDLED;
> 
> So if I read this right, when in FSB/MSI mode, we'll basically _always_
> claim every single NMI as handled?
> 
> That's broken.

Yes, this is not correct. I will drop the functionality to use
FSB/MSI mode.

Thanks and BR,
Ricardo

^ permalink raw reply

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
From: Ricardo Neri @ 2018-06-15  2:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu
In-Reply-To: <alpine.DEB.2.21.1806131128570.2280@nanos.tec.linutronix.de>

On Wed, Jun 13, 2018 at 11:40:00AM +0200, Thomas Gleixner wrote:
> On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > @@ -183,6 +184,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
> >  	if (!(hdata->flags & HPET_DEV_PERI_CAP))
> >  		kick_timer(hdata);
> >  
> > +	pr_err("This interrupt should not have happened. Ensure delivery mode is NMI.\n");
> 
> Eeew.

If you don't mind me asking. What is the problem with this error message?
> 
> >  /**
> > + * hardlockup_detector_nmi_handler() - NMI Interrupt handler
> > + * @val:	Attribute associated with the NMI. Not used.
> > + * @regs:	Register values as seen when the NMI was asserted
> > + *
> > + * When an NMI is issued, look for hardlockups. If the timer is not periodic,
> > + * kick it. The interrupt is always handled when if delivered via the
> > + * Front-Side Bus.
> > + *
> > + * Returns:
> > + *
> > + * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
> > + * otherwise.
> > + */
> > +static int hardlockup_detector_nmi_handler(unsigned int val,
> > +					   struct pt_regs *regs)
> > +{
> > +	struct hpet_hld_data *hdata = hld_data;
> > +	unsigned int use_fsb;
> > +
> > +	/*
> > +	 * If FSB delivery mode is used, the timer interrupt is programmed as
> > +	 * edge-triggered and there is no need to check the ISR register.
> > +	 */
> > +	use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
> > +
> > +	if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
> > +		return NMI_DONE;
> 
> So for 'use_fsb == True' every single NMI will fall through into the
> watchdog code below.
> 
> > +	inspect_for_hardlockups(regs);
> > +
> > +	if (!(hdata->flags & HPET_DEV_PERI_CAP))
> > +		kick_timer(hdata);
> 
> And in case that the HPET does not support periodic mode this reprogramms
> the timer on every NMI which means that while perf is running the watchdog
> will never ever detect anything.

Yes. I see that this is wrong. With MSI interrupts, as far as I can
see, there is not a way to make sure that the HPET timer caused the NMI
perhaps the only option is to use an IO APIC interrupt and read the
interrupt status register.

> 
> Aside of that, reading TWO HPET registers for every NMI is insane. HPET
> access is horribly slow, so any high frequency perf monitoring will take a
> massive performance hit.

If an IO APIC interrupt is used, only HPET register (the status register)
would need to be read for every NMI. Would that be more acceptable? Otherwise,
there is no way to determine if the HPET cause the NMI.

Alternatively, there could be a counter that skips reading the HPET status
register (and the detection of hardlockups) for every X NMIs. This would
reduce the overall frequency of HPET register reads.

Is that more acceptable?

Thanks and BR,
Ricardo

^ permalink raw reply

* [PATCH v3] powerpc/64s/radix: Fix MADV_[FREE|DONTNEED] TLB flush miss problem with THP
From: Nicholas Piggin @ 2018-06-15  1:38 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

The patch 99baac21e4 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss
problem") added a force flush mode to the mmu_gather flush, which
unconditionally flushes the entire address range being invalidated
(even if actual ptes only covered a smaller range), to solve a problem
with concurrent threads invalidating the same PTEs causing them to
miss TLBs that need flushing.

This does not work with powerpc that invalidates mmu_gather batches
according to page size. Have powerpc flush all possible page sizes in
the range if it encounters this concurrency condition.

Patch 4647706ebe ("mm: always flush VMA ranges affected by
zap_page_range") does add a TLB flush for all page sizes on powerpc for
the zap_page_range case, but that is to be removed and replaced with
the mmu_gather flush to avoid redundant flushing. It is also thought to
not cover other obscure race conditions:

https://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com

Hash does not have a problem because it invalidates TLBs inside the
page table locks.

Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
Since v2:
- Fix compile breakage.

 arch/powerpc/mm/tlb-radix.c | 96 +++++++++++++++++++++++++++++--------
 1 file changed, 75 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 67a6e86d3e7e..a734e486664d 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -689,22 +689,17 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
 static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
 static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
 
-void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
-		     unsigned long end)
+static inline void __radix__flush_tlb_range(struct mm_struct *mm,
+					unsigned long start, unsigned long end,
+					bool flush_all_sizes)
 
 {
-	struct mm_struct *mm = vma->vm_mm;
 	unsigned long pid;
 	unsigned int page_shift = mmu_psize_defs[mmu_virtual_psize].shift;
 	unsigned long page_size = 1UL << page_shift;
 	unsigned long nr_pages = (end - start) >> page_shift;
 	bool local, full;
 
-#ifdef CONFIG_HUGETLB_PAGE
-	if (is_vm_hugetlb_page(vma))
-		return radix__flush_hugetlb_tlb_range(vma, start, end);
-#endif
-
 	pid = mm->context.id;
 	if (unlikely(pid == MMU_NO_CONTEXT))
 		return;
@@ -738,37 +733,64 @@ void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 				_tlbie_pid(pid, RIC_FLUSH_TLB);
 		}
 	} else {
-		bool hflush = false;
+		bool hflush = flush_all_sizes;
+		bool gflush = flush_all_sizes;
 		unsigned long hstart, hend;
+		unsigned long gstart, gend;
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-		hstart = (start + HPAGE_PMD_SIZE - 1) >> HPAGE_PMD_SHIFT;
-		hend = end >> HPAGE_PMD_SHIFT;
-		if (hstart < hend) {
-			hstart <<= HPAGE_PMD_SHIFT;
-			hend <<= HPAGE_PMD_SHIFT;
+		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
 			hflush = true;
+
+		if (hflush) {
+			hstart = (start + PMD_SIZE - 1) & PMD_MASK;
+			hend = end & PMD_MASK;
+			if (hstart == hend)
+				hflush = false;
+		}
+
+		if (gflush) {
+			gstart = (start + PUD_SIZE - 1) & PUD_MASK;
+			gend = end & PUD_MASK;
+			if (gstart == gend)
+				gflush = false;
 		}
-#endif
 
 		asm volatile("ptesync": : :"memory");
 		if (local) {
 			__tlbiel_va_range(start, end, pid, page_size, mmu_virtual_psize);
 			if (hflush)
 				__tlbiel_va_range(hstart, hend, pid,
-						HPAGE_PMD_SIZE, MMU_PAGE_2M);
+						PMD_SIZE, MMU_PAGE_2M);
+			if (gflush)
+				__tlbiel_va_range(gstart, gend, pid,
+						PUD_SIZE, MMU_PAGE_1G);
 			asm volatile("ptesync": : :"memory");
 		} else {
 			__tlbie_va_range(start, end, pid, page_size, mmu_virtual_psize);
 			if (hflush)
 				__tlbie_va_range(hstart, hend, pid,
-						HPAGE_PMD_SIZE, MMU_PAGE_2M);
+						PMD_SIZE, MMU_PAGE_2M);
+			if (gflush)
+				__tlbie_va_range(gstart, gend, pid,
+						PUD_SIZE, MMU_PAGE_1G);
 			fixup_tlbie();
 			asm volatile("eieio; tlbsync; ptesync": : :"memory");
 		}
 	}
 	preempt_enable();
 }
+
+void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
+		     unsigned long end)
+
+{
+#ifdef CONFIG_HUGETLB_PAGE
+	if (is_vm_hugetlb_page(vma))
+		return radix__flush_hugetlb_tlb_range(vma, start, end);
+#endif
+
+	__radix__flush_tlb_range(vma->vm_mm, start, end, false);
+}
 EXPORT_SYMBOL(radix__flush_tlb_range);
 
 static int radix_get_mmu_psize(int page_size)
@@ -837,6 +859,8 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 	int psize = 0;
 	struct mm_struct *mm = tlb->mm;
 	int page_size = tlb->page_size;
+	unsigned long start = tlb->start;
+	unsigned long end = tlb->end;
 
 	/*
 	 * if page size is not something we understand, do a full mm flush
@@ -847,15 +871,45 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 	 */
 	if (tlb->fullmm) {
 		__flush_all_mm(mm, true);
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE)
+	} else if (mm_tlb_flush_nested(mm)) {
+		/*
+		 * If there is a concurrent invalidation that is clearing ptes,
+		 * then it's possible this invalidation will miss one of those
+		 * cleared ptes and miss flushing the TLB. If this invalidate
+		 * returns before the other one flushes TLBs, that can result
+		 * in it returning while there are still valid TLBs inside the
+		 * range to be invalidated.
+		 *
+		 * See mm/memory.c:tlb_finish_mmu() for more details.
+		 *
+		 * The solution to this is ensure the entire range is always
+		 * flushed here. The problem for powerpc is that the flushes
+		 * are page size specific, so this "forced flush" would not
+		 * do the right thing if there are a mix of page sizes in
+		 * the range to be invalidated. So use __flush_tlb_range
+		 * which invalidates all possible page sizes in the range.
+		 *
+		 * PWC flush probably is not be required because the core code
+		 * shouldn't free page tables in this path, but accounting
+		 * for the possibility makes us a bit more robust.
+		 *
+		 * need_flush_all is an uncommon case because page table
+		 * teardown should be done with exclusive locks held (but
+		 * after locks are dropped another invalidate could come
+		 * in), it could be optimized further if necessary.
+		 */
+		if (!tlb->need_flush_all)
+			__radix__flush_tlb_range(mm, start, end, true);
+		else
+			radix__flush_all_mm(mm);
+#endif
 	} else if ( (psize = radix_get_mmu_psize(page_size)) == -1) {
 		if (!tlb->need_flush_all)
 			radix__flush_tlb_mm(mm);
 		else
 			radix__flush_all_mm(mm);
 	} else {
-		unsigned long start = tlb->start;
-		unsigned long end = tlb->end;
-
 		if (!tlb->need_flush_all)
 			radix__flush_tlb_range_psize(mm, start, end, psize);
 		else
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH v13 00/24] selftests, powerpc, x86 : Memory Protection Keys
From: Ram Pai @ 2018-06-15  0:58 UTC (permalink / raw)
  To: Florian Weimer
  Cc: shuahkh, linux-kselftest, mpe, linuxppc-dev, linux-mm, x86,
	linux-arch, mingo, dave.hansen, mhocko, bauerman, msuchanek,
	aneesh.kumar
In-Reply-To: <c5c119b0-f5ca-4ddc-43c0-a6b597173973@redhat.com>

On Thu, Jun 14, 2018 at 10:19:11PM +0200, Florian Weimer wrote:
> On 06/14/2018 02:44 AM, Ram Pai wrote:
> >Test
> >----
> >Verified for correctness on powerpc. Need help verifying on x86.
> >Compiles on x86.
> 
> It breaks make in tools/testing/selftests/x86:
> 
> make: *** No rule to make target `protection_keys.c', needed by
> `/home/linux/tools/testing/selftests/x86/protection_keys_64'.  Stop.

Ah.. it has to be taken out from the Makefile of
/home/linux/tools/testing/selftests/x86/

The sources have been moved to /home/linux/tools/testing/selftests/mm/

> 
> The generic implementation no longer builds 32-bit binaries.  Is
> this the intent?

No. But building it 32-bit after moving it to a the new directory 
needs some special code in the Makefile. 

> 
> It's possible to build 32-bit binaries with “make CC='gcc -m32'”, so
> perhaps this is good enough?

Dave Hansen did mention it, but he did not complain too much. So I kept
quite.

> 
> But with that, I get a warning:
> 
> protection_keys.c: In function ‘dump_mem’:
> protection_keys.c:172:3: warning: format ‘%lx’ expects argument of
> type ‘long unsigned int’, but argument 4 has type ‘uint64_t’
> [-Wformat=]
>    dprintf1("dump[%03d][@%p]: %016lx\n", i, ptr, *ptr);
>    ^
> 
> I suppose you could use %016llx and add a cast to unsigned long long
> to fix this.

yes.

> 
> Anyway, both the 32-bit and 64-bit tests fail here:
> 
> assert() at protection_keys.c::943 test_nr: 12 iteration: 1
> running abort_hooks()...
> 
> I've yet checked what causes this.  It's with the kernel headers
> from 4.17, but with other userspace headers based on glibc 2.17.  I
> hope to look into this some more before the weekend, but I
> eventually have to return the test machine to the pool.

I wish I could get a x86 machine which could do memory keys. Had a AWS
instance, but struggled to boot my kernel. Can't get to the console...
gave up.  If someone can give me a ready-made machine with support for
memkeys, I can quickly fix all the outstanding x86 issues.  But if
someone can just fix it for me, ....  ;)

RP

^ permalink raw reply

* Re: [PATCH v6 3/4] powerpc/lib: implement strlen() in assembly
From: Segher Boessenkool @ 2018-06-14 21:51 UTC (permalink / raw)
  To: Christophe LEROY
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	wei.guo.simon, linux-kernel, linuxppc-dev
In-Reply-To: <baaf4fa9-ff0b-0bcb-8662-dafb794caab4@c-s.fr>

On Tue, Jun 12, 2018 at 07:01:59PM +0200, Christophe LEROY wrote:
> 
> 
> Le 12/06/2018 à 16:53, Segher Boessenkool a écrit :
> >On Tue, Jun 12, 2018 at 09:14:53AM +0000, Christophe Leroy wrote:
> >>---
> >>Not tested on PPC64.
> >
> >It won't be acceptable until that happens.  It also is likely quite bad
> >performance on all 64-bit CPUs from the last fifteen years or so.  Or you
> >did nothing to prove otherwise, at least.
> 
> Will it be as bad as the generic implementation which does it byte per 
> byte ?

Probably not.  But how is it for short inputs, etc.?

The main point is that it needs actual testing _for correctness_.

Btw, GCC 7 and later can expand many memcmp as builtins on PowerPC (just
like memset and memcpy etc.), creating better code, without function call.


Segher

^ permalink raw reply

* Re: [PATCH] mm: convert return type of handle_mm_fault() caller to vm_fault_t
From: kbuild test robot @ 2018-06-14 21:31 UTC (permalink / raw)
  To: Souptick Joarder
  Cc: kbuild-all, willy, rth, tony.luck, mattst88, vgupta, linux,
	catalin.marinas, will.deacon, rkuo, geert, monstr, jhogan, lftan,
	jonas, jejb, benh, palmer, ysato, davem, richard, gxt, tglx, hpa,
	alexander.levin, akpm, linux-alpha, linux-kernel, linux-snps-arc,
	linux-arm-kernel, linux-hexagon, linux-ia64, linux-m68k,
	linux-mips, nios2-dev, openrisc, linux-parisc, linuxppc-dev,
	linux-riscv, linux-s390, linux-sh, sparclinux,
	user-mode-linux-devel, user-mode-linux-user, linux-xtensa, iommu,
	linux-mm, brajeswar.linux, sabyasachi.linux
In-Reply-To: <20180614190629.GA18576@jordon-HP-15-Notebook-PC>

[-- Attachment #1: Type: text/plain, Size: 10985 bytes --]

Hi Souptick,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v4.17]
[cannot apply to linus/master powerpc/next sparc-next/master next-20180614]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Souptick-Joarder/mm-convert-return-type-of-handle_mm_fault-caller-to-vm_fault_t/20180615-030636
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

>> arch/powerpc/mm/copro_fault.c:36:5: error: conflicting types for 'copro_handle_mm_fault'
    int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
        ^~~~~~~~~~~~~~~~~~~~~
   In file included from arch/powerpc/mm/copro_fault.c:27:0:
   arch/powerpc/include/asm/copro.h:18:5: note: previous declaration of 'copro_handle_mm_fault' was here
    int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
        ^~~~~~~~~~~~~~~~~~~~~
   In file included from include/linux/linkage.h:7:0,
                    from include/linux/kernel.h:7,
                    from include/asm-generic/bug.h:18,
                    from arch/powerpc/include/asm/bug.h:128,
                    from include/linux/bug.h:5,
                    from arch/powerpc/include/asm/mmu.h:126,
                    from arch/powerpc/include/asm/lppaca.h:36,
                    from arch/powerpc/include/asm/paca.h:21,
                    from arch/powerpc/include/asm/current.h:16,
                    from include/linux/sched.h:12,
                    from arch/powerpc/mm/copro_fault.c:23:
   arch/powerpc/mm/copro_fault.c:101:19: error: conflicting types for 'copro_handle_mm_fault'
    EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
                      ^
   include/linux/export.h:65:21: note: in definition of macro '___EXPORT_SYMBOL'
     extern typeof(sym) sym;      \
                        ^~~
   arch/powerpc/mm/copro_fault.c:101:1: note: in expansion of macro 'EXPORT_SYMBOL_GPL'
    EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
    ^~~~~~~~~~~~~~~~~
   In file included from arch/powerpc/mm/copro_fault.c:27:0:
   arch/powerpc/include/asm/copro.h:18:5: note: previous declaration of 'copro_handle_mm_fault' was here
    int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
        ^~~~~~~~~~~~~~~~~~~~~

vim +/copro_handle_mm_fault +36 arch/powerpc/mm/copro_fault.c

7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   30  
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   31  /*
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   32   * This ought to be kept in sync with the powerpc specific do_page_fault
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   33   * function. Currently, there are a few corner cases that we haven't had
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   34   * to handle fortunately.
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   35   */
e83d01697 arch/powerpc/mm/copro_fault.c           Ian Munsie         2014-10-08  @36  int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
7c71e54a4 arch/powerpc/mm/copro_fault.c           Souptick Joarder   2018-06-15   37  		unsigned long dsisr, vm_fault_t *flt)
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   38  {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   39  	struct vm_area_struct *vma;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   40  	unsigned long is_write;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   41  	int ret;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   42  
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   43  	if (mm == NULL)
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   44  		return -EFAULT;
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   45  
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   46  	if (mm->pgd == NULL)
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   47  		return -EFAULT;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   48  
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   49  	down_read(&mm->mmap_sem);
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   50  	ret = -EFAULT;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   51  	vma = find_vma(mm, ea);
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   52  	if (!vma)
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   53  		goto out_unlock;
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   54  
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   55  	if (ea < vma->vm_start) {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   56  		if (!(vma->vm_flags & VM_GROWSDOWN))
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   57  			goto out_unlock;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   58  		if (expand_stack(vma, ea))
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   59  			goto out_unlock;
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   60  	}
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   61  
e83d01697 arch/powerpc/mm/copro_fault.c           Ian Munsie         2014-10-08   62  	is_write = dsisr & DSISR_ISSTORE;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   63  	if (is_write) {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   64  		if (!(vma->vm_flags & VM_WRITE))
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   65  			goto out_unlock;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   66  	} else {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   67  		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   68  			goto out_unlock;
842915f56 arch/powerpc/mm/copro_fault.c           Mel Gorman         2015-02-12   69  		/*
18061c17c arch/powerpc/mm/copro_fault.c           Aneesh Kumar K.V   2017-01-30   70  		 * PROT_NONE is covered by the VMA check above.
18061c17c arch/powerpc/mm/copro_fault.c           Aneesh Kumar K.V   2017-01-30   71  		 * and hash should get a NOHPTE fault instead of
18061c17c arch/powerpc/mm/copro_fault.c           Aneesh Kumar K.V   2017-01-30   72  		 * a PROTFAULT in case fixup is needed for things
18061c17c arch/powerpc/mm/copro_fault.c           Aneesh Kumar K.V   2017-01-30   73  		 * like autonuma.
842915f56 arch/powerpc/mm/copro_fault.c           Mel Gorman         2015-02-12   74  		 */
18061c17c arch/powerpc/mm/copro_fault.c           Aneesh Kumar K.V   2017-01-30   75  		if (!radix_enabled())
842915f56 arch/powerpc/mm/copro_fault.c           Mel Gorman         2015-02-12   76  			WARN_ON_ONCE(dsisr & DSISR_PROTFAULT);
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   77  	}
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   78  
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   79  	ret = 0;
dcddffd41 arch/powerpc/mm/copro_fault.c           Kirill A. Shutemov 2016-07-26   80  	*flt = handle_mm_fault(vma, ea, is_write ? FAULT_FLAG_WRITE : 0);
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   81  	if (unlikely(*flt & VM_FAULT_ERROR)) {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   82  		if (*flt & VM_FAULT_OOM) {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   83  			ret = -ENOMEM;
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   84  			goto out_unlock;
33692f275 arch/powerpc/mm/copro_fault.c           Linus Torvalds     2015-01-29   85  		} else if (*flt & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   86  			ret = -EFAULT;
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   87  			goto out_unlock;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   88  		}
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   89  		BUG();
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   90  	}
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   91  
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   92  	if (*flt & VM_FAULT_MAJOR)
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   93  		current->maj_flt++;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   94  	else
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   95  		current->min_flt++;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   96  
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   97  out_unlock:
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   98  	up_read(&mm->mmap_sem);
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   99  	return ret;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20  100  }
e83d01697 arch/powerpc/mm/copro_fault.c           Ian Munsie         2014-10-08  101  EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
73d16a6e0 arch/powerpc/mm/copro_fault.c           Ian Munsie         2014-10-08  102  

:::::: The code at line 36 was first introduced by commit
:::::: e83d01697583d8610d1d62279758c2a881e3396f powerpc/cell: Move spu_handle_mm_fault() out of cell platform

:::::: TO: Ian Munsie <imunsie@au1.ibm.com>
:::::: CC: Michael Ellerman <mpe@ellerman.id.au>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23364 bytes --]

^ permalink raw reply

* Re: [PATCH v13 00/24] selftests, powerpc, x86 : Memory Protection Keys
From: Florian Weimer @ 2018-06-14 20:19 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, dave.hansen,
	mhocko, bauerman, msuchanek, aneesh.kumar
In-Reply-To: <1528937115-10132-1-git-send-email-linuxram@us.ibm.com>

On 06/14/2018 02:44 AM, Ram Pai wrote:
> Test
> ----
> Verified for correctness on powerpc. Need help verifying on x86.
> Compiles on x86.

It breaks make in tools/testing/selftests/x86:

make: *** No rule to make target `protection_keys.c', needed by 
`/home/linux/tools/testing/selftests/x86/protection_keys_64'.  Stop.

The generic implementation no longer builds 32-bit binaries.  Is this 
the intent?

It's possible to build 32-bit binaries with “make CC='gcc -m32'”, so 
perhaps this is good enough?

But with that, I get a warning:

protection_keys.c: In function ‘dump_mem’:
protection_keys.c:172:3: warning: format ‘%lx’ expects argument of type 
‘long unsigned int’, but argument 4 has type ‘uint64_t’ [-Wformat=]
    dprintf1("dump[%03d][@%p]: %016lx\n", i, ptr, *ptr);
    ^

I suppose you could use %016llx and add a cast to unsigned long long to 
fix this.

Anyway, both the 32-bit and 64-bit tests fail here:

assert() at protection_keys.c::943 test_nr: 12 iteration: 1
running abort_hooks()...

I've yet checked what causes this.  It's with the kernel headers from 
4.17, but with other userspace headers based on glibc 2.17.  I hope to 
look into this some more before the weekend, but I eventually have to 
return the test machine to the pool.

Thanks,
Florian

^ permalink raw reply

* [PATCH] mm: convert return type of handle_mm_fault() caller to vm_fault_t
From: Souptick Joarder @ 2018-06-14 19:06 UTC (permalink / raw)
  To: willy, rth, tony.luck, mattst88, vgupta, linux, catalin.marinas,
	will.deacon, rkuo, geert, monstr, jhogan, lftan, jonas, jejb,
	benh, palmer, ysato, davem, richard, gxt, tglx, hpa,
	alexander.levin, akpm
  Cc: linux-alpha, linux-kernel, linux-snps-arc, linux-arm-kernel,
	linux-hexagon, linux-ia64, linux-m68k, linux-mips, nios2-dev,
	openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, user-mode-linux-devel, user-mode-linux-user,
	linux-xtensa, iommu, linux-mm, brajeswar.linux, sabyasachi.linux

Use new return type vm_fault_t for fault handler. For
now, this is just documenting that the function returns
a VM_FAULT value rather than an errno. Once all instances
are converted, vm_fault_t will become a distinct type.

Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t")

In this patch all the caller of handle_mm_fault()
are changed to return vm_fault_t type.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
---
 arch/alpha/mm/fault.c         |  3 ++-
 arch/arc/mm/fault.c           |  4 +++-
 arch/arm/mm/fault.c           |  7 ++++---
 arch/arm64/mm/fault.c         |  6 +++---
 arch/hexagon/mm/vm_fault.c    |  2 +-
 arch/ia64/mm/fault.c          |  2 +-
 arch/m68k/mm/fault.c          |  4 ++--
 arch/microblaze/mm/fault.c    |  2 +-
 arch/mips/mm/fault.c          |  2 +-
 arch/nds32/mm/fault.c         |  2 +-
 arch/nios2/mm/fault.c         |  2 +-
 arch/openrisc/mm/fault.c      |  2 +-
 arch/parisc/mm/fault.c        |  2 +-
 arch/powerpc/mm/copro_fault.c |  2 +-
 arch/powerpc/mm/fault.c       |  7 ++++---
 arch/riscv/mm/fault.c         |  3 ++-
 arch/s390/mm/fault.c          | 13 ++++++++-----
 arch/sh/mm/fault.c            |  4 ++--
 arch/sparc/mm/fault_32.c      |  3 ++-
 arch/sparc/mm/fault_64.c      |  3 ++-
 arch/um/kernel/trap.c         |  2 +-
 arch/unicore32/mm/fault.c     |  9 +++++----
 arch/x86/mm/fault.c           |  5 +++--
 arch/xtensa/mm/fault.c        |  2 +-
 drivers/iommu/amd_iommu_v2.c  |  2 +-
 drivers/iommu/intel-svm.c     |  4 +++-
 mm/hmm.c                      |  8 ++++----
 mm/ksm.c                      |  2 +-
 28 files changed, 62 insertions(+), 47 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index cd3c572..2a979ee 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -87,7 +87,8 @@
 	struct vm_area_struct * vma;
 	struct mm_struct *mm = current->mm;
 	const struct exception_table_entry *fixup;
-	int fault, si_code = SEGV_MAPERR;
+	int si_code = SEGV_MAPERR;
+	vm_fault_t fault;
 	siginfo_t info;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index a0b7bd6..3a18d33 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -15,6 +15,7 @@
 #include <linux/uaccess.h>
 #include <linux/kdebug.h>
 #include <linux/perf_event.h>
+#include <linux/mm_types.h>
 #include <asm/pgalloc.h>
 #include <asm/mmu.h>
 
@@ -66,7 +67,8 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 	struct task_struct *tsk = current;
 	struct mm_struct *mm = tsk->mm;
 	siginfo_t info;
-	int fault, ret;
+	int ret;
+	vm_fault_t fault;
 	int write = regs->ecr_cause & ECR_C_PROTV_STORE;  /* ST/EX */
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index b75eada..758abcb 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -219,12 +219,12 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
 	return vma->vm_flags & mask ? false : true;
 }
 
-static int __kprobes
+static vm_fault_t __kprobes
 __do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
 		unsigned int flags, struct task_struct *tsk)
 {
 	struct vm_area_struct *vma;
-	int fault;
+	vm_fault_t fault;
 
 	vma = find_vma(mm, addr);
 	fault = VM_FAULT_BADMAP;
@@ -259,7 +259,8 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
 {
 	struct task_struct *tsk;
 	struct mm_struct *mm;
-	int fault, sig, code;
+	int sig, code;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	if (notify_page_fault(regs, fsr))
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 2af3dd8..8da263b 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -371,12 +371,12 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re
 #define VM_FAULT_BADMAP		0x010000
 #define VM_FAULT_BADACCESS	0x020000
 
-static int __do_page_fault(struct mm_struct *mm, unsigned long addr,
+static vm_fault_t __do_page_fault(struct mm_struct *mm, unsigned long addr,
 			   unsigned int mm_flags, unsigned long vm_flags,
 			   struct task_struct *tsk)
 {
 	struct vm_area_struct *vma;
-	int fault;
+	vm_fault_t fault;
 
 	vma = find_vma(mm, addr);
 	fault = VM_FAULT_BADMAP;
@@ -419,7 +419,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	struct siginfo si;
-	int fault, major = 0;
+	vm_fault_t fault, major = 0;
 	unsigned long vm_flags = VM_READ | VM_WRITE;
 	unsigned int mm_flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index 3eec33c..5d1de6c 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -52,7 +52,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
 	struct mm_struct *mm = current->mm;
 	siginfo_t info;
 	int si_code = SEGV_MAPERR;
-	int fault;
+	vm_fault_t fault;
 	const struct exception_table_entry *fixup;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index dfdc152..e085d89 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -87,7 +87,7 @@ static inline int notify_page_fault(struct pt_regs *regs, int trap)
 	struct mm_struct *mm = current->mm;
 	struct siginfo si;
 	unsigned long mask;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	mask = ((((isr >> IA64_ISR_X_BIT) & 1UL) << VM_EXEC_BIT)
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index 03253c4..1fc7ac0 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -73,7 +73,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct * vma;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	pr_debug("do page fault:\nregs->sr=%#x, regs->pc=%#lx, address=%#lx, %ld, %p\n",
@@ -139,7 +139,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 	 */
 
 	fault = handle_mm_fault(vma, address, flags);
-	pr_debug("handle_mm_fault returns %d\n", fault);
+	pr_debug("handle_mm_fault returns %x\n", fault);
 
 	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
 		return 0;
diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index f91b30f..92a8682 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -91,7 +91,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
 	siginfo_t info;
 	int code = SEGV_MAPERR;
 	int is_write = error_code & ESR_S;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	regs->ear = address;
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index 4f8f5bf..0bc5030 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -43,7 +43,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
 	struct mm_struct *mm = tsk->mm;
 	const int field = sizeof(unsigned long) * 2;
 	siginfo_t info;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 10);
diff --git a/arch/nds32/mm/fault.c b/arch/nds32/mm/fault.c
index 3a246fb..96796d3 100644
--- a/arch/nds32/mm/fault.c
+++ b/arch/nds32/mm/fault.c
@@ -73,7 +73,7 @@ void do_page_fault(unsigned long entry, unsigned long addr,
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
 	siginfo_t info;
-	int fault;
+	vm_fault_t fault;
 	unsigned int mask = VM_READ | VM_WRITE | VM_EXEC;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index b804dd0..24fd84c 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -47,7 +47,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
 	struct task_struct *tsk = current;
 	struct mm_struct *mm = tsk->mm;
 	int code = SEGV_MAPERR;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	cause >>= 2;
diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index d0021df..21e8f16 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -53,7 +53,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
 	siginfo_t info;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	tsk = current;
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index e247edb..ff9e634 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -262,7 +262,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	unsigned long acc_type;
-	int fault = 0;
+	vm_fault_t fault = 0;
 	unsigned int flags;
 
 	if (faulthandler_disabled())
diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index 7d0945b..c8da352 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -34,7 +34,7 @@
  * to handle fortunately.
  */
 int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
-		unsigned long dsisr, unsigned *flt)
+		unsigned long dsisr, vm_fault_t *flt)
 {
 	struct vm_area_struct *vma;
 	unsigned long is_write;
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index c01d627..17cce1b 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -159,7 +159,7 @@ static noinline int bad_access(struct pt_regs *regs, unsigned long address)
 }
 
 static int do_sigbus(struct pt_regs *regs, unsigned long address,
-		     unsigned int fault)
+		     vm_fault_t fault)
 {
 	siginfo_t info;
 	unsigned int lsb = 0;
@@ -189,7 +189,8 @@ static int do_sigbus(struct pt_regs *regs, unsigned long address,
 	return 0;
 }
 
-static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
+static int mm_fault_error(struct pt_regs *regs, unsigned long addr,
+				vm_fault_t fault)
 {
 	/*
 	 * Kernel page fault interrupted by SIGKILL. We have no reason to
@@ -402,7 +403,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
  	int is_exec = TRAP(regs) == 0x400;
 	int is_user = user_mode(regs);
 	int is_write = page_fault_is_write(error_code);
-	int fault, major = 0;
+	vm_fault_t fault, major = 0;
 	bool store_update_sp = false;
 
 	if (notify_page_fault(regs))
diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 148c98c..88401d5 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -41,7 +41,8 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 	struct mm_struct *mm;
 	unsigned long addr, cause;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
-	int fault, code = SEGV_MAPERR;
+	int code = SEGV_MAPERR;
+	vm_fault_t fault;
 
 	cause = regs->scause;
 	addr = regs->sbadaddr;
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 93faeca..8ea0855 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -350,7 +350,8 @@ static noinline int signal_return(struct pt_regs *regs)
 	return -EACCES;
 }
 
-static noinline void do_fault_error(struct pt_regs *regs, int access, int fault)
+static noinline void do_fault_error(struct pt_regs *regs, int access,
+					vm_fault_t fault)
 {
 	int si_code;
 
@@ -410,7 +411,7 @@ static noinline void do_fault_error(struct pt_regs *regs, int access, int fault)
  *   11       Page translation     ->  Not present       (nullification)
  *   3b       Region third trans.  ->  Not present       (nullification)
  */
-static inline int do_exception(struct pt_regs *regs, int access)
+static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 {
 	struct gmap *gmap;
 	struct task_struct *tsk;
@@ -420,7 +421,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
 	unsigned long trans_exc_code;
 	unsigned long address;
 	unsigned int flags;
-	int fault;
+	vm_fault_t fault;
 
 	tsk = current;
 	/*
@@ -571,7 +572,8 @@ static inline int do_exception(struct pt_regs *regs, int access)
 void do_protection_exception(struct pt_regs *regs)
 {
 	unsigned long trans_exc_code;
-	int access, fault;
+	int access;
+	vm_fault_t fault;
 
 	trans_exc_code = regs->int_parm_long;
 	/*
@@ -606,7 +608,8 @@ void do_protection_exception(struct pt_regs *regs)
 
 void do_dat_exception(struct pt_regs *regs)
 {
-	int access, fault;
+	int access;
+	vm_fault_t fault;
 
 	access = VM_READ | VM_EXEC | VM_WRITE;
 	fault = do_exception(regs, access);
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index 6fd1bf7..474bf14 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -320,7 +320,7 @@ static noinline int vmalloc_fault(unsigned long address)
 
 static noinline int
 mm_fault_error(struct pt_regs *regs, unsigned long error_code,
-	       unsigned long address, unsigned int fault)
+	       unsigned long address, vm_fault_t fault)
 {
 	/*
 	 * Pagefault was interrupted by SIGKILL. We have no reason to
@@ -403,7 +403,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	struct vm_area_struct * vma;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	tsk = current;
diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index a8103a8..1a44a4e 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -174,7 +174,8 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
 	unsigned int fixup;
 	unsigned long g2;
 	int from_user = !(regs->psr & PSR_PS);
-	int fault, code;
+	int code;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	if (text_fault)
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 41363f4..2078bfe 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -284,7 +284,8 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned int insn = 0;
-	int si_code, fault_code, fault;
+	int si_code, fault_code;
+	vm_fault_t fault;
 	unsigned long address, mm_rss;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index b2b02df..0afcd09 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -72,7 +72,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
 	}
 
 	do {
-		int fault;
+		vm_fault_t fault;
 
 		fault = handle_mm_fault(vma, address, flags);
 
diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index bbefcc4..2982140 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -167,11 +167,11 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
 	return vma->vm_flags & mask ? false : true;
 }
 
-static int __do_pf(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
-		unsigned int flags, struct task_struct *tsk)
+static vm_fault_t __do_pf(struct mm_struct *mm, unsigned long addr,
+		unsigned int fsr, unsigned int flags, struct task_struct *tsk)
 {
 	struct vm_area_struct *vma;
-	int fault;
+	vm_fault_t fault;
 
 	vma = find_vma(mm, addr);
 	fault = VM_FAULT_BADMAP;
@@ -208,7 +208,8 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 {
 	struct task_struct *tsk;
 	struct mm_struct *mm;
-	int fault, sig, code;
+	int sig, code;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	tsk = current;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 73bd8c9..5171d60 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -16,6 +16,7 @@
 #include <linux/prefetch.h>		/* prefetchw			*/
 #include <linux/context_tracking.h>	/* exception_enter(), ...	*/
 #include <linux/uaccess.h>		/* faulthandler_disabled()	*/
+#include <linux/mm_types.h>
 
 #include <asm/cpufeature.h>		/* boot_cpu_has, ...		*/
 #include <asm/traps.h>			/* dotraplinkage, ...		*/
@@ -1004,7 +1005,7 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
 
 static noinline void
 mm_fault_error(struct pt_regs *regs, unsigned long error_code,
-	       unsigned long address, u32 *pkey, unsigned int fault)
+	       unsigned long address, u32 *pkey, vm_fault_t fault)
 {
 	if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) {
 		no_context(regs, error_code, address, 0, 0);
@@ -1218,7 +1219,7 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
 	struct mm_struct *mm;
-	int fault, major = 0;
+	vm_fault_t fault, major = 0;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 	u32 pkey;
 
diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index 8b9b6f4..203fade 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -42,7 +42,7 @@ void do_page_fault(struct pt_regs *regs)
 	siginfo_t info;
 
 	int is_write, is_exec;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	info.si_code = SEGV_MAPERR;
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 1d0b53a0..58da65d 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -508,7 +508,7 @@ static void do_fault(struct work_struct *work)
 {
 	struct fault *fault = container_of(work, struct fault, work);
 	struct vm_area_struct *vma;
-	int ret = VM_FAULT_ERROR;
+	vm_fault_t ret = VM_FAULT_ERROR;
 	unsigned int flags = 0;
 	struct mm_struct *mm;
 	u64 address;
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index e8cd984..75189c0 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -24,6 +24,7 @@
 #include <linux/pci-ats.h>
 #include <linux/dmar.h>
 #include <linux/interrupt.h>
+#include <linux/mm_types.h>
 #include <asm/page.h>
 
 #define PASID_ENTRY_P		BIT_ULL(0)
@@ -594,7 +595,8 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 		struct vm_area_struct *vma;
 		struct page_req_dsc *req;
 		struct qi_desc resp;
-		int ret, result;
+		int result;
+		vm_fault_t ret;
 		u64 address;
 
 		handled = 1;
diff --git a/mm/hmm.c b/mm/hmm.c
index 486dc39..d7919e5 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -308,14 +308,14 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
 	struct vm_area_struct *vma = walk->vma;
-	int r;
+	vm_fault_t ret;
 
 	flags |= hmm_vma_walk->block ? 0 : FAULT_FLAG_ALLOW_RETRY;
 	flags |= write_fault ? FAULT_FLAG_WRITE : 0;
-	r = handle_mm_fault(vma, addr, flags);
-	if (r & VM_FAULT_RETRY)
+	ret = handle_mm_fault(vma, addr, flags);
+	if (ret & VM_FAULT_RETRY)
 		return -EBUSY;
-	if (r & VM_FAULT_ERROR) {
+	if (ret & VM_FAULT_ERROR) {
 		*pfn = range->values[HMM_PFN_ERROR];
 		return -EFAULT;
 	}
diff --git a/mm/ksm.c b/mm/ksm.c
index e3cbf9a..cb4e6ed 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -451,7 +451,7 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
 static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 {
 	struct page *page;
-	int ret = 0;
+	vm_fault_t ret = 0;
 
 	do {
 		cond_resched();
-- 
1.9.1

^ permalink raw reply related

* Re: powerpc: use time64_t in read_persistent_clock
From: Mathieu Malaterre @ 2018-06-14 19:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Benjamin Herrenschmidt, Michael Ellerman, linuxppc-dev,
	Meelis Roos
In-Reply-To: <CAK8P3a3khJexs5FMzvn2+askoT8T1r-0eK6VAcZV3mQ9FPrc3w@mail.gmail.com>

On Thu, Jun 14, 2018 at 1:46 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Jun 13, 2018 at 10:24 PM, Mathieu Malaterre <malat@debian.org> wrote:
> > Arnd,
> >
> > In 5bfd643583b2e I can see that you changed:
> >
> > $ git show 5bfd643583b2e -- arch/powerpc/platforms/powermac/time.c
> > [...]
> >  #ifdef CONFIG_ADB_PMU
> > -static unsigned long pmu_get_time(void)
> > +static time64_t pmu_get_time(void)
> >  {
> >         struct adb_request req;
> > -       unsigned int now;
> > +       time64_t now;
> >
> >         if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0)
> >                 return 0;
> > @@ -160,10 +151,10 @@ static unsigned long pmu_get_time(void)
> >                        req.reply_len);
> >         now = (req.reply[0] << 24) + (req.reply[1] << 16)
> >                 + (req.reply[2] << 8) + req.reply[3];
> > -       return ((unsigned long)now) - RTC_OFFSET;
> > +       return now - RTC_OFFSET;
> >  }
> > [...]
> >
> > As far as I can tell the old function would never return a negative
> > value and rely on unsigned long roll over. Now I am seeing negative
> > value being returned and it seems to break later on in the rtc
> > library.
> >
> > Could you comment why you removed the cast to (unsigned long) in your commit ?
>
> I'm sorry my patch caused a regression. Even with your bug report
> it took me a while to see what exactly went wrong, and how I got
> mixed up the integer conversion rules.
>
> I mistakenly assume that fiven
>
>        long long now;
>        unsigned char reply[4];
>        now = reply[0] << 24;
>
> we would always end up with a positive number, but since reply[0] is
> promoted to a signed integer, the right-hand side of the assignment
> ends up as a negative number for the usual contents of the RTC.

Ah right. My diagnosis was invalid.

> Can you confirm that this patch addresses your problem?

Yes !

Before:
[    5.986710] rtc-generic rtc-generic: hctosys: unable to read the
hardware clock

After:
[    5.579611] rtc-generic rtc-generic: setting system clock to
2018-06-14 18:57:00 UTC (1529002620)

So for the above:

Tested-by: Mathieu Malaterre <malat@debian.org>

> diff --git a/arch/powerpc/platforms/powermac/time.c
> b/arch/powerpc/platforms/powermac/time.c
> index 7c968e46736f..a7fdcd3051f9 100644
> --- a/arch/powerpc/platforms/powermac/time.c
> +++ b/arch/powerpc/platforms/powermac/time.c
> @@ -97,7 +97,7 @@ static time64_t cuda_get_time(void)
>         if (req.reply_len != 7)
>                 printk(KERN_ERR "cuda_get_time: got %d byte reply\n",
>                        req.reply_len);
> -       now = (req.reply[3] << 24) + (req.reply[4] << 16)
> +       now = (u32)(req.reply[3] << 24) + (req.reply[4] << 16)
>                 + (req.reply[5] << 8) + req.reply[6];
>         return now - RTC_OFFSET;
>  }
> @@ -140,7 +140,7 @@ static time64_t pmu_get_time(void)
>         if (req.reply_len != 4)
>                 printk(KERN_ERR "pmu_get_time: got %d byte reply from PMU\n",
>                        req.reply_len);
> -       now = (req.reply[0] << 24) + (req.reply[1] << 16)
> +       now = (u32)(req.reply[0] << 24) + (req.reply[1] << 16)
>                 + (req.reply[2] << 8) + req.reply[3];
>         return now - RTC_OFFSET;
>  }
>
> On a related note, we do have some freedom in how we want to
> interpret values beyond year 2038/2040 when the RTC does wrap
> around.
>
> - With the original code, we converted the time into a signed
>   32-bit integer based on the 1970 Unix epoch, converting RTC
>   years 2038 through 2040 into Linux years 1902 through 1904,
>   which doesn't seem very helpful.
>
> - With my fix above, we would interpret the numbers as documented,
>   with 2040 wrapping around to 1904.
>
> - If we want, we could reinterpret the 1904-1970 range as later
>   times between 2040 and 2106, like this:
>
> --- a/arch/powerpc/platforms/powermac/time.c
> +++ b/arch/powerpc/platforms/powermac/time.c
> @@ -88,7 +88,7 @@ long __init pmac_time_init(void)
>  static time64_t cuda_get_time(void)
>  {
>         struct adb_request req;
> -       time64_t now;
> +       u32 now;
>
>         if (cuda_request(&req, NULL, 2, CUDA_PACKET, CUDA_GET_TIME) < 0)
>                 return 0;
> @@ -132,7 +132,7 @@ static int cuda_set_rtc_time(struct rtc_time *tm)
>  static time64_t pmu_get_time(void)
>  {
>         struct adb_request req;
> -       time64_t now;
> +       u32 now;
>
>         if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0)
>                 return 0;
>
> On the upside, this would possibly extend the life of some machines
> by another 66 years, on the downside it might cause issues when
> an empty RTC battery causes the initial system time to be
> above the 32-bit TIME_T_MAX (instead of going back to 1904).

I would submit the first patch and add the above as comment. I doubt
anyone would still have a running system by then.

Thanks

^ permalink raw reply

* Re: [PATCH] misc: ocxl: Change return type for fault handler
From: Frederic Barrat @ 2018-06-14 16:06 UTC (permalink / raw)
  To: Souptick Joarder, willy, fbarrat, andrew.donnellan, arnd, gregkh
  Cc: linuxppc-dev, linux-kernel, sabyasachi.linux, brajeswar.linux
In-Reply-To: <20180611202904.GA25538@jordon-HP-15-Notebook-PC>



Le 11/06/2018 à 22:29, Souptick Joarder a écrit :
> Use new return type vm_fault_t for fault handler. For
> now, this is just documenting that the function returns
> a VM_FAULT value rather than an errno. Once all instances
> are converted, vm_fault_t will become a distinct type.
> 
> Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t")
> 
> There is an existing bug when vm_insert_pfn() can return
> ENOMEM which was ignored and VM_FAULT_NOPAGE returned as
> default. The new inline vmf_insert_pfn() has removed
> this inefficiency by returning correct vm_fault_ type.
> 
> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
> ---

Thanks!

Tested and
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>


>   drivers/misc/ocxl/context.c | 22 +++++++++++-----------
>   drivers/misc/ocxl/sysfs.c   |  5 ++---
>   2 files changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
> index 909e880..98daf91 100644
> --- a/drivers/misc/ocxl/context.c
> +++ b/drivers/misc/ocxl/context.c
> @@ -83,7 +83,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
>   	return rc;
>   }
> 
> -static int map_afu_irq(struct vm_area_struct *vma, unsigned long address,
> +static vm_fault_t map_afu_irq(struct vm_area_struct *vma, unsigned long address,
>   		u64 offset, struct ocxl_context *ctx)
>   {
>   	u64 trigger_addr;
> @@ -92,15 +92,15 @@ static int map_afu_irq(struct vm_area_struct *vma, unsigned long address,
>   	if (!trigger_addr)
>   		return VM_FAULT_SIGBUS;
> 
> -	vm_insert_pfn(vma, address, trigger_addr >> PAGE_SHIFT);
> -	return VM_FAULT_NOPAGE;
> +	return vmf_insert_pfn(vma, address, trigger_addr >> PAGE_SHIFT);
>   }
> 
> -static int map_pp_mmio(struct vm_area_struct *vma, unsigned long address,
> +static vm_fault_t map_pp_mmio(struct vm_area_struct *vma, unsigned long address,
>   		u64 offset, struct ocxl_context *ctx)
>   {
>   	u64 pp_mmio_addr;
>   	int pasid_off;
> +	vm_fault_t ret;
> 
>   	if (offset >= ctx->afu->config.pp_mmio_stride)
>   		return VM_FAULT_SIGBUS;
> @@ -118,27 +118,27 @@ static int map_pp_mmio(struct vm_area_struct *vma, unsigned long address,
>   		pasid_off * ctx->afu->config.pp_mmio_stride +
>   		offset;
> 
> -	vm_insert_pfn(vma, address, pp_mmio_addr >> PAGE_SHIFT);
> +	ret = vmf_insert_pfn(vma, address, pp_mmio_addr >> PAGE_SHIFT);
>   	mutex_unlock(&ctx->status_mutex);
> -	return VM_FAULT_NOPAGE;
> +	return ret;
>   }
> 
> -static int ocxl_mmap_fault(struct vm_fault *vmf)
> +static vm_fault_t ocxl_mmap_fault(struct vm_fault *vmf)
>   {
>   	struct vm_area_struct *vma = vmf->vma;
>   	struct ocxl_context *ctx = vma->vm_file->private_data;
>   	u64 offset;
> -	int rc;
> +	vm_fault_t ret;
> 
>   	offset = vmf->pgoff << PAGE_SHIFT;
>   	pr_debug("%s: pasid %d address 0x%lx offset 0x%llx\n", __func__,
>   		ctx->pasid, vmf->address, offset);
> 
>   	if (offset < ctx->afu->irq_base_offset)
> -		rc = map_pp_mmio(vma, vmf->address, offset, ctx);
> +		ret = map_pp_mmio(vma, vmf->address, offset, ctx);
>   	else
> -		rc = map_afu_irq(vma, vmf->address, offset, ctx);
> -	return rc;
> +		ret = map_afu_irq(vma, vmf->address, offset, ctx);
> +	return ret;
>   }
> 
>   static const struct vm_operations_struct ocxl_vmops = {
> diff --git a/drivers/misc/ocxl/sysfs.c b/drivers/misc/ocxl/sysfs.c
> index d9753a1..0ab1fd1 100644
> --- a/drivers/misc/ocxl/sysfs.c
> +++ b/drivers/misc/ocxl/sysfs.c
> @@ -64,7 +64,7 @@ static ssize_t global_mmio_read(struct file *filp, struct kobject *kobj,
>   	return count;
>   }
> 
> -static int global_mmio_fault(struct vm_fault *vmf)
> +static vm_fault_t global_mmio_fault(struct vm_fault *vmf)
>   {
>   	struct vm_area_struct *vma = vmf->vma;
>   	struct ocxl_afu *afu = vma->vm_private_data;
> @@ -75,8 +75,7 @@ static int global_mmio_fault(struct vm_fault *vmf)
> 
>   	offset = vmf->pgoff;
>   	offset += (afu->global_mmio_start >> PAGE_SHIFT);
> -	vm_insert_pfn(vma, vmf->address, offset);
> -	return VM_FAULT_NOPAGE;
> +	return vmf_insert_pfn(vma, vmf->address, offset);
>   }
> 
>   static const struct vm_operations_struct global_mmio_vmops = {
> 

^ permalink raw reply

* [PATCH] powerpc/e500mc: Set assembler machine type to e500mc
From: Michael Jeanson @ 2018-06-14 15:27 UTC (permalink / raw)
  Cc: Michael Jeanson, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Kumar Gala, Vakul Garg, Scott Wood,
	Mathieu Desnoyers, linuxppc-dev, linux-kernel, stable

In binutils 2.26 a new opcode for the "wait" instruction was added for the
POWER9 and has precedence over the one specific to the e500mc. Commit
ebf714ff3756 ("powerpc/e500mc: Add support for the wait instruction in
e500_idle") uses this instruction specifically on the e500mc to work around
an erratum.

This results in an invalid instruction in idle_e500 when we build for the
e500mc on bintutils >= 2.26 with the default assembler machine type.

Since multiplatform between e500 and non-e500 is not supported, set the
assembler machine type globaly when CONFIG_PPC_E500MC=y.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Kumar Gala <galak@kernel.crashing.org>
CC: Vakul Garg <vakul.garg@nxp.com>
CC: Scott Wood <swood@redhat.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-kernel@vger.kernel.org
CC: stable@vger.kernel.org
---
 arch/powerpc/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 95813df90801..bb2523b4bd8f 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -251,6 +251,7 @@ cpu-as-$(CONFIG_4xx)		+= -Wa,-m405
 cpu-as-$(CONFIG_ALTIVEC)	+= $(call as-option,-Wa$(comma)-maltivec)
 cpu-as-$(CONFIG_E200)		+= -Wa,-me200
 cpu-as-$(CONFIG_PPC_BOOK3S_64)	+= -Wa,-mpower4
+cpu-as-$(CONFIG_PPC_E500MC)	+= $(call as-option,-Wa$(comma)-me500mc)
 
 KBUILD_AFLAGS += $(cpu-as-y)
 KBUILD_CFLAGS += $(cpu-as-y)
-- 
2.17.1

^ permalink raw reply related

* Re: [RFC PATCH 2/5] powerpc: Flush checkpointed gpr state for 32-bit processes in ptrace
From: Pedro Franco de Carvalho @ 2018-06-14 13:55 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
In-Reply-To: <87d0wvtolp.fsf@concordia.ellerman.id.au>

Michael Ellerman <mpe@ellerman.id.au> writes:

> Can you add a helper that does it and use that helper in these two
> functions. Then if you can send me another patch that converts all the
> other uses to use the new helper.

Yes, I'll do this.

Thanks!

--
Pedro

^ permalink raw reply

* Re: [RFC PATCH 1/5] powerpc: Fix inverted active predicate for setting the EBB regset
From: Pedro Franco de Carvalho @ 2018-06-14 13:52 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev, Anshuman Khandual
In-Reply-To: <87fu1rtot3.fsf@concordia.ellerman.id.au>

Michael Ellerman <mpe@ellerman.id.au> writes:

> Hi Pedro,
>
> Thanks for looking into this, how did you detect this? Do you have a
> test case?

Hello,

I'm working on allowing these registers to be accessed through GDB,
which is where I saw this happen. Then I used a small program that
traces another, then reads and tries to write to the regset, but not in
linux selftest format.

> I don't think Anshuman wrote it this way on purpose, but added him to Cc
> in case he remembers.
>
> But I don't think this fix is necessarily right. If we are setting the
> EBB regs via ptrace then it doesn't matter if they were previously in
> use or not, we should just set them. What *does* matter is that at the
> end of the function we set used_ebb to true, because otherwise the
> values we have set will not actually be used when the process is
> rescheduled.

Now I'm not sure why the ptrace calls for the ebb regset are guarded
with used_ebb in the first place. The save/restore_sprs functions in
kernel/process.c seem to handle the ebb registers regardless of this
flag, and it seems to be possible for user programs to read and write to
these registers even without having first created a perf event.

The flag only appears to be used in perf/core_book3s.c in the
ebb_event_add function, and the pmu registers (mmcr0, etc) only seem to
be saved to and restored from the thread_struct through
ebb_switch_in/out. So maybe the original intent was to guard the
pmu_get/set functions with used_ebb instead?

I'm not sure about this, because I don't know if it possible for a
ptrace call to happen between ebb_event_add and the first ebb_switch_in
for a thread.

Thanks!

--
Pedro

^ permalink raw reply

* Re: [PATCH kernel 6/6] powerpc/powernv/ioda: Allocate indirect TCE levels on demand
From: David Gibson @ 2018-06-14 12:35 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm-ppc, kvm, Alex Williamson,
	Benjamin Herrenschmidt
In-Reply-To: <360896bd-f3a9-530c-8144-c10502dc965b@ozlabs.ru>

[-- Attachment #1: Type: text/plain, Size: 4449 bytes --]

On Thu, Jun 14, 2018 at 04:35:18PM +1000, Alexey Kardashevskiy wrote:
> On 12/6/18 2:17 pm, David Gibson wrote:
> > On Fri, Jun 08, 2018 at 03:46:33PM +1000, Alexey Kardashevskiy wrote:
> >> At the moment we allocate the entire TCE table, twice (hardware part and
> >> userspace translation cache). This normally works as we normally have
> >> contigous memory and the guest will map entire RAM for 64bit DMA.
> >>
> >> However if we have sparse RAM (one example is a memory device), then
> >> we will allocate TCEs which will never be used as the guest only maps
> >> actual memory for DMA. If it is a single level TCE table, there is nothing
> >> we can really do but if it a multilevel table, we can skip allocating
> >> TCEs we know we won't need.
> >>
> >> This adds ability to allocate only first level, saving memory.
> >>
> >> This changes iommu_table::free() to avoid allocating of an extra level;
> >> iommu_table::set() will do this when needed.
> >>
> >> This adds @alloc parameter to iommu_table::exchange() to tell the callback
> >> if it can allocate an extra level; the flag is set to "false" for
> >> the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
> >> H_TOO_HARD.
> >>
> >> This still requires the entire table to be counted in mm::locked_vm.
> >>
> >> To be conservative, this only does on-demand allocation when
> >> the usespace cache table is requested which is the case of VFIO.
> >>
> >> The example math for a system replicating a powernv setup with NVLink2
> >> in a guest:
> >> 16GB RAM mapped at 0x0
> >> 128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000
> >>
> >> the table to cover that all with 64K pages takes:
> >> (((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4556MB
> >>
> >> If we allocate only necessary TCE levels, we will only need:
> >> (((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect
> >> levels).
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >>  arch/powerpc/include/asm/iommu.h              |  7 ++-
> >>  arch/powerpc/platforms/powernv/pci.h          |  6 ++-
> >>  arch/powerpc/kvm/book3s_64_vio_hv.c           |  4 +-
> >>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 69 ++++++++++++++++++++-------
> >>  arch/powerpc/platforms/powernv/pci-ioda.c     |  8 ++--
> >>  drivers/vfio/vfio_iommu_spapr_tce.c           |  2 +-
> >>  6 files changed, 69 insertions(+), 27 deletions(-)
> >>
> >> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >> index 4bdcf22..daa3ee5 100644
> >> --- a/arch/powerpc/include/asm/iommu.h
> >> +++ b/arch/powerpc/include/asm/iommu.h
> >> @@ -70,7 +70,7 @@ struct iommu_table_ops {
> >>  			unsigned long *hpa,
> >>  			enum dma_data_direction *direction);
> >>  
> >> -	__be64 *(*useraddrptr)(struct iommu_table *tbl, long index);
> >> +	__be64 *(*useraddrptr)(struct iommu_table *tbl, long index, bool alloc);
> >>  #endif
> >>  	void (*clear)(struct iommu_table *tbl,
> >>  			long index, long npages);
> >> @@ -122,10 +122,13 @@ struct iommu_table {
> >>  	__be64 *it_userspace; /* userspace view of the table */
> >>  	struct iommu_table_ops *it_ops;
> >>  	struct kref    it_kref;
> >> +	int it_nid;
> >>  };
> >>  
> >> +#define IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry) \
> >> +		((tbl)->it_ops->useraddrptr((tbl), (entry), false))
> > 
> > Is real mode really the only case where you want to inhibit new
> > allocations?  I would have thought some paths would be read-only and
> > you wouldn't want to allocate, even in virtual mode.
> 
> There are paths when I do not want allocation but I can figure out that
> from dma direction flag, for example, I am cleaning up the table and I do
> not want any extra  allocation to happen there and they do happen because I
> made a mistake so I'll repost. Other than that, this @alloc flag is for
> real mode only.

Hm, ok.

Something just occurred to me: IIRC, going from the vmalloc() to the
explicit table structure was to avoid allocating pages for memory
regions that aren't there due to sparseness.  But.. won't you get that
with vmalloc() anyway, if the pages for the gaps never get
instantiated?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 0/6] powerpc/pkeys: fixes to pkeys
From: Florian Weimer @ 2018-06-14 12:15 UTC (permalink / raw)
  To: Ram Pai, mpe
  Cc: linuxppc-dev, dave.hansen, aneesh.kumar, bsingharora, hbabu,
	mhocko, bauerman, Ulrich.Weigand, luto, msuchanek
In-Reply-To: <1528936144-6696-1-git-send-email-linuxram@us.ibm.com>

On 06/14/2018 02:28 AM, Ram Pai wrote:
> Assortment of fixes to pkey.
> 
> Patch 1  makes pkey consumable in multithreaded applications.
> 
> Patch 2  fixes fork behavior to inherit the key attributes.
> 
> Patch 3  A off-by-one bug made one key unusable. Fixes it.
> 
> Patch 4  Execute-only key is preallocated.
> 
> Patch 5  Makes pkey-0 less special.
> 
> Patch 6  Deny by default permissions on all unallocated keys.
> 
> Passes all selftests on powerpc. Also behavior verified to be correct
> by Florian.
> 
> Changelog:
> 
> 	v2: . fixed merge conflict with upstream code.
> 	    . Add patch 6. Makes the behavior consistent
> 	      with that on x86.

(Except signal handling, but I agree with Ram that the POWER behavior is 
the correct one.)

> Ram Pai (6):
>    powerpc/pkeys: Enable all user-allocatable pkeys at init.
>    powerpc/pkeys: Save the pkey registers before fork
>    powerpc/pkeys: fix calculation of total pkeys.
>    powerpc/pkeys: Preallocate execute-only key
>    powerpc/pkeys: make protection key 0 less special
>    powerpc/pkeys: Deny read/write/execute by default

I tested the whole series with the new selftests, with the printamr.c 
program I posted earlier, and the glibc test for pkey_alloc &c.  The 
latter required some test fixes, but now passes as well.  As far as I 
can tell, everything looks good now.

Tested-By: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian

^ permalink raw reply

* Re: stacktrace.c:(.text+0x1b0): undefined reference to `.smp_send_safe_nmi_ipi'
From: Nicholas Piggin @ 2018-06-14 11:46 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: Michael Ellerman, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <62c0b560-5211-4844-1210-6dcd9e2c5526@c-s.fr>

On Thu, 14 Jun 2018 12:51:28 +0200
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> Le 14/06/2018 =C3=A0 12:25, Nicholas Piggin a =C3=A9crit=C2=A0:
> > On Thu, 14 Jun 2018 12:06:51 +0200
> > Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> >  =20
> >> See http://kisskb.ellerman.id.au/kisskb/buildresult/13395345/ for deta=
ils
> >>
> >> Seems like that function only exists when CONFIG_NMI_IPI is defined.
> >>
> >> Problem introduced by commit 5cc05910f26e6fd6da15f052f86f6150e4b91664
> >> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/co=
mmit/arch/powerpc/kernel/stacktrace.c?h=3Dnext-20180614&id=3D5cc05910f26e6f=
d6da15f052f86f6150e4b91664
> >>
> >>
> >> What do you recommend to fix that ? =20
> >=20
> > We could just do smp_call_function in the case we have no NMI_IPI? =20
>=20
> ???
>=20
> Can you detail what you have in mind ? Do smp_call_function() to call=20
> what ? Instead of the call to smp_send_safe_nmi_ipi() ?

Yes I was thinking that but now I take a better look there's places that
call it with irqs off so that might be a bit risky. I guess just making
it depend on NMI_IPI for now is the minimum fix.

Thanks,
Nick

^ permalink raw reply

* Re: powerpc: use time64_t in read_persistent_clock
From: Arnd Bergmann @ 2018-06-14 11:45 UTC (permalink / raw)
  To: Mathieu Malaterre; +Cc: Benjamin Herrenschmidt, Michael Ellerman, linuxppc-dev
In-Reply-To: <CA+7wUswNnhB3=_Z0cbeAzDrcHZr=xg_nE5uUZuPXEk2RKNaABA@mail.gmail.com>

On Wed, Jun 13, 2018 at 10:24 PM, Mathieu Malaterre <malat@debian.org> wrote:
> Arnd,
>
> In 5bfd643583b2e I can see that you changed:
>
> $ git show 5bfd643583b2e -- arch/powerpc/platforms/powermac/time.c
> [...]
>  #ifdef CONFIG_ADB_PMU
> -static unsigned long pmu_get_time(void)
> +static time64_t pmu_get_time(void)
>  {
>         struct adb_request req;
> -       unsigned int now;
> +       time64_t now;
>
>         if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0)
>                 return 0;
> @@ -160,10 +151,10 @@ static unsigned long pmu_get_time(void)
>                        req.reply_len);
>         now = (req.reply[0] << 24) + (req.reply[1] << 16)
>                 + (req.reply[2] << 8) + req.reply[3];
> -       return ((unsigned long)now) - RTC_OFFSET;
> +       return now - RTC_OFFSET;
>  }
> [...]
>
> As far as I can tell the old function would never return a negative
> value and rely on unsigned long roll over. Now I am seeing negative
> value being returned and it seems to break later on in the rtc
> library.
>
> Could you comment why you removed the cast to (unsigned long) in your commit ?

I'm sorry my patch caused a regression. Even with your bug report
it took me a while to see what exactly went wrong, and how I got
mixed up the integer conversion rules.

I mistakenly assume that fiven

       long long now;
       unsigned char reply[4];
       now = reply[0] << 24;

we would always end up with a positive number, but since reply[0] is
promoted to a signed integer, the right-hand side of the assignment
ends up as a negative number for the usual contents of the RTC.

Can you confirm that this patch addresses your problem?

diff --git a/arch/powerpc/platforms/powermac/time.c
b/arch/powerpc/platforms/powermac/time.c
index 7c968e46736f..a7fdcd3051f9 100644
--- a/arch/powerpc/platforms/powermac/time.c
+++ b/arch/powerpc/platforms/powermac/time.c
@@ -97,7 +97,7 @@ static time64_t cuda_get_time(void)
        if (req.reply_len != 7)
                printk(KERN_ERR "cuda_get_time: got %d byte reply\n",
                       req.reply_len);
-       now = (req.reply[3] << 24) + (req.reply[4] << 16)
+       now = (u32)(req.reply[3] << 24) + (req.reply[4] << 16)
                + (req.reply[5] << 8) + req.reply[6];
        return now - RTC_OFFSET;
 }
@@ -140,7 +140,7 @@ static time64_t pmu_get_time(void)
        if (req.reply_len != 4)
                printk(KERN_ERR "pmu_get_time: got %d byte reply from PMU\n",
                       req.reply_len);
-       now = (req.reply[0] << 24) + (req.reply[1] << 16)
+       now = (u32)(req.reply[0] << 24) + (req.reply[1] << 16)
                + (req.reply[2] << 8) + req.reply[3];
        return now - RTC_OFFSET;
 }

On a related note, we do have some freedom in how we want to
interpret values beyond year 2038/2040 when the RTC does wrap
around.

- With the original code, we converted the time into a signed
  32-bit integer based on the 1970 Unix epoch, converting RTC
  years 2038 through 2040 into Linux years 1902 through 1904,
  which doesn't seem very helpful.

- With my fix above, we would interpret the numbers as documented,
  with 2040 wrapping around to 1904.

- If we want, we could reinterpret the 1904-1970 range as later
  times between 2040 and 2106, like this:

--- a/arch/powerpc/platforms/powermac/time.c
+++ b/arch/powerpc/platforms/powermac/time.c
@@ -88,7 +88,7 @@ long __init pmac_time_init(void)
 static time64_t cuda_get_time(void)
 {
        struct adb_request req;
-       time64_t now;
+       u32 now;

        if (cuda_request(&req, NULL, 2, CUDA_PACKET, CUDA_GET_TIME) < 0)
                return 0;
@@ -132,7 +132,7 @@ static int cuda_set_rtc_time(struct rtc_time *tm)
 static time64_t pmu_get_time(void)
 {
        struct adb_request req;
-       time64_t now;
+       u32 now;

        if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0)
                return 0;

On the upside, this would possibly extend the life of some machines
by another 66 years, on the downside it might cause issues when
an empty RTC battery causes the initial system time to be
above the 32-bit TIME_T_MAX (instead of going back to 1904).

      Arnd

^ permalink raw reply related

* Re: stacktrace.c:(.text+0x1b0): undefined reference to `.smp_send_safe_nmi_ipi'
From: Christophe LEROY @ 2018-06-14 10:51 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Michael Ellerman, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20180614202546.24d40e51@roar.ozlabs.ibm.com>



Le 14/06/2018 à 12:25, Nicholas Piggin a écrit :
> On Thu, 14 Jun 2018 12:06:51 +0200
> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> 
>> See http://kisskb.ellerman.id.au/kisskb/buildresult/13395345/ for details
>>
>> Seems like that function only exists when CONFIG_NMI_IPI is defined.
>>
>> Problem introduced by commit 5cc05910f26e6fd6da15f052f86f6150e4b91664
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/arch/powerpc/kernel/stacktrace.c?h=next-20180614&id=5cc05910f26e6fd6da15f052f86f6150e4b91664
>>
>>
>> What do you recommend to fix that ?
> 
> We could just do smp_call_function in the case we have no NMI_IPI?

???

Can you detail what you have in mind ? Do smp_call_function() to call 
what ? Instead of the call to smp_send_safe_nmi_ipi() ?

Thanks
Christophe

> 
> Thanks,
> Nick
> 

^ permalink raw reply

* [PATCH] powerpc/mm/hash/4k: Free hugetlb page table caches correctly.
From: Aneesh Kumar K.V @ 2018-06-14 10:31 UTC (permalink / raw)
  To: npiggin, benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

With 4k page size for hugetlb we allocate hugepage directories from its on slab
cache. With patch 0c4d26802 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
we missed to free these allocated hugepd tables.

Update pgtable_free to handle hugetlb hugepd directory table.

Fixes:  0c4d26802 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h  |  1 +
 .../include/asm/book3s/64/pgtable-4k.h        | 21 +++++++++++++++++++
 .../include/asm/book3s/64/pgtable-64k.h       |  9 ++++++++
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  5 +++++
 arch/powerpc/include/asm/nohash/32/pgalloc.h  |  1 +
 arch/powerpc/include/asm/nohash/64/pgalloc.h  |  1 +
 arch/powerpc/mm/hugetlbpage.c                 |  3 ++-
 arch/powerpc/mm/pgtable-book3s64.c            | 12 +++++++++++
 8 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 6a6673907e45..e4633803fe43 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -108,6 +108,7 @@ static inline void pgtable_free(void *table, unsigned index_size)
 }
 
 #define check_pgt_cache()	do { } while (0)
+#define get_hugepd_cache_index(x)  (x)
 
 #ifdef CONFIG_SMP
 static inline void pgtable_free_tlb(struct mmu_gather *tlb,
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
index af5f2baac80f..a069dfcac9a9 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
@@ -49,6 +49,27 @@ static inline int hugepd_ok(hugepd_t hpd)
 }
 #define is_hugepd(hpd)		(hugepd_ok(hpd))
 
+/*
+ * 16M and 16G huge page directory tables are allocated from slab cache
+ *
+ */
+#define H_16M_CACHE_INDEX (PAGE_SHIFT + H_PTE_INDEX_SIZE + H_PMD_INDEX_SIZE - 24)
+#define H_16G_CACHE_INDEX                                                      \
+	(PAGE_SHIFT + H_PTE_INDEX_SIZE + H_PMD_INDEX_SIZE + H_PUD_INDEX_SIZE - 34)
+
+static inline int get_hugepd_cache_index(int index)
+{
+	switch (index) {
+	case H_16M_CACHE_INDEX:
+		return HTLB_16M_INDEX;
+	case H_16G_CACHE_INDEX:
+		return HTLB_16G_INDEX;
+	default:
+		BUG();
+	}
+	/* should not reach */
+}
+
 #else /* !CONFIG_HUGETLB_PAGE */
 static inline int pmd_huge(pmd_t pmd) { return 0; }
 static inline int pud_huge(pud_t pud) { return 0; }
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
index fb4b3ba52339..d7ee249d6890 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
@@ -45,8 +45,17 @@ static inline int hugepd_ok(hugepd_t hpd)
 {
 	return 0;
 }
+
 #define is_hugepd(pdep)			0
 
+/*
+ * This should never get called
+ */
+static inline int get_hugepd_cache_index(int index)
+{
+	BUG();
+}
+
 #else /* !CONFIG_HUGETLB_PAGE */
 static inline int pmd_huge(pmd_t pmd) { return 0; }
 static inline int pud_huge(pud_t pud) { return 0; }
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 42fe7c2ff2df..1022f622397f 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -287,6 +287,11 @@ enum pgtable_index {
 	PMD_INDEX,
 	PUD_INDEX,
 	PGD_INDEX,
+	/*
+	 * Below are used with 4k page size and hugetlb
+	 */
+	HTLB_16M_INDEX,
+	HTLB_16G_INDEX,
 };
 
 extern unsigned long __vmalloc_start;
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 1707781d2f20..9de40eb614da 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -109,6 +109,7 @@ static inline void pgtable_free(void *table, unsigned index_size)
 }
 
 #define check_pgt_cache()	do { } while (0)
+#define get_hugepd_cache_index(x)	(x)
 
 #ifdef CONFIG_SMP
 static inline void pgtable_free_tlb(struct mmu_gather *tlb,
diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index 0e693f322cb2..e2d62d033708 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -141,6 +141,7 @@ static inline void pgtable_free(void *table, int shift)
 	}
 }
 
+#define get_hugepd_cache_index(x)	(x)
 #ifdef CONFIG_SMP
 static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
 {
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 7c5f479c5c00..8a9a49c13865 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -337,7 +337,8 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshif
 	if (shift >= pdshift)
 		hugepd_free(tlb, hugepte);
 	else
-		pgtable_free_tlb(tlb, hugepte, pdshift - shift);
+		pgtable_free_tlb(tlb, hugepte,
+				 get_hugepd_cache_index(pdshift - shift));
 }
 
 static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index c1f4ca45c93a..468c3d83a2aa 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -409,6 +409,18 @@ static inline void pgtable_free(void *table, int index)
 	case PUD_INDEX:
 		kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), table);
 		break;
+#ifdef CONFIG_PPC_4K_PAGES
+		/* 16M hugepd directory at pud level */
+	case HTLB_16M_INDEX:
+		BUILD_BUG_ON(H_16M_CACHE_INDEX <= 0);
+		kmem_cache_free(PGT_CACHE(H_16M_CACHE_INDEX), table);
+		break;
+		/* 16G hugepd directory at the pgd level */
+	case HTLB_16G_INDEX:
+		BUILD_BUG_ON(H_16G_CACHE_INDEX <= 0);
+		kmem_cache_free(PGT_CACHE(H_16G_CACHE_INDEX), table);
+		break;
+#endif
 		/* We don't free pgd table via RCU callback */
 	default:
 		BUG();
-- 
2.17.1

^ permalink raw reply related

* Re: stacktrace.c:(.text+0x1b0): undefined reference to `.smp_send_safe_nmi_ipi'
From: Nicholas Piggin @ 2018-06-14 10:25 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: Michael Ellerman, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <0ae99ba5-e721-dada-8d94-b223e41977e9@c-s.fr>

On Thu, 14 Jun 2018 12:06:51 +0200
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> See http://kisskb.ellerman.id.au/kisskb/buildresult/13395345/ for details
> 
> Seems like that function only exists when CONFIG_NMI_IPI is defined.
> 
> Problem introduced by commit 5cc05910f26e6fd6da15f052f86f6150e4b91664 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/arch/powerpc/kernel/stacktrace.c?h=next-20180614&id=5cc05910f26e6fd6da15f052f86f6150e4b91664
> 
> 
> What do you recommend to fix that ?

We could just do smp_call_function in the case we have no NMI_IPI?

Thanks,
Nick

^ permalink raw reply

* stacktrace.c:(.text+0x1b0): undefined reference to `.smp_send_safe_nmi_ipi'
From: Christophe LEROY @ 2018-06-14 10:06 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev@lists.ozlabs.org, Nicholas Piggin

See http://kisskb.ellerman.id.au/kisskb/buildresult/13395345/ for details

Seems like that function only exists when CONFIG_NMI_IPI is defined.

Problem introduced by commit 5cc05910f26e6fd6da15f052f86f6150e4b91664 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/arch/powerpc/kernel/stacktrace.c?h=next-20180614&id=5cc05910f26e6fd6da15f052f86f6150e4b91664


What do you recommend to fix that ?


Christophe

^ permalink raw reply

* Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
From: Thomas Gleixner @ 2018-06-14  8:32 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Ricardo Neri, Peter Zijlstra, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Don Zickus, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu
In-Reply-To: <20180614123250.230667f6@roar.ozlabs.ibm.com>

On Thu, 14 Jun 2018, Nicholas Piggin wrote:
> On Wed, 13 Jun 2018 18:31:17 -0700
> > I could look into creating the library for
> > common code and relocate the hpet watchdog into arch/x86 for the hpet-
> > specific parts.
> 
> If you can investigate that approach, that would be appreciated. I hope
> I did not misunderstand you there, Thomas.

I'm not against cleanups and consolidation, quite the contrary.

But this stuff just adds new infrastructure w/o showing that it's actually
a cleanup and consolidation.

Thanks,

	tglx

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox