LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
From: Linus Torvalds @ 2018-06-14  6:15 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton
In-Reply-To: <20180614124931.703e5b54@roar.ozlabs.ibm.com>

On Thu, Jun 14, 2018 at 11:49 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> +#ifndef pte_free_tlb
>  #define pte_free_tlb(tlb, ptep, address)                       \
>         do {                                                    \
>                 __tlb_adjust_range(tlb, address, PAGE_SIZE);    \
>                 __pte_free_tlb(tlb, ptep, address);             \
>         } while (0)
> +#endif

Do you really want to / need to take over the whole pte_free_tlb macro?

I was hoping that you'd just replace the __tlv_adjust_range() instead.

Something like

 - replace the

        __tlb_adjust_range(tlb, address, PAGE_SIZE);

   with a "page directory" version:

        __tlb_free_directory(tlb, address, size);

 - have the default implementation for that be the old code:

        #ifndef __tlb_free_directory
          #define __tlb_free_directory(tlb,addr,size)
__tlb_adjust_range(tlb, addr, PAGE_SIZE)
        #endif

and that way architectures can now just hook into that
"__tlb_free_directory()" thing.

Hmm?

             Linus

^ permalink raw reply

* Re: [PATCH kernel 6/6] powerpc/powernv/ioda: Allocate indirect TCE levels on demand
From: Alexey Kardashevskiy @ 2018-06-14  6:35 UTC (permalink / raw)
  To: David Gibson
  Cc: linuxppc-dev, kvm-ppc, kvm, Alex Williamson,
	Benjamin Herrenschmidt
In-Reply-To: <20180612041700.GZ2737@umbus.fritz.box>

On 12/6/18 2:17 pm, David Gibson wrote:
> On Fri, Jun 08, 2018 at 03:46:33PM +1000, Alexey Kardashevskiy wrote:
>> At the moment we allocate the entire TCE table, twice (hardware part and
>> userspace translation cache). This normally works as we normally have
>> contigous memory and the guest will map entire RAM for 64bit DMA.
>>
>> However if we have sparse RAM (one example is a memory device), then
>> we will allocate TCEs which will never be used as the guest only maps
>> actual memory for DMA. If it is a single level TCE table, there is nothing
>> we can really do but if it a multilevel table, we can skip allocating
>> TCEs we know we won't need.
>>
>> This adds ability to allocate only first level, saving memory.
>>
>> This changes iommu_table::free() to avoid allocating of an extra level;
>> iommu_table::set() will do this when needed.
>>
>> This adds @alloc parameter to iommu_table::exchange() to tell the callback
>> if it can allocate an extra level; the flag is set to "false" for
>> the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
>> H_TOO_HARD.
>>
>> This still requires the entire table to be counted in mm::locked_vm.
>>
>> To be conservative, this only does on-demand allocation when
>> the usespace cache table is requested which is the case of VFIO.
>>
>> The example math for a system replicating a powernv setup with NVLink2
>> in a guest:
>> 16GB RAM mapped at 0x0
>> 128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000
>>
>> the table to cover that all with 64K pages takes:
>> (((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4556MB
>>
>> If we allocate only necessary TCE levels, we will only need:
>> (((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect
>> levels).
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  arch/powerpc/include/asm/iommu.h              |  7 ++-
>>  arch/powerpc/platforms/powernv/pci.h          |  6 ++-
>>  arch/powerpc/kvm/book3s_64_vio_hv.c           |  4 +-
>>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 69 ++++++++++++++++++++-------
>>  arch/powerpc/platforms/powernv/pci-ioda.c     |  8 ++--
>>  drivers/vfio/vfio_iommu_spapr_tce.c           |  2 +-
>>  6 files changed, 69 insertions(+), 27 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>> index 4bdcf22..daa3ee5 100644
>> --- a/arch/powerpc/include/asm/iommu.h
>> +++ b/arch/powerpc/include/asm/iommu.h
>> @@ -70,7 +70,7 @@ struct iommu_table_ops {
>>  			unsigned long *hpa,
>>  			enum dma_data_direction *direction);
>>  
>> -	__be64 *(*useraddrptr)(struct iommu_table *tbl, long index);
>> +	__be64 *(*useraddrptr)(struct iommu_table *tbl, long index, bool alloc);
>>  #endif
>>  	void (*clear)(struct iommu_table *tbl,
>>  			long index, long npages);
>> @@ -122,10 +122,13 @@ struct iommu_table {
>>  	__be64 *it_userspace; /* userspace view of the table */
>>  	struct iommu_table_ops *it_ops;
>>  	struct kref    it_kref;
>> +	int it_nid;
>>  };
>>  
>> +#define IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry) \
>> +		((tbl)->it_ops->useraddrptr((tbl), (entry), false))
> 
> Is real mode really the only case where you want to inhibit new
> allocations?  I would have thought some paths would be read-only and
> you wouldn't want to allocate, even in virtual mode.


There are paths when I do not want allocation but I can figure out that
from dma direction flag, for example, I am cleaning up the table and I do
not want any extra  allocation to happen there and they do happen because I
made a mistake so I'll repost. Other than that, this @alloc flag is for
real mode only.


> 
>>  #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
>> -		((tbl)->it_ops->useraddrptr((tbl), (entry)))
>> +		((tbl)->it_ops->useraddrptr((tbl), (entry), true))
>>  
>>  /* Pure 2^n version of get_order */
>>  static inline __attribute_const__
>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>> index 5e02408..1fa5590 100644
>> --- a/arch/powerpc/platforms/powernv/pci.h
>> +++ b/arch/powerpc/platforms/powernv/pci.h
>> @@ -267,8 +267,10 @@ extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
>>  		unsigned long attrs);
>>  extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
>>  extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
>> -		unsigned long *hpa, enum dma_data_direction *direction);
>> -extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index);
>> +		unsigned long *hpa, enum dma_data_direction *direction,
>> +		bool alloc);
>> +extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index,
>> +		bool alloc);
>>  extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
>>  
>>  extern long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index db0490c..05b4865 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -200,7 +200,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
>>  {
>>  	struct mm_iommu_table_group_mem_t *mem = NULL;
>>  	const unsigned long pgsize = 1ULL << tbl->it_page_shift;
>> -	__be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
>> +	__be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry);
>>  
>>  	if (!pua)
>>  		/* it_userspace allocation might be delayed */
>> @@ -264,7 +264,7 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>>  {
>>  	long ret;
>>  	unsigned long hpa = 0;
>> -	__be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
>> +	__be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry);
>>  	struct mm_iommu_table_group_mem_t *mem;
>>  
>>  	if (!pua)
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>> index 36c2eb0..a7debfb 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>> @@ -48,7 +48,7 @@ static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
>>  	return addr;
>>  }
>>  
>> -static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx)
>> +static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
>>  {
>>  	__be64 *tmp = user ? tbl->it_userspace : (__be64 *) tbl->it_base;
>>  	int  level = tbl->it_indirect_levels;
>> @@ -57,7 +57,20 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx)
>>  
>>  	while (level) {
>>  		int n = (idx & mask) >> (level * shift);
>> -		unsigned long tce = be64_to_cpu(tmp[n]);
>> +		unsigned long tce;
>> +
>> +		if (tmp[n] == 0) {
>> +			__be64 *tmp2;
>> +
>> +			if (!alloc)
>> +				return NULL;
>> +
>> +			tmp2 = pnv_alloc_tce_level(tbl->it_nid,
>> +					ilog2(tbl->it_level_size) + 3);
> 
> What if the allocation fails?


Fair question, this needs to be handled with at least H_TOO_HARD and real
mode and H_HARDWARE in virtual, I'll fix.




-- 
Alexey

^ permalink raw reply

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
From: Nicholas Piggin @ 2018-06-14  6:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton
In-Reply-To: <CA+55aFwP-6QZ0u2ZYCjTebP6OmkeTpbUHyLT0ih-57TbvJBPxg@mail.gmail.com>

On Thu, 14 Jun 2018 15:15:47 +0900
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Jun 14, 2018 at 11:49 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > +#ifndef pte_free_tlb
> >  #define pte_free_tlb(tlb, ptep, address)                       \
> >         do {                                                    \
> >                 __tlb_adjust_range(tlb, address, PAGE_SIZE);    \
> >                 __pte_free_tlb(tlb, ptep, address);             \
> >         } while (0)
> > +#endif  
> 
> Do you really want to / need to take over the whole pte_free_tlb macro?
> 
> I was hoping that you'd just replace the __tlv_adjust_range() instead.
> 
> Something like
> 
>  - replace the
> 
>         __tlb_adjust_range(tlb, address, PAGE_SIZE);
> 
>    with a "page directory" version:
> 
>         __tlb_free_directory(tlb, address, size);
> 
>  - have the default implementation for that be the old code:
> 
>         #ifndef __tlb_free_directory
>           #define __tlb_free_directory(tlb,addr,size)
> __tlb_adjust_range(tlb, addr, PAGE_SIZE)
>         #endif
> 
> and that way architectures can now just hook into that
> "__tlb_free_directory()" thing.
> 
> Hmm?

Isn't it just easier and less indirection for the arch to just take
over the pte_free_tlb instead? 

I don't see what the __tlb_free_directory gets you except having to
follow another macro -- if the arch has something special they want
to do there, just do it in their __pte_free_tlb and call it
pte_free_tlb instead.

Thanks,
Nick

> 
>              Linus

^ permalink raw reply

* Re: [PATCH v2] powerpc/64s/radix: Fix MADV_[FREE|DONTNEED] TLB flush miss problem with THP
From: Nicholas Piggin @ 2018-06-14  6:54 UTC (permalink / raw)
  Cc: linuxppc-dev
In-Reply-To: <201806141310.bQbDyW9s%fengguang.wu@intel.com>

On Thu, 14 Jun 2018 13:51:40 +0800
kbuild test robot <lkp@intel.com> wrote:

> Hi Nicholas,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on powerpc/next]
> [also build test ERROR on next-20180613]
> [cannot apply to v4.17]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64s-radix-Fix-MADV_-FREE-DONTNEED-TLB-flush-miss-problem-with-THP/20180614-114728
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: microblaze-nommu_defconfig (attached as .config)
> compiler: microblaze-linux-gcc (GCC) 8.1.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         GCC_VERSION=8.1.0 make.cross ARCH=microblaze 

Ooops, absent mindedly edited a header without thinking it's not
powerpc specific. Will fix it somehow.

^ permalink raw reply

* Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
From: Thomas Gleixner @ 2018-06-14  8:32 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Ricardo Neri, Peter Zijlstra, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Don Zickus, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu
In-Reply-To: <20180614123250.230667f6@roar.ozlabs.ibm.com>

On Thu, 14 Jun 2018, Nicholas Piggin wrote:
> On Wed, 13 Jun 2018 18:31:17 -0700
> > I could look into creating the library for
> > common code and relocate the hpet watchdog into arch/x86 for the hpet-
> > specific parts.
> 
> If you can investigate that approach, that would be appreciated. I hope
> I did not misunderstand you there, Thomas.

I'm not against cleanups and consolidation, quite the contrary.

But this stuff just adds new infrastructure w/o showing that it's actually
a cleanup and consolidation.

Thanks,

	tglx

^ permalink raw reply

* stacktrace.c:(.text+0x1b0): undefined reference to `.smp_send_safe_nmi_ipi'
From: Christophe LEROY @ 2018-06-14 10:06 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev@lists.ozlabs.org, Nicholas Piggin

See http://kisskb.ellerman.id.au/kisskb/buildresult/13395345/ for details

Seems like that function only exists when CONFIG_NMI_IPI is defined.

Problem introduced by commit 5cc05910f26e6fd6da15f052f86f6150e4b91664 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/arch/powerpc/kernel/stacktrace.c?h=next-20180614&id=5cc05910f26e6fd6da15f052f86f6150e4b91664


What do you recommend to fix that ?


Christophe

^ permalink raw reply

* Re: stacktrace.c:(.text+0x1b0): undefined reference to `.smp_send_safe_nmi_ipi'
From: Nicholas Piggin @ 2018-06-14 10:25 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: Michael Ellerman, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <0ae99ba5-e721-dada-8d94-b223e41977e9@c-s.fr>

On Thu, 14 Jun 2018 12:06:51 +0200
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> See http://kisskb.ellerman.id.au/kisskb/buildresult/13395345/ for details
> 
> Seems like that function only exists when CONFIG_NMI_IPI is defined.
> 
> Problem introduced by commit 5cc05910f26e6fd6da15f052f86f6150e4b91664 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/arch/powerpc/kernel/stacktrace.c?h=next-20180614&id=5cc05910f26e6fd6da15f052f86f6150e4b91664
> 
> 
> What do you recommend to fix that ?

We could just do smp_call_function in the case we have no NMI_IPI?

Thanks,
Nick

^ permalink raw reply

* [PATCH] powerpc/mm/hash/4k: Free hugetlb page table caches correctly.
From: Aneesh Kumar K.V @ 2018-06-14 10:31 UTC (permalink / raw)
  To: npiggin, benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

With 4k page size for hugetlb we allocate hugepage directories from its on slab
cache. With patch 0c4d26802 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
we missed to free these allocated hugepd tables.

Update pgtable_free to handle hugetlb hugepd directory table.

Fixes:  0c4d26802 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h  |  1 +
 .../include/asm/book3s/64/pgtable-4k.h        | 21 +++++++++++++++++++
 .../include/asm/book3s/64/pgtable-64k.h       |  9 ++++++++
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  5 +++++
 arch/powerpc/include/asm/nohash/32/pgalloc.h  |  1 +
 arch/powerpc/include/asm/nohash/64/pgalloc.h  |  1 +
 arch/powerpc/mm/hugetlbpage.c                 |  3 ++-
 arch/powerpc/mm/pgtable-book3s64.c            | 12 +++++++++++
 8 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 6a6673907e45..e4633803fe43 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -108,6 +108,7 @@ static inline void pgtable_free(void *table, unsigned index_size)
 }
 
 #define check_pgt_cache()	do { } while (0)
+#define get_hugepd_cache_index(x)  (x)
 
 #ifdef CONFIG_SMP
 static inline void pgtable_free_tlb(struct mmu_gather *tlb,
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
index af5f2baac80f..a069dfcac9a9 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
@@ -49,6 +49,27 @@ static inline int hugepd_ok(hugepd_t hpd)
 }
 #define is_hugepd(hpd)		(hugepd_ok(hpd))
 
+/*
+ * 16M and 16G huge page directory tables are allocated from slab cache
+ *
+ */
+#define H_16M_CACHE_INDEX (PAGE_SHIFT + H_PTE_INDEX_SIZE + H_PMD_INDEX_SIZE - 24)
+#define H_16G_CACHE_INDEX                                                      \
+	(PAGE_SHIFT + H_PTE_INDEX_SIZE + H_PMD_INDEX_SIZE + H_PUD_INDEX_SIZE - 34)
+
+static inline int get_hugepd_cache_index(int index)
+{
+	switch (index) {
+	case H_16M_CACHE_INDEX:
+		return HTLB_16M_INDEX;
+	case H_16G_CACHE_INDEX:
+		return HTLB_16G_INDEX;
+	default:
+		BUG();
+	}
+	/* should not reach */
+}
+
 #else /* !CONFIG_HUGETLB_PAGE */
 static inline int pmd_huge(pmd_t pmd) { return 0; }
 static inline int pud_huge(pud_t pud) { return 0; }
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
index fb4b3ba52339..d7ee249d6890 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
@@ -45,8 +45,17 @@ static inline int hugepd_ok(hugepd_t hpd)
 {
 	return 0;
 }
+
 #define is_hugepd(pdep)			0
 
+/*
+ * This should never get called
+ */
+static inline int get_hugepd_cache_index(int index)
+{
+	BUG();
+}
+
 #else /* !CONFIG_HUGETLB_PAGE */
 static inline int pmd_huge(pmd_t pmd) { return 0; }
 static inline int pud_huge(pud_t pud) { return 0; }
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 42fe7c2ff2df..1022f622397f 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -287,6 +287,11 @@ enum pgtable_index {
 	PMD_INDEX,
 	PUD_INDEX,
 	PGD_INDEX,
+	/*
+	 * Below are used with 4k page size and hugetlb
+	 */
+	HTLB_16M_INDEX,
+	HTLB_16G_INDEX,
 };
 
 extern unsigned long __vmalloc_start;
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 1707781d2f20..9de40eb614da 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -109,6 +109,7 @@ static inline void pgtable_free(void *table, unsigned index_size)
 }
 
 #define check_pgt_cache()	do { } while (0)
+#define get_hugepd_cache_index(x)	(x)
 
 #ifdef CONFIG_SMP
 static inline void pgtable_free_tlb(struct mmu_gather *tlb,
diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index 0e693f322cb2..e2d62d033708 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -141,6 +141,7 @@ static inline void pgtable_free(void *table, int shift)
 	}
 }
 
+#define get_hugepd_cache_index(x)	(x)
 #ifdef CONFIG_SMP
 static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
 {
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 7c5f479c5c00..8a9a49c13865 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -337,7 +337,8 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshif
 	if (shift >= pdshift)
 		hugepd_free(tlb, hugepte);
 	else
-		pgtable_free_tlb(tlb, hugepte, pdshift - shift);
+		pgtable_free_tlb(tlb, hugepte,
+				 get_hugepd_cache_index(pdshift - shift));
 }
 
 static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index c1f4ca45c93a..468c3d83a2aa 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -409,6 +409,18 @@ static inline void pgtable_free(void *table, int index)
 	case PUD_INDEX:
 		kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), table);
 		break;
+#ifdef CONFIG_PPC_4K_PAGES
+		/* 16M hugepd directory at pud level */
+	case HTLB_16M_INDEX:
+		BUILD_BUG_ON(H_16M_CACHE_INDEX <= 0);
+		kmem_cache_free(PGT_CACHE(H_16M_CACHE_INDEX), table);
+		break;
+		/* 16G hugepd directory at the pgd level */
+	case HTLB_16G_INDEX:
+		BUILD_BUG_ON(H_16G_CACHE_INDEX <= 0);
+		kmem_cache_free(PGT_CACHE(H_16G_CACHE_INDEX), table);
+		break;
+#endif
 		/* We don't free pgd table via RCU callback */
 	default:
 		BUG();
-- 
2.17.1

^ permalink raw reply related

* Re: stacktrace.c:(.text+0x1b0): undefined reference to `.smp_send_safe_nmi_ipi'
From: Christophe LEROY @ 2018-06-14 10:51 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Michael Ellerman, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20180614202546.24d40e51@roar.ozlabs.ibm.com>



Le 14/06/2018 à 12:25, Nicholas Piggin a écrit :
> On Thu, 14 Jun 2018 12:06:51 +0200
> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> 
>> See http://kisskb.ellerman.id.au/kisskb/buildresult/13395345/ for details
>>
>> Seems like that function only exists when CONFIG_NMI_IPI is defined.
>>
>> Problem introduced by commit 5cc05910f26e6fd6da15f052f86f6150e4b91664
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/arch/powerpc/kernel/stacktrace.c?h=next-20180614&id=5cc05910f26e6fd6da15f052f86f6150e4b91664
>>
>>
>> What do you recommend to fix that ?
> 
> We could just do smp_call_function in the case we have no NMI_IPI?

???

Can you detail what you have in mind ? Do smp_call_function() to call 
what ? Instead of the call to smp_send_safe_nmi_ipi() ?

Thanks
Christophe

> 
> Thanks,
> Nick
> 

^ permalink raw reply

* Re: powerpc: use time64_t in read_persistent_clock
From: Arnd Bergmann @ 2018-06-14 11:45 UTC (permalink / raw)
  To: Mathieu Malaterre; +Cc: Benjamin Herrenschmidt, Michael Ellerman, linuxppc-dev
In-Reply-To: <CA+7wUswNnhB3=_Z0cbeAzDrcHZr=xg_nE5uUZuPXEk2RKNaABA@mail.gmail.com>

On Wed, Jun 13, 2018 at 10:24 PM, Mathieu Malaterre <malat@debian.org> wrote:
> Arnd,
>
> In 5bfd643583b2e I can see that you changed:
>
> $ git show 5bfd643583b2e -- arch/powerpc/platforms/powermac/time.c
> [...]
>  #ifdef CONFIG_ADB_PMU
> -static unsigned long pmu_get_time(void)
> +static time64_t pmu_get_time(void)
>  {
>         struct adb_request req;
> -       unsigned int now;
> +       time64_t now;
>
>         if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0)
>                 return 0;
> @@ -160,10 +151,10 @@ static unsigned long pmu_get_time(void)
>                        req.reply_len);
>         now = (req.reply[0] << 24) + (req.reply[1] << 16)
>                 + (req.reply[2] << 8) + req.reply[3];
> -       return ((unsigned long)now) - RTC_OFFSET;
> +       return now - RTC_OFFSET;
>  }
> [...]
>
> As far as I can tell the old function would never return a negative
> value and rely on unsigned long roll over. Now I am seeing negative
> value being returned and it seems to break later on in the rtc
> library.
>
> Could you comment why you removed the cast to (unsigned long) in your commit ?

I'm sorry my patch caused a regression. Even with your bug report
it took me a while to see what exactly went wrong, and how I got
mixed up the integer conversion rules.

I mistakenly assume that fiven

       long long now;
       unsigned char reply[4];
       now = reply[0] << 24;

we would always end up with a positive number, but since reply[0] is
promoted to a signed integer, the right-hand side of the assignment
ends up as a negative number for the usual contents of the RTC.

Can you confirm that this patch addresses your problem?

diff --git a/arch/powerpc/platforms/powermac/time.c
b/arch/powerpc/platforms/powermac/time.c
index 7c968e46736f..a7fdcd3051f9 100644
--- a/arch/powerpc/platforms/powermac/time.c
+++ b/arch/powerpc/platforms/powermac/time.c
@@ -97,7 +97,7 @@ static time64_t cuda_get_time(void)
        if (req.reply_len != 7)
                printk(KERN_ERR "cuda_get_time: got %d byte reply\n",
                       req.reply_len);
-       now = (req.reply[3] << 24) + (req.reply[4] << 16)
+       now = (u32)(req.reply[3] << 24) + (req.reply[4] << 16)
                + (req.reply[5] << 8) + req.reply[6];
        return now - RTC_OFFSET;
 }
@@ -140,7 +140,7 @@ static time64_t pmu_get_time(void)
        if (req.reply_len != 4)
                printk(KERN_ERR "pmu_get_time: got %d byte reply from PMU\n",
                       req.reply_len);
-       now = (req.reply[0] << 24) + (req.reply[1] << 16)
+       now = (u32)(req.reply[0] << 24) + (req.reply[1] << 16)
                + (req.reply[2] << 8) + req.reply[3];
        return now - RTC_OFFSET;
 }

On a related note, we do have some freedom in how we want to
interpret values beyond year 2038/2040 when the RTC does wrap
around.

- With the original code, we converted the time into a signed
  32-bit integer based on the 1970 Unix epoch, converting RTC
  years 2038 through 2040 into Linux years 1902 through 1904,
  which doesn't seem very helpful.

- With my fix above, we would interpret the numbers as documented,
  with 2040 wrapping around to 1904.

- If we want, we could reinterpret the 1904-1970 range as later
  times between 2040 and 2106, like this:

--- a/arch/powerpc/platforms/powermac/time.c
+++ b/arch/powerpc/platforms/powermac/time.c
@@ -88,7 +88,7 @@ long __init pmac_time_init(void)
 static time64_t cuda_get_time(void)
 {
        struct adb_request req;
-       time64_t now;
+       u32 now;

        if (cuda_request(&req, NULL, 2, CUDA_PACKET, CUDA_GET_TIME) < 0)
                return 0;
@@ -132,7 +132,7 @@ static int cuda_set_rtc_time(struct rtc_time *tm)
 static time64_t pmu_get_time(void)
 {
        struct adb_request req;
-       time64_t now;
+       u32 now;

        if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0)
                return 0;

On the upside, this would possibly extend the life of some machines
by another 66 years, on the downside it might cause issues when
an empty RTC battery causes the initial system time to be
above the 32-bit TIME_T_MAX (instead of going back to 1904).

      Arnd

^ permalink raw reply related

* Re: stacktrace.c:(.text+0x1b0): undefined reference to `.smp_send_safe_nmi_ipi'
From: Nicholas Piggin @ 2018-06-14 11:46 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: Michael Ellerman, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <62c0b560-5211-4844-1210-6dcd9e2c5526@c-s.fr>

On Thu, 14 Jun 2018 12:51:28 +0200
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> Le 14/06/2018 =C3=A0 12:25, Nicholas Piggin a =C3=A9crit=C2=A0:
> > On Thu, 14 Jun 2018 12:06:51 +0200
> > Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> >  =20
> >> See http://kisskb.ellerman.id.au/kisskb/buildresult/13395345/ for deta=
ils
> >>
> >> Seems like that function only exists when CONFIG_NMI_IPI is defined.
> >>
> >> Problem introduced by commit 5cc05910f26e6fd6da15f052f86f6150e4b91664
> >> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/co=
mmit/arch/powerpc/kernel/stacktrace.c?h=3Dnext-20180614&id=3D5cc05910f26e6f=
d6da15f052f86f6150e4b91664
> >>
> >>
> >> What do you recommend to fix that ? =20
> >=20
> > We could just do smp_call_function in the case we have no NMI_IPI? =20
>=20
> ???
>=20
> Can you detail what you have in mind ? Do smp_call_function() to call=20
> what ? Instead of the call to smp_send_safe_nmi_ipi() ?

Yes I was thinking that but now I take a better look there's places that
call it with irqs off so that might be a bit risky. I guess just making
it depend on NMI_IPI for now is the minimum fix.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v2 0/6] powerpc/pkeys: fixes to pkeys
From: Florian Weimer @ 2018-06-14 12:15 UTC (permalink / raw)
  To: Ram Pai, mpe
  Cc: linuxppc-dev, dave.hansen, aneesh.kumar, bsingharora, hbabu,
	mhocko, bauerman, Ulrich.Weigand, luto, msuchanek
In-Reply-To: <1528936144-6696-1-git-send-email-linuxram@us.ibm.com>

On 06/14/2018 02:28 AM, Ram Pai wrote:
> Assortment of fixes to pkey.
> 
> Patch 1  makes pkey consumable in multithreaded applications.
> 
> Patch 2  fixes fork behavior to inherit the key attributes.
> 
> Patch 3  A off-by-one bug made one key unusable. Fixes it.
> 
> Patch 4  Execute-only key is preallocated.
> 
> Patch 5  Makes pkey-0 less special.
> 
> Patch 6  Deny by default permissions on all unallocated keys.
> 
> Passes all selftests on powerpc. Also behavior verified to be correct
> by Florian.
> 
> Changelog:
> 
> 	v2: . fixed merge conflict with upstream code.
> 	    . Add patch 6. Makes the behavior consistent
> 	      with that on x86.

(Except signal handling, but I agree with Ram that the POWER behavior is 
the correct one.)

> Ram Pai (6):
>    powerpc/pkeys: Enable all user-allocatable pkeys at init.
>    powerpc/pkeys: Save the pkey registers before fork
>    powerpc/pkeys: fix calculation of total pkeys.
>    powerpc/pkeys: Preallocate execute-only key
>    powerpc/pkeys: make protection key 0 less special
>    powerpc/pkeys: Deny read/write/execute by default

I tested the whole series with the new selftests, with the printamr.c 
program I posted earlier, and the glibc test for pkey_alloc &c.  The 
latter required some test fixes, but now passes as well.  As far as I 
can tell, everything looks good now.

Tested-By: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian

^ permalink raw reply

* Re: [PATCH kernel 6/6] powerpc/powernv/ioda: Allocate indirect TCE levels on demand
From: David Gibson @ 2018-06-14 12:35 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm-ppc, kvm, Alex Williamson,
	Benjamin Herrenschmidt
In-Reply-To: <360896bd-f3a9-530c-8144-c10502dc965b@ozlabs.ru>

[-- Attachment #1: Type: text/plain, Size: 4449 bytes --]

On Thu, Jun 14, 2018 at 04:35:18PM +1000, Alexey Kardashevskiy wrote:
> On 12/6/18 2:17 pm, David Gibson wrote:
> > On Fri, Jun 08, 2018 at 03:46:33PM +1000, Alexey Kardashevskiy wrote:
> >> At the moment we allocate the entire TCE table, twice (hardware part and
> >> userspace translation cache). This normally works as we normally have
> >> contigous memory and the guest will map entire RAM for 64bit DMA.
> >>
> >> However if we have sparse RAM (one example is a memory device), then
> >> we will allocate TCEs which will never be used as the guest only maps
> >> actual memory for DMA. If it is a single level TCE table, there is nothing
> >> we can really do but if it a multilevel table, we can skip allocating
> >> TCEs we know we won't need.
> >>
> >> This adds ability to allocate only first level, saving memory.
> >>
> >> This changes iommu_table::free() to avoid allocating of an extra level;
> >> iommu_table::set() will do this when needed.
> >>
> >> This adds @alloc parameter to iommu_table::exchange() to tell the callback
> >> if it can allocate an extra level; the flag is set to "false" for
> >> the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
> >> H_TOO_HARD.
> >>
> >> This still requires the entire table to be counted in mm::locked_vm.
> >>
> >> To be conservative, this only does on-demand allocation when
> >> the usespace cache table is requested which is the case of VFIO.
> >>
> >> The example math for a system replicating a powernv setup with NVLink2
> >> in a guest:
> >> 16GB RAM mapped at 0x0
> >> 128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000
> >>
> >> the table to cover that all with 64K pages takes:
> >> (((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4556MB
> >>
> >> If we allocate only necessary TCE levels, we will only need:
> >> (((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect
> >> levels).
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >>  arch/powerpc/include/asm/iommu.h              |  7 ++-
> >>  arch/powerpc/platforms/powernv/pci.h          |  6 ++-
> >>  arch/powerpc/kvm/book3s_64_vio_hv.c           |  4 +-
> >>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 69 ++++++++++++++++++++-------
> >>  arch/powerpc/platforms/powernv/pci-ioda.c     |  8 ++--
> >>  drivers/vfio/vfio_iommu_spapr_tce.c           |  2 +-
> >>  6 files changed, 69 insertions(+), 27 deletions(-)
> >>
> >> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >> index 4bdcf22..daa3ee5 100644
> >> --- a/arch/powerpc/include/asm/iommu.h
> >> +++ b/arch/powerpc/include/asm/iommu.h
> >> @@ -70,7 +70,7 @@ struct iommu_table_ops {
> >>  			unsigned long *hpa,
> >>  			enum dma_data_direction *direction);
> >>  
> >> -	__be64 *(*useraddrptr)(struct iommu_table *tbl, long index);
> >> +	__be64 *(*useraddrptr)(struct iommu_table *tbl, long index, bool alloc);
> >>  #endif
> >>  	void (*clear)(struct iommu_table *tbl,
> >>  			long index, long npages);
> >> @@ -122,10 +122,13 @@ struct iommu_table {
> >>  	__be64 *it_userspace; /* userspace view of the table */
> >>  	struct iommu_table_ops *it_ops;
> >>  	struct kref    it_kref;
> >> +	int it_nid;
> >>  };
> >>  
> >> +#define IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry) \
> >> +		((tbl)->it_ops->useraddrptr((tbl), (entry), false))
> > 
> > Is real mode really the only case where you want to inhibit new
> > allocations?  I would have thought some paths would be read-only and
> > you wouldn't want to allocate, even in virtual mode.
> 
> There are paths when I do not want allocation but I can figure out that
> from dma direction flag, for example, I am cleaning up the table and I do
> not want any extra  allocation to happen there and they do happen because I
> made a mistake so I'll repost. Other than that, this @alloc flag is for
> real mode only.

Hm, ok.

Something just occurred to me: IIRC, going from the vmalloc() to the
explicit table structure was to avoid allocating pages for memory
regions that aren't there due to sparseness.  But.. won't you get that
with vmalloc() anyway, if the pages for the gaps never get
instantiated?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [RFC PATCH 1/5] powerpc: Fix inverted active predicate for setting the EBB regset
From: Pedro Franco de Carvalho @ 2018-06-14 13:52 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev, Anshuman Khandual
In-Reply-To: <87fu1rtot3.fsf@concordia.ellerman.id.au>

Michael Ellerman <mpe@ellerman.id.au> writes:

> Hi Pedro,
>
> Thanks for looking into this, how did you detect this? Do you have a
> test case?

Hello,

I'm working on allowing these registers to be accessed through GDB,
which is where I saw this happen. Then I used a small program that
traces another, then reads and tries to write to the regset, but not in
linux selftest format.

> I don't think Anshuman wrote it this way on purpose, but added him to Cc
> in case he remembers.
>
> But I don't think this fix is necessarily right. If we are setting the
> EBB regs via ptrace then it doesn't matter if they were previously in
> use or not, we should just set them. What *does* matter is that at the
> end of the function we set used_ebb to true, because otherwise the
> values we have set will not actually be used when the process is
> rescheduled.

Now I'm not sure why the ptrace calls for the ebb regset are guarded
with used_ebb in the first place. The save/restore_sprs functions in
kernel/process.c seem to handle the ebb registers regardless of this
flag, and it seems to be possible for user programs to read and write to
these registers even without having first created a perf event.

The flag only appears to be used in perf/core_book3s.c in the
ebb_event_add function, and the pmu registers (mmcr0, etc) only seem to
be saved to and restored from the thread_struct through
ebb_switch_in/out. So maybe the original intent was to guard the
pmu_get/set functions with used_ebb instead?

I'm not sure about this, because I don't know if it possible for a
ptrace call to happen between ebb_event_add and the first ebb_switch_in
for a thread.

Thanks!

--
Pedro

^ permalink raw reply

* Re: [RFC PATCH 2/5] powerpc: Flush checkpointed gpr state for 32-bit processes in ptrace
From: Pedro Franco de Carvalho @ 2018-06-14 13:55 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
In-Reply-To: <87d0wvtolp.fsf@concordia.ellerman.id.au>

Michael Ellerman <mpe@ellerman.id.au> writes:

> Can you add a helper that does it and use that helper in these two
> functions. Then if you can send me another patch that converts all the
> other uses to use the new helper.

Yes, I'll do this.

Thanks!

--
Pedro

^ permalink raw reply

* [PATCH] powerpc/e500mc: Set assembler machine type to e500mc
From: Michael Jeanson @ 2018-06-14 15:27 UTC (permalink / raw)
  Cc: Michael Jeanson, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Kumar Gala, Vakul Garg, Scott Wood,
	Mathieu Desnoyers, linuxppc-dev, linux-kernel, stable

In binutils 2.26 a new opcode for the "wait" instruction was added for the
POWER9 and has precedence over the one specific to the e500mc. Commit
ebf714ff3756 ("powerpc/e500mc: Add support for the wait instruction in
e500_idle") uses this instruction specifically on the e500mc to work around
an erratum.

This results in an invalid instruction in idle_e500 when we build for the
e500mc on bintutils >= 2.26 with the default assembler machine type.

Since multiplatform between e500 and non-e500 is not supported, set the
assembler machine type globaly when CONFIG_PPC_E500MC=y.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Kumar Gala <galak@kernel.crashing.org>
CC: Vakul Garg <vakul.garg@nxp.com>
CC: Scott Wood <swood@redhat.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-kernel@vger.kernel.org
CC: stable@vger.kernel.org
---
 arch/powerpc/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 95813df90801..bb2523b4bd8f 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -251,6 +251,7 @@ cpu-as-$(CONFIG_4xx)		+= -Wa,-m405
 cpu-as-$(CONFIG_ALTIVEC)	+= $(call as-option,-Wa$(comma)-maltivec)
 cpu-as-$(CONFIG_E200)		+= -Wa,-me200
 cpu-as-$(CONFIG_PPC_BOOK3S_64)	+= -Wa,-mpower4
+cpu-as-$(CONFIG_PPC_E500MC)	+= $(call as-option,-Wa$(comma)-me500mc)
 
 KBUILD_AFLAGS += $(cpu-as-y)
 KBUILD_CFLAGS += $(cpu-as-y)
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH] misc: ocxl: Change return type for fault handler
From: Frederic Barrat @ 2018-06-14 16:06 UTC (permalink / raw)
  To: Souptick Joarder, willy, fbarrat, andrew.donnellan, arnd, gregkh
  Cc: linuxppc-dev, linux-kernel, sabyasachi.linux, brajeswar.linux
In-Reply-To: <20180611202904.GA25538@jordon-HP-15-Notebook-PC>



Le 11/06/2018 à 22:29, Souptick Joarder a écrit :
> Use new return type vm_fault_t for fault handler. For
> now, this is just documenting that the function returns
> a VM_FAULT value rather than an errno. Once all instances
> are converted, vm_fault_t will become a distinct type.
> 
> Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t")
> 
> There is an existing bug when vm_insert_pfn() can return
> ENOMEM which was ignored and VM_FAULT_NOPAGE returned as
> default. The new inline vmf_insert_pfn() has removed
> this inefficiency by returning correct vm_fault_ type.
> 
> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
> ---

Thanks!

Tested and
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>


>   drivers/misc/ocxl/context.c | 22 +++++++++++-----------
>   drivers/misc/ocxl/sysfs.c   |  5 ++---
>   2 files changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
> index 909e880..98daf91 100644
> --- a/drivers/misc/ocxl/context.c
> +++ b/drivers/misc/ocxl/context.c
> @@ -83,7 +83,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
>   	return rc;
>   }
> 
> -static int map_afu_irq(struct vm_area_struct *vma, unsigned long address,
> +static vm_fault_t map_afu_irq(struct vm_area_struct *vma, unsigned long address,
>   		u64 offset, struct ocxl_context *ctx)
>   {
>   	u64 trigger_addr;
> @@ -92,15 +92,15 @@ static int map_afu_irq(struct vm_area_struct *vma, unsigned long address,
>   	if (!trigger_addr)
>   		return VM_FAULT_SIGBUS;
> 
> -	vm_insert_pfn(vma, address, trigger_addr >> PAGE_SHIFT);
> -	return VM_FAULT_NOPAGE;
> +	return vmf_insert_pfn(vma, address, trigger_addr >> PAGE_SHIFT);
>   }
> 
> -static int map_pp_mmio(struct vm_area_struct *vma, unsigned long address,
> +static vm_fault_t map_pp_mmio(struct vm_area_struct *vma, unsigned long address,
>   		u64 offset, struct ocxl_context *ctx)
>   {
>   	u64 pp_mmio_addr;
>   	int pasid_off;
> +	vm_fault_t ret;
> 
>   	if (offset >= ctx->afu->config.pp_mmio_stride)
>   		return VM_FAULT_SIGBUS;
> @@ -118,27 +118,27 @@ static int map_pp_mmio(struct vm_area_struct *vma, unsigned long address,
>   		pasid_off * ctx->afu->config.pp_mmio_stride +
>   		offset;
> 
> -	vm_insert_pfn(vma, address, pp_mmio_addr >> PAGE_SHIFT);
> +	ret = vmf_insert_pfn(vma, address, pp_mmio_addr >> PAGE_SHIFT);
>   	mutex_unlock(&ctx->status_mutex);
> -	return VM_FAULT_NOPAGE;
> +	return ret;
>   }
> 
> -static int ocxl_mmap_fault(struct vm_fault *vmf)
> +static vm_fault_t ocxl_mmap_fault(struct vm_fault *vmf)
>   {
>   	struct vm_area_struct *vma = vmf->vma;
>   	struct ocxl_context *ctx = vma->vm_file->private_data;
>   	u64 offset;
> -	int rc;
> +	vm_fault_t ret;
> 
>   	offset = vmf->pgoff << PAGE_SHIFT;
>   	pr_debug("%s: pasid %d address 0x%lx offset 0x%llx\n", __func__,
>   		ctx->pasid, vmf->address, offset);
> 
>   	if (offset < ctx->afu->irq_base_offset)
> -		rc = map_pp_mmio(vma, vmf->address, offset, ctx);
> +		ret = map_pp_mmio(vma, vmf->address, offset, ctx);
>   	else
> -		rc = map_afu_irq(vma, vmf->address, offset, ctx);
> -	return rc;
> +		ret = map_afu_irq(vma, vmf->address, offset, ctx);
> +	return ret;
>   }
> 
>   static const struct vm_operations_struct ocxl_vmops = {
> diff --git a/drivers/misc/ocxl/sysfs.c b/drivers/misc/ocxl/sysfs.c
> index d9753a1..0ab1fd1 100644
> --- a/drivers/misc/ocxl/sysfs.c
> +++ b/drivers/misc/ocxl/sysfs.c
> @@ -64,7 +64,7 @@ static ssize_t global_mmio_read(struct file *filp, struct kobject *kobj,
>   	return count;
>   }
> 
> -static int global_mmio_fault(struct vm_fault *vmf)
> +static vm_fault_t global_mmio_fault(struct vm_fault *vmf)
>   {
>   	struct vm_area_struct *vma = vmf->vma;
>   	struct ocxl_afu *afu = vma->vm_private_data;
> @@ -75,8 +75,7 @@ static int global_mmio_fault(struct vm_fault *vmf)
> 
>   	offset = vmf->pgoff;
>   	offset += (afu->global_mmio_start >> PAGE_SHIFT);
> -	vm_insert_pfn(vma, vmf->address, offset);
> -	return VM_FAULT_NOPAGE;
> +	return vmf_insert_pfn(vma, vmf->address, offset);
>   }
> 
>   static const struct vm_operations_struct global_mmio_vmops = {
> 

^ permalink raw reply

* Re: powerpc: use time64_t in read_persistent_clock
From: Mathieu Malaterre @ 2018-06-14 19:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Benjamin Herrenschmidt, Michael Ellerman, linuxppc-dev,
	Meelis Roos
In-Reply-To: <CAK8P3a3khJexs5FMzvn2+askoT8T1r-0eK6VAcZV3mQ9FPrc3w@mail.gmail.com>

On Thu, Jun 14, 2018 at 1:46 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Jun 13, 2018 at 10:24 PM, Mathieu Malaterre <malat@debian.org> wrote:
> > Arnd,
> >
> > In 5bfd643583b2e I can see that you changed:
> >
> > $ git show 5bfd643583b2e -- arch/powerpc/platforms/powermac/time.c
> > [...]
> >  #ifdef CONFIG_ADB_PMU
> > -static unsigned long pmu_get_time(void)
> > +static time64_t pmu_get_time(void)
> >  {
> >         struct adb_request req;
> > -       unsigned int now;
> > +       time64_t now;
> >
> >         if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0)
> >                 return 0;
> > @@ -160,10 +151,10 @@ static unsigned long pmu_get_time(void)
> >                        req.reply_len);
> >         now = (req.reply[0] << 24) + (req.reply[1] << 16)
> >                 + (req.reply[2] << 8) + req.reply[3];
> > -       return ((unsigned long)now) - RTC_OFFSET;
> > +       return now - RTC_OFFSET;
> >  }
> > [...]
> >
> > As far as I can tell the old function would never return a negative
> > value and rely on unsigned long roll over. Now I am seeing negative
> > value being returned and it seems to break later on in the rtc
> > library.
> >
> > Could you comment why you removed the cast to (unsigned long) in your commit ?
>
> I'm sorry my patch caused a regression. Even with your bug report
> it took me a while to see what exactly went wrong, and how I got
> mixed up the integer conversion rules.
>
> I mistakenly assume that fiven
>
>        long long now;
>        unsigned char reply[4];
>        now = reply[0] << 24;
>
> we would always end up with a positive number, but since reply[0] is
> promoted to a signed integer, the right-hand side of the assignment
> ends up as a negative number for the usual contents of the RTC.

Ah right. My diagnosis was invalid.

> Can you confirm that this patch addresses your problem?

Yes !

Before:
[    5.986710] rtc-generic rtc-generic: hctosys: unable to read the
hardware clock

After:
[    5.579611] rtc-generic rtc-generic: setting system clock to
2018-06-14 18:57:00 UTC (1529002620)

So for the above:

Tested-by: Mathieu Malaterre <malat@debian.org>

> diff --git a/arch/powerpc/platforms/powermac/time.c
> b/arch/powerpc/platforms/powermac/time.c
> index 7c968e46736f..a7fdcd3051f9 100644
> --- a/arch/powerpc/platforms/powermac/time.c
> +++ b/arch/powerpc/platforms/powermac/time.c
> @@ -97,7 +97,7 @@ static time64_t cuda_get_time(void)
>         if (req.reply_len != 7)
>                 printk(KERN_ERR "cuda_get_time: got %d byte reply\n",
>                        req.reply_len);
> -       now = (req.reply[3] << 24) + (req.reply[4] << 16)
> +       now = (u32)(req.reply[3] << 24) + (req.reply[4] << 16)
>                 + (req.reply[5] << 8) + req.reply[6];
>         return now - RTC_OFFSET;
>  }
> @@ -140,7 +140,7 @@ static time64_t pmu_get_time(void)
>         if (req.reply_len != 4)
>                 printk(KERN_ERR "pmu_get_time: got %d byte reply from PMU\n",
>                        req.reply_len);
> -       now = (req.reply[0] << 24) + (req.reply[1] << 16)
> +       now = (u32)(req.reply[0] << 24) + (req.reply[1] << 16)
>                 + (req.reply[2] << 8) + req.reply[3];
>         return now - RTC_OFFSET;
>  }
>
> On a related note, we do have some freedom in how we want to
> interpret values beyond year 2038/2040 when the RTC does wrap
> around.
>
> - With the original code, we converted the time into a signed
>   32-bit integer based on the 1970 Unix epoch, converting RTC
>   years 2038 through 2040 into Linux years 1902 through 1904,
>   which doesn't seem very helpful.
>
> - With my fix above, we would interpret the numbers as documented,
>   with 2040 wrapping around to 1904.
>
> - If we want, we could reinterpret the 1904-1970 range as later
>   times between 2040 and 2106, like this:
>
> --- a/arch/powerpc/platforms/powermac/time.c
> +++ b/arch/powerpc/platforms/powermac/time.c
> @@ -88,7 +88,7 @@ long __init pmac_time_init(void)
>  static time64_t cuda_get_time(void)
>  {
>         struct adb_request req;
> -       time64_t now;
> +       u32 now;
>
>         if (cuda_request(&req, NULL, 2, CUDA_PACKET, CUDA_GET_TIME) < 0)
>                 return 0;
> @@ -132,7 +132,7 @@ static int cuda_set_rtc_time(struct rtc_time *tm)
>  static time64_t pmu_get_time(void)
>  {
>         struct adb_request req;
> -       time64_t now;
> +       u32 now;
>
>         if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0)
>                 return 0;
>
> On the upside, this would possibly extend the life of some machines
> by another 66 years, on the downside it might cause issues when
> an empty RTC battery causes the initial system time to be
> above the 32-bit TIME_T_MAX (instead of going back to 1904).

I would submit the first patch and add the above as comment. I doubt
anyone would still have a running system by then.

Thanks

^ permalink raw reply

* [PATCH] mm: convert return type of handle_mm_fault() caller to vm_fault_t
From: Souptick Joarder @ 2018-06-14 19:06 UTC (permalink / raw)
  To: willy, rth, tony.luck, mattst88, vgupta, linux, catalin.marinas,
	will.deacon, rkuo, geert, monstr, jhogan, lftan, jonas, jejb,
	benh, palmer, ysato, davem, richard, gxt, tglx, hpa,
	alexander.levin, akpm
  Cc: linux-alpha, linux-kernel, linux-snps-arc, linux-arm-kernel,
	linux-hexagon, linux-ia64, linux-m68k, linux-mips, nios2-dev,
	openrisc, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, user-mode-linux-devel, user-mode-linux-user,
	linux-xtensa, iommu, linux-mm, brajeswar.linux, sabyasachi.linux

Use new return type vm_fault_t for fault handler. For
now, this is just documenting that the function returns
a VM_FAULT value rather than an errno. Once all instances
are converted, vm_fault_t will become a distinct type.

Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t")

In this patch all the caller of handle_mm_fault()
are changed to return vm_fault_t type.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
---
 arch/alpha/mm/fault.c         |  3 ++-
 arch/arc/mm/fault.c           |  4 +++-
 arch/arm/mm/fault.c           |  7 ++++---
 arch/arm64/mm/fault.c         |  6 +++---
 arch/hexagon/mm/vm_fault.c    |  2 +-
 arch/ia64/mm/fault.c          |  2 +-
 arch/m68k/mm/fault.c          |  4 ++--
 arch/microblaze/mm/fault.c    |  2 +-
 arch/mips/mm/fault.c          |  2 +-
 arch/nds32/mm/fault.c         |  2 +-
 arch/nios2/mm/fault.c         |  2 +-
 arch/openrisc/mm/fault.c      |  2 +-
 arch/parisc/mm/fault.c        |  2 +-
 arch/powerpc/mm/copro_fault.c |  2 +-
 arch/powerpc/mm/fault.c       |  7 ++++---
 arch/riscv/mm/fault.c         |  3 ++-
 arch/s390/mm/fault.c          | 13 ++++++++-----
 arch/sh/mm/fault.c            |  4 ++--
 arch/sparc/mm/fault_32.c      |  3 ++-
 arch/sparc/mm/fault_64.c      |  3 ++-
 arch/um/kernel/trap.c         |  2 +-
 arch/unicore32/mm/fault.c     |  9 +++++----
 arch/x86/mm/fault.c           |  5 +++--
 arch/xtensa/mm/fault.c        |  2 +-
 drivers/iommu/amd_iommu_v2.c  |  2 +-
 drivers/iommu/intel-svm.c     |  4 +++-
 mm/hmm.c                      |  8 ++++----
 mm/ksm.c                      |  2 +-
 28 files changed, 62 insertions(+), 47 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index cd3c572..2a979ee 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -87,7 +87,8 @@
 	struct vm_area_struct * vma;
 	struct mm_struct *mm = current->mm;
 	const struct exception_table_entry *fixup;
-	int fault, si_code = SEGV_MAPERR;
+	int si_code = SEGV_MAPERR;
+	vm_fault_t fault;
 	siginfo_t info;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index a0b7bd6..3a18d33 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -15,6 +15,7 @@
 #include <linux/uaccess.h>
 #include <linux/kdebug.h>
 #include <linux/perf_event.h>
+#include <linux/mm_types.h>
 #include <asm/pgalloc.h>
 #include <asm/mmu.h>
 
@@ -66,7 +67,8 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 	struct task_struct *tsk = current;
 	struct mm_struct *mm = tsk->mm;
 	siginfo_t info;
-	int fault, ret;
+	int ret;
+	vm_fault_t fault;
 	int write = regs->ecr_cause & ECR_C_PROTV_STORE;  /* ST/EX */
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index b75eada..758abcb 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -219,12 +219,12 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
 	return vma->vm_flags & mask ? false : true;
 }
 
-static int __kprobes
+static vm_fault_t __kprobes
 __do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
 		unsigned int flags, struct task_struct *tsk)
 {
 	struct vm_area_struct *vma;
-	int fault;
+	vm_fault_t fault;
 
 	vma = find_vma(mm, addr);
 	fault = VM_FAULT_BADMAP;
@@ -259,7 +259,8 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
 {
 	struct task_struct *tsk;
 	struct mm_struct *mm;
-	int fault, sig, code;
+	int sig, code;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	if (notify_page_fault(regs, fsr))
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 2af3dd8..8da263b 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -371,12 +371,12 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re
 #define VM_FAULT_BADMAP		0x010000
 #define VM_FAULT_BADACCESS	0x020000
 
-static int __do_page_fault(struct mm_struct *mm, unsigned long addr,
+static vm_fault_t __do_page_fault(struct mm_struct *mm, unsigned long addr,
 			   unsigned int mm_flags, unsigned long vm_flags,
 			   struct task_struct *tsk)
 {
 	struct vm_area_struct *vma;
-	int fault;
+	vm_fault_t fault;
 
 	vma = find_vma(mm, addr);
 	fault = VM_FAULT_BADMAP;
@@ -419,7 +419,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	struct siginfo si;
-	int fault, major = 0;
+	vm_fault_t fault, major = 0;
 	unsigned long vm_flags = VM_READ | VM_WRITE;
 	unsigned int mm_flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index 3eec33c..5d1de6c 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -52,7 +52,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
 	struct mm_struct *mm = current->mm;
 	siginfo_t info;
 	int si_code = SEGV_MAPERR;
-	int fault;
+	vm_fault_t fault;
 	const struct exception_table_entry *fixup;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index dfdc152..e085d89 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -87,7 +87,7 @@ static inline int notify_page_fault(struct pt_regs *regs, int trap)
 	struct mm_struct *mm = current->mm;
 	struct siginfo si;
 	unsigned long mask;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	mask = ((((isr >> IA64_ISR_X_BIT) & 1UL) << VM_EXEC_BIT)
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index 03253c4..1fc7ac0 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -73,7 +73,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct * vma;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	pr_debug("do page fault:\nregs->sr=%#x, regs->pc=%#lx, address=%#lx, %ld, %p\n",
@@ -139,7 +139,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 	 */
 
 	fault = handle_mm_fault(vma, address, flags);
-	pr_debug("handle_mm_fault returns %d\n", fault);
+	pr_debug("handle_mm_fault returns %x\n", fault);
 
 	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
 		return 0;
diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index f91b30f..92a8682 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -91,7 +91,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
 	siginfo_t info;
 	int code = SEGV_MAPERR;
 	int is_write = error_code & ESR_S;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	regs->ear = address;
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index 4f8f5bf..0bc5030 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -43,7 +43,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
 	struct mm_struct *mm = tsk->mm;
 	const int field = sizeof(unsigned long) * 2;
 	siginfo_t info;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 10);
diff --git a/arch/nds32/mm/fault.c b/arch/nds32/mm/fault.c
index 3a246fb..96796d3 100644
--- a/arch/nds32/mm/fault.c
+++ b/arch/nds32/mm/fault.c
@@ -73,7 +73,7 @@ void do_page_fault(unsigned long entry, unsigned long addr,
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
 	siginfo_t info;
-	int fault;
+	vm_fault_t fault;
 	unsigned int mask = VM_READ | VM_WRITE | VM_EXEC;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index b804dd0..24fd84c 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -47,7 +47,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
 	struct task_struct *tsk = current;
 	struct mm_struct *mm = tsk->mm;
 	int code = SEGV_MAPERR;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	cause >>= 2;
diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index d0021df..21e8f16 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -53,7 +53,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
 	siginfo_t info;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	tsk = current;
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index e247edb..ff9e634 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -262,7 +262,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	unsigned long acc_type;
-	int fault = 0;
+	vm_fault_t fault = 0;
 	unsigned int flags;
 
 	if (faulthandler_disabled())
diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index 7d0945b..c8da352 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -34,7 +34,7 @@
  * to handle fortunately.
  */
 int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
-		unsigned long dsisr, unsigned *flt)
+		unsigned long dsisr, vm_fault_t *flt)
 {
 	struct vm_area_struct *vma;
 	unsigned long is_write;
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index c01d627..17cce1b 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -159,7 +159,7 @@ static noinline int bad_access(struct pt_regs *regs, unsigned long address)
 }
 
 static int do_sigbus(struct pt_regs *regs, unsigned long address,
-		     unsigned int fault)
+		     vm_fault_t fault)
 {
 	siginfo_t info;
 	unsigned int lsb = 0;
@@ -189,7 +189,8 @@ static int do_sigbus(struct pt_regs *regs, unsigned long address,
 	return 0;
 }
 
-static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
+static int mm_fault_error(struct pt_regs *regs, unsigned long addr,
+				vm_fault_t fault)
 {
 	/*
 	 * Kernel page fault interrupted by SIGKILL. We have no reason to
@@ -402,7 +403,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
  	int is_exec = TRAP(regs) == 0x400;
 	int is_user = user_mode(regs);
 	int is_write = page_fault_is_write(error_code);
-	int fault, major = 0;
+	vm_fault_t fault, major = 0;
 	bool store_update_sp = false;
 
 	if (notify_page_fault(regs))
diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 148c98c..88401d5 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -41,7 +41,8 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 	struct mm_struct *mm;
 	unsigned long addr, cause;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
-	int fault, code = SEGV_MAPERR;
+	int code = SEGV_MAPERR;
+	vm_fault_t fault;
 
 	cause = regs->scause;
 	addr = regs->sbadaddr;
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 93faeca..8ea0855 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -350,7 +350,8 @@ static noinline int signal_return(struct pt_regs *regs)
 	return -EACCES;
 }
 
-static noinline void do_fault_error(struct pt_regs *regs, int access, int fault)
+static noinline void do_fault_error(struct pt_regs *regs, int access,
+					vm_fault_t fault)
 {
 	int si_code;
 
@@ -410,7 +411,7 @@ static noinline void do_fault_error(struct pt_regs *regs, int access, int fault)
  *   11       Page translation     ->  Not present       (nullification)
  *   3b       Region third trans.  ->  Not present       (nullification)
  */
-static inline int do_exception(struct pt_regs *regs, int access)
+static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 {
 	struct gmap *gmap;
 	struct task_struct *tsk;
@@ -420,7 +421,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
 	unsigned long trans_exc_code;
 	unsigned long address;
 	unsigned int flags;
-	int fault;
+	vm_fault_t fault;
 
 	tsk = current;
 	/*
@@ -571,7 +572,8 @@ static inline int do_exception(struct pt_regs *regs, int access)
 void do_protection_exception(struct pt_regs *regs)
 {
 	unsigned long trans_exc_code;
-	int access, fault;
+	int access;
+	vm_fault_t fault;
 
 	trans_exc_code = regs->int_parm_long;
 	/*
@@ -606,7 +608,8 @@ void do_protection_exception(struct pt_regs *regs)
 
 void do_dat_exception(struct pt_regs *regs)
 {
-	int access, fault;
+	int access;
+	vm_fault_t fault;
 
 	access = VM_READ | VM_EXEC | VM_WRITE;
 	fault = do_exception(regs, access);
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index 6fd1bf7..474bf14 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -320,7 +320,7 @@ static noinline int vmalloc_fault(unsigned long address)
 
 static noinline int
 mm_fault_error(struct pt_regs *regs, unsigned long error_code,
-	       unsigned long address, unsigned int fault)
+	       unsigned long address, vm_fault_t fault)
 {
 	/*
 	 * Pagefault was interrupted by SIGKILL. We have no reason to
@@ -403,7 +403,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	struct vm_area_struct * vma;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	tsk = current;
diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index a8103a8..1a44a4e 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -174,7 +174,8 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
 	unsigned int fixup;
 	unsigned long g2;
 	int from_user = !(regs->psr & PSR_PS);
-	int fault, code;
+	int code;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	if (text_fault)
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 41363f4..2078bfe 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -284,7 +284,8 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned int insn = 0;
-	int si_code, fault_code, fault;
+	int si_code, fault_code;
+	vm_fault_t fault;
 	unsigned long address, mm_rss;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index b2b02df..0afcd09 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -72,7 +72,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
 	}
 
 	do {
-		int fault;
+		vm_fault_t fault;
 
 		fault = handle_mm_fault(vma, address, flags);
 
diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index bbefcc4..2982140 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -167,11 +167,11 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
 	return vma->vm_flags & mask ? false : true;
 }
 
-static int __do_pf(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
-		unsigned int flags, struct task_struct *tsk)
+static vm_fault_t __do_pf(struct mm_struct *mm, unsigned long addr,
+		unsigned int fsr, unsigned int flags, struct task_struct *tsk)
 {
 	struct vm_area_struct *vma;
-	int fault;
+	vm_fault_t fault;
 
 	vma = find_vma(mm, addr);
 	fault = VM_FAULT_BADMAP;
@@ -208,7 +208,8 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 {
 	struct task_struct *tsk;
 	struct mm_struct *mm;
-	int fault, sig, code;
+	int sig, code;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	tsk = current;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 73bd8c9..5171d60 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -16,6 +16,7 @@
 #include <linux/prefetch.h>		/* prefetchw			*/
 #include <linux/context_tracking.h>	/* exception_enter(), ...	*/
 #include <linux/uaccess.h>		/* faulthandler_disabled()	*/
+#include <linux/mm_types.h>
 
 #include <asm/cpufeature.h>		/* boot_cpu_has, ...		*/
 #include <asm/traps.h>			/* dotraplinkage, ...		*/
@@ -1004,7 +1005,7 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
 
 static noinline void
 mm_fault_error(struct pt_regs *regs, unsigned long error_code,
-	       unsigned long address, u32 *pkey, unsigned int fault)
+	       unsigned long address, u32 *pkey, vm_fault_t fault)
 {
 	if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) {
 		no_context(regs, error_code, address, 0, 0);
@@ -1218,7 +1219,7 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
 	struct mm_struct *mm;
-	int fault, major = 0;
+	vm_fault_t fault, major = 0;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 	u32 pkey;
 
diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index 8b9b6f4..203fade 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -42,7 +42,7 @@ void do_page_fault(struct pt_regs *regs)
 	siginfo_t info;
 
 	int is_write, is_exec;
-	int fault;
+	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	info.si_code = SEGV_MAPERR;
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 1d0b53a0..58da65d 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -508,7 +508,7 @@ static void do_fault(struct work_struct *work)
 {
 	struct fault *fault = container_of(work, struct fault, work);
 	struct vm_area_struct *vma;
-	int ret = VM_FAULT_ERROR;
+	vm_fault_t ret = VM_FAULT_ERROR;
 	unsigned int flags = 0;
 	struct mm_struct *mm;
 	u64 address;
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index e8cd984..75189c0 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -24,6 +24,7 @@
 #include <linux/pci-ats.h>
 #include <linux/dmar.h>
 #include <linux/interrupt.h>
+#include <linux/mm_types.h>
 #include <asm/page.h>
 
 #define PASID_ENTRY_P		BIT_ULL(0)
@@ -594,7 +595,8 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 		struct vm_area_struct *vma;
 		struct page_req_dsc *req;
 		struct qi_desc resp;
-		int ret, result;
+		int result;
+		vm_fault_t ret;
 		u64 address;
 
 		handled = 1;
diff --git a/mm/hmm.c b/mm/hmm.c
index 486dc39..d7919e5 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -308,14 +308,14 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
 	struct vm_area_struct *vma = walk->vma;
-	int r;
+	vm_fault_t ret;
 
 	flags |= hmm_vma_walk->block ? 0 : FAULT_FLAG_ALLOW_RETRY;
 	flags |= write_fault ? FAULT_FLAG_WRITE : 0;
-	r = handle_mm_fault(vma, addr, flags);
-	if (r & VM_FAULT_RETRY)
+	ret = handle_mm_fault(vma, addr, flags);
+	if (ret & VM_FAULT_RETRY)
 		return -EBUSY;
-	if (r & VM_FAULT_ERROR) {
+	if (ret & VM_FAULT_ERROR) {
 		*pfn = range->values[HMM_PFN_ERROR];
 		return -EFAULT;
 	}
diff --git a/mm/ksm.c b/mm/ksm.c
index e3cbf9a..cb4e6ed 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -451,7 +451,7 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
 static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 {
 	struct page *page;
-	int ret = 0;
+	vm_fault_t ret = 0;
 
 	do {
 		cond_resched();
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH v13 00/24] selftests, powerpc, x86 : Memory Protection Keys
From: Florian Weimer @ 2018-06-14 20:19 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, dave.hansen,
	mhocko, bauerman, msuchanek, aneesh.kumar
In-Reply-To: <1528937115-10132-1-git-send-email-linuxram@us.ibm.com>

On 06/14/2018 02:44 AM, Ram Pai wrote:
> Test
> ----
> Verified for correctness on powerpc. Need help verifying on x86.
> Compiles on x86.

It breaks make in tools/testing/selftests/x86:

make: *** No rule to make target `protection_keys.c', needed by 
`/home/linux/tools/testing/selftests/x86/protection_keys_64'.  Stop.

The generic implementation no longer builds 32-bit binaries.  Is this 
the intent?

It's possible to build 32-bit binaries with “make CC='gcc -m32'”, so 
perhaps this is good enough?

But with that, I get a warning:

protection_keys.c: In function ‘dump_mem’:
protection_keys.c:172:3: warning: format ‘%lx’ expects argument of type 
‘long unsigned int’, but argument 4 has type ‘uint64_t’ [-Wformat=]
    dprintf1("dump[%03d][@%p]: %016lx\n", i, ptr, *ptr);
    ^

I suppose you could use %016llx and add a cast to unsigned long long to 
fix this.

Anyway, both the 32-bit and 64-bit tests fail here:

assert() at protection_keys.c::943 test_nr: 12 iteration: 1
running abort_hooks()...

I've yet checked what causes this.  It's with the kernel headers from 
4.17, but with other userspace headers based on glibc 2.17.  I hope to 
look into this some more before the weekend, but I eventually have to 
return the test machine to the pool.

Thanks,
Florian

^ permalink raw reply

* Re: [PATCH] mm: convert return type of handle_mm_fault() caller to vm_fault_t
From: kbuild test robot @ 2018-06-14 21:31 UTC (permalink / raw)
  To: Souptick Joarder
  Cc: kbuild-all, willy, rth, tony.luck, mattst88, vgupta, linux,
	catalin.marinas, will.deacon, rkuo, geert, monstr, jhogan, lftan,
	jonas, jejb, benh, palmer, ysato, davem, richard, gxt, tglx, hpa,
	alexander.levin, akpm, linux-alpha, linux-kernel, linux-snps-arc,
	linux-arm-kernel, linux-hexagon, linux-ia64, linux-m68k,
	linux-mips, nios2-dev, openrisc, linux-parisc, linuxppc-dev,
	linux-riscv, linux-s390, linux-sh, sparclinux,
	user-mode-linux-devel, user-mode-linux-user, linux-xtensa, iommu,
	linux-mm, brajeswar.linux, sabyasachi.linux
In-Reply-To: <20180614190629.GA18576@jordon-HP-15-Notebook-PC>

[-- Attachment #1: Type: text/plain, Size: 10985 bytes --]

Hi Souptick,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v4.17]
[cannot apply to linus/master powerpc/next sparc-next/master next-20180614]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Souptick-Joarder/mm-convert-return-type-of-handle_mm_fault-caller-to-vm_fault_t/20180615-030636
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

>> arch/powerpc/mm/copro_fault.c:36:5: error: conflicting types for 'copro_handle_mm_fault'
    int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
        ^~~~~~~~~~~~~~~~~~~~~
   In file included from arch/powerpc/mm/copro_fault.c:27:0:
   arch/powerpc/include/asm/copro.h:18:5: note: previous declaration of 'copro_handle_mm_fault' was here
    int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
        ^~~~~~~~~~~~~~~~~~~~~
   In file included from include/linux/linkage.h:7:0,
                    from include/linux/kernel.h:7,
                    from include/asm-generic/bug.h:18,
                    from arch/powerpc/include/asm/bug.h:128,
                    from include/linux/bug.h:5,
                    from arch/powerpc/include/asm/mmu.h:126,
                    from arch/powerpc/include/asm/lppaca.h:36,
                    from arch/powerpc/include/asm/paca.h:21,
                    from arch/powerpc/include/asm/current.h:16,
                    from include/linux/sched.h:12,
                    from arch/powerpc/mm/copro_fault.c:23:
   arch/powerpc/mm/copro_fault.c:101:19: error: conflicting types for 'copro_handle_mm_fault'
    EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
                      ^
   include/linux/export.h:65:21: note: in definition of macro '___EXPORT_SYMBOL'
     extern typeof(sym) sym;      \
                        ^~~
   arch/powerpc/mm/copro_fault.c:101:1: note: in expansion of macro 'EXPORT_SYMBOL_GPL'
    EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
    ^~~~~~~~~~~~~~~~~
   In file included from arch/powerpc/mm/copro_fault.c:27:0:
   arch/powerpc/include/asm/copro.h:18:5: note: previous declaration of 'copro_handle_mm_fault' was here
    int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
        ^~~~~~~~~~~~~~~~~~~~~

vim +/copro_handle_mm_fault +36 arch/powerpc/mm/copro_fault.c

7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   30  
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   31  /*
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   32   * This ought to be kept in sync with the powerpc specific do_page_fault
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   33   * function. Currently, there are a few corner cases that we haven't had
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   34   * to handle fortunately.
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   35   */
e83d01697 arch/powerpc/mm/copro_fault.c           Ian Munsie         2014-10-08  @36  int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
7c71e54a4 arch/powerpc/mm/copro_fault.c           Souptick Joarder   2018-06-15   37  		unsigned long dsisr, vm_fault_t *flt)
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   38  {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   39  	struct vm_area_struct *vma;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   40  	unsigned long is_write;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   41  	int ret;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   42  
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   43  	if (mm == NULL)
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   44  		return -EFAULT;
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   45  
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   46  	if (mm->pgd == NULL)
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   47  		return -EFAULT;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   48  
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   49  	down_read(&mm->mmap_sem);
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   50  	ret = -EFAULT;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   51  	vma = find_vma(mm, ea);
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   52  	if (!vma)
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   53  		goto out_unlock;
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   54  
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   55  	if (ea < vma->vm_start) {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   56  		if (!(vma->vm_flags & VM_GROWSDOWN))
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   57  			goto out_unlock;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   58  		if (expand_stack(vma, ea))
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   59  			goto out_unlock;
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   60  	}
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   61  
e83d01697 arch/powerpc/mm/copro_fault.c           Ian Munsie         2014-10-08   62  	is_write = dsisr & DSISR_ISSTORE;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   63  	if (is_write) {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   64  		if (!(vma->vm_flags & VM_WRITE))
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   65  			goto out_unlock;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   66  	} else {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   67  		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   68  			goto out_unlock;
842915f56 arch/powerpc/mm/copro_fault.c           Mel Gorman         2015-02-12   69  		/*
18061c17c arch/powerpc/mm/copro_fault.c           Aneesh Kumar K.V   2017-01-30   70  		 * PROT_NONE is covered by the VMA check above.
18061c17c arch/powerpc/mm/copro_fault.c           Aneesh Kumar K.V   2017-01-30   71  		 * and hash should get a NOHPTE fault instead of
18061c17c arch/powerpc/mm/copro_fault.c           Aneesh Kumar K.V   2017-01-30   72  		 * a PROTFAULT in case fixup is needed for things
18061c17c arch/powerpc/mm/copro_fault.c           Aneesh Kumar K.V   2017-01-30   73  		 * like autonuma.
842915f56 arch/powerpc/mm/copro_fault.c           Mel Gorman         2015-02-12   74  		 */
18061c17c arch/powerpc/mm/copro_fault.c           Aneesh Kumar K.V   2017-01-30   75  		if (!radix_enabled())
842915f56 arch/powerpc/mm/copro_fault.c           Mel Gorman         2015-02-12   76  			WARN_ON_ONCE(dsisr & DSISR_PROTFAULT);
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   77  	}
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   78  
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   79  	ret = 0;
dcddffd41 arch/powerpc/mm/copro_fault.c           Kirill A. Shutemov 2016-07-26   80  	*flt = handle_mm_fault(vma, ea, is_write ? FAULT_FLAG_WRITE : 0);
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   81  	if (unlikely(*flt & VM_FAULT_ERROR)) {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   82  		if (*flt & VM_FAULT_OOM) {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   83  			ret = -ENOMEM;
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   84  			goto out_unlock;
33692f275 arch/powerpc/mm/copro_fault.c           Linus Torvalds     2015-01-29   85  		} else if (*flt & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) {
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   86  			ret = -EFAULT;
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   87  			goto out_unlock;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   88  		}
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   89  		BUG();
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   90  	}
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   91  
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   92  	if (*flt & VM_FAULT_MAJOR)
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   93  		current->maj_flt++;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   94  	else
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   95  		current->min_flt++;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   96  
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   97  out_unlock:
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20   98  	up_read(&mm->mmap_sem);
60ee03194 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2009-02-17   99  	return ret;
7cd58e438 arch/powerpc/platforms/cell/spu_fault.c Jeremy Kerr        2007-12-20  100  }
e83d01697 arch/powerpc/mm/copro_fault.c           Ian Munsie         2014-10-08  101  EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
73d16a6e0 arch/powerpc/mm/copro_fault.c           Ian Munsie         2014-10-08  102  

:::::: The code at line 36 was first introduced by commit
:::::: e83d01697583d8610d1d62279758c2a881e3396f powerpc/cell: Move spu_handle_mm_fault() out of cell platform

:::::: TO: Ian Munsie <imunsie@au1.ibm.com>
:::::: CC: Michael Ellerman <mpe@ellerman.id.au>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23364 bytes --]

^ permalink raw reply

* Re: [PATCH v6 3/4] powerpc/lib: implement strlen() in assembly
From: Segher Boessenkool @ 2018-06-14 21:51 UTC (permalink / raw)
  To: Christophe LEROY
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	wei.guo.simon, linux-kernel, linuxppc-dev
In-Reply-To: <baaf4fa9-ff0b-0bcb-8662-dafb794caab4@c-s.fr>

On Tue, Jun 12, 2018 at 07:01:59PM +0200, Christophe LEROY wrote:
> 
> 
> Le 12/06/2018 à 16:53, Segher Boessenkool a écrit :
> >On Tue, Jun 12, 2018 at 09:14:53AM +0000, Christophe Leroy wrote:
> >>---
> >>Not tested on PPC64.
> >
> >It won't be acceptable until that happens.  It also is likely quite bad
> >performance on all 64-bit CPUs from the last fifteen years or so.  Or you
> >did nothing to prove otherwise, at least.
> 
> Will it be as bad as the generic implementation which does it byte per 
> byte ?

Probably not.  But how is it for short inputs, etc.?

The main point is that it needs actual testing _for correctness_.

Btw, GCC 7 and later can expand many memcmp as builtins on PowerPC (just
like memset and memcpy etc.), creating better code, without function call.


Segher

^ permalink raw reply

* Re: [PATCH v13 00/24] selftests, powerpc, x86 : Memory Protection Keys
From: Ram Pai @ 2018-06-15  0:58 UTC (permalink / raw)
  To: Florian Weimer
  Cc: shuahkh, linux-kselftest, mpe, linuxppc-dev, linux-mm, x86,
	linux-arch, mingo, dave.hansen, mhocko, bauerman, msuchanek,
	aneesh.kumar
In-Reply-To: <c5c119b0-f5ca-4ddc-43c0-a6b597173973@redhat.com>

On Thu, Jun 14, 2018 at 10:19:11PM +0200, Florian Weimer wrote:
> On 06/14/2018 02:44 AM, Ram Pai wrote:
> >Test
> >----
> >Verified for correctness on powerpc. Need help verifying on x86.
> >Compiles on x86.
> 
> It breaks make in tools/testing/selftests/x86:
> 
> make: *** No rule to make target `protection_keys.c', needed by
> `/home/linux/tools/testing/selftests/x86/protection_keys_64'.  Stop.

Ah.. it has to be taken out from the Makefile of
/home/linux/tools/testing/selftests/x86/

The sources have been moved to /home/linux/tools/testing/selftests/mm/

> 
> The generic implementation no longer builds 32-bit binaries.  Is
> this the intent?

No. But building it 32-bit after moving it to a the new directory 
needs some special code in the Makefile. 

> 
> It's possible to build 32-bit binaries with “make CC='gcc -m32'”, so
> perhaps this is good enough?

Dave Hansen did mention it, but he did not complain too much. So I kept
quite.

> 
> But with that, I get a warning:
> 
> protection_keys.c: In function ‘dump_mem’:
> protection_keys.c:172:3: warning: format ‘%lx’ expects argument of
> type ‘long unsigned int’, but argument 4 has type ‘uint64_t’
> [-Wformat=]
>    dprintf1("dump[%03d][@%p]: %016lx\n", i, ptr, *ptr);
>    ^
> 
> I suppose you could use %016llx and add a cast to unsigned long long
> to fix this.

yes.

> 
> Anyway, both the 32-bit and 64-bit tests fail here:
> 
> assert() at protection_keys.c::943 test_nr: 12 iteration: 1
> running abort_hooks()...
> 
> I've yet checked what causes this.  It's with the kernel headers
> from 4.17, but with other userspace headers based on glibc 2.17.  I
> hope to look into this some more before the weekend, but I
> eventually have to return the test machine to the pool.

I wish I could get a x86 machine which could do memory keys. Had a AWS
instance, but struggled to boot my kernel. Can't get to the console...
gave up.  If someone can give me a ready-made machine with support for
memkeys, I can quickly fix all the outstanding x86 issues.  But if
someone can just fix it for me, ....  ;)

RP

^ permalink raw reply

* [PATCH v3] powerpc/64s/radix: Fix MADV_[FREE|DONTNEED] TLB flush miss problem with THP
From: Nicholas Piggin @ 2018-06-15  1:38 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

The patch 99baac21e4 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss
problem") added a force flush mode to the mmu_gather flush, which
unconditionally flushes the entire address range being invalidated
(even if actual ptes only covered a smaller range), to solve a problem
with concurrent threads invalidating the same PTEs causing them to
miss TLBs that need flushing.

This does not work with powerpc that invalidates mmu_gather batches
according to page size. Have powerpc flush all possible page sizes in
the range if it encounters this concurrency condition.

Patch 4647706ebe ("mm: always flush VMA ranges affected by
zap_page_range") does add a TLB flush for all page sizes on powerpc for
the zap_page_range case, but that is to be removed and replaced with
the mmu_gather flush to avoid redundant flushing. It is also thought to
not cover other obscure race conditions:

https://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com

Hash does not have a problem because it invalidates TLBs inside the
page table locks.

Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
Since v2:
- Fix compile breakage.

 arch/powerpc/mm/tlb-radix.c | 96 +++++++++++++++++++++++++++++--------
 1 file changed, 75 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 67a6e86d3e7e..a734e486664d 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -689,22 +689,17 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
 static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
 static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
 
-void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
-		     unsigned long end)
+static inline void __radix__flush_tlb_range(struct mm_struct *mm,
+					unsigned long start, unsigned long end,
+					bool flush_all_sizes)
 
 {
-	struct mm_struct *mm = vma->vm_mm;
 	unsigned long pid;
 	unsigned int page_shift = mmu_psize_defs[mmu_virtual_psize].shift;
 	unsigned long page_size = 1UL << page_shift;
 	unsigned long nr_pages = (end - start) >> page_shift;
 	bool local, full;
 
-#ifdef CONFIG_HUGETLB_PAGE
-	if (is_vm_hugetlb_page(vma))
-		return radix__flush_hugetlb_tlb_range(vma, start, end);
-#endif
-
 	pid = mm->context.id;
 	if (unlikely(pid == MMU_NO_CONTEXT))
 		return;
@@ -738,37 +733,64 @@ void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 				_tlbie_pid(pid, RIC_FLUSH_TLB);
 		}
 	} else {
-		bool hflush = false;
+		bool hflush = flush_all_sizes;
+		bool gflush = flush_all_sizes;
 		unsigned long hstart, hend;
+		unsigned long gstart, gend;
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-		hstart = (start + HPAGE_PMD_SIZE - 1) >> HPAGE_PMD_SHIFT;
-		hend = end >> HPAGE_PMD_SHIFT;
-		if (hstart < hend) {
-			hstart <<= HPAGE_PMD_SHIFT;
-			hend <<= HPAGE_PMD_SHIFT;
+		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
 			hflush = true;
+
+		if (hflush) {
+			hstart = (start + PMD_SIZE - 1) & PMD_MASK;
+			hend = end & PMD_MASK;
+			if (hstart == hend)
+				hflush = false;
+		}
+
+		if (gflush) {
+			gstart = (start + PUD_SIZE - 1) & PUD_MASK;
+			gend = end & PUD_MASK;
+			if (gstart == gend)
+				gflush = false;
 		}
-#endif
 
 		asm volatile("ptesync": : :"memory");
 		if (local) {
 			__tlbiel_va_range(start, end, pid, page_size, mmu_virtual_psize);
 			if (hflush)
 				__tlbiel_va_range(hstart, hend, pid,
-						HPAGE_PMD_SIZE, MMU_PAGE_2M);
+						PMD_SIZE, MMU_PAGE_2M);
+			if (gflush)
+				__tlbiel_va_range(gstart, gend, pid,
+						PUD_SIZE, MMU_PAGE_1G);
 			asm volatile("ptesync": : :"memory");
 		} else {
 			__tlbie_va_range(start, end, pid, page_size, mmu_virtual_psize);
 			if (hflush)
 				__tlbie_va_range(hstart, hend, pid,
-						HPAGE_PMD_SIZE, MMU_PAGE_2M);
+						PMD_SIZE, MMU_PAGE_2M);
+			if (gflush)
+				__tlbie_va_range(gstart, gend, pid,
+						PUD_SIZE, MMU_PAGE_1G);
 			fixup_tlbie();
 			asm volatile("eieio; tlbsync; ptesync": : :"memory");
 		}
 	}
 	preempt_enable();
 }
+
+void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
+		     unsigned long end)
+
+{
+#ifdef CONFIG_HUGETLB_PAGE
+	if (is_vm_hugetlb_page(vma))
+		return radix__flush_hugetlb_tlb_range(vma, start, end);
+#endif
+
+	__radix__flush_tlb_range(vma->vm_mm, start, end, false);
+}
 EXPORT_SYMBOL(radix__flush_tlb_range);
 
 static int radix_get_mmu_psize(int page_size)
@@ -837,6 +859,8 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 	int psize = 0;
 	struct mm_struct *mm = tlb->mm;
 	int page_size = tlb->page_size;
+	unsigned long start = tlb->start;
+	unsigned long end = tlb->end;
 
 	/*
 	 * if page size is not something we understand, do a full mm flush
@@ -847,15 +871,45 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 	 */
 	if (tlb->fullmm) {
 		__flush_all_mm(mm, true);
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE)
+	} else if (mm_tlb_flush_nested(mm)) {
+		/*
+		 * If there is a concurrent invalidation that is clearing ptes,
+		 * then it's possible this invalidation will miss one of those
+		 * cleared ptes and miss flushing the TLB. If this invalidate
+		 * returns before the other one flushes TLBs, that can result
+		 * in it returning while there are still valid TLBs inside the
+		 * range to be invalidated.
+		 *
+		 * See mm/memory.c:tlb_finish_mmu() for more details.
+		 *
+		 * The solution to this is ensure the entire range is always
+		 * flushed here. The problem for powerpc is that the flushes
+		 * are page size specific, so this "forced flush" would not
+		 * do the right thing if there are a mix of page sizes in
+		 * the range to be invalidated. So use __flush_tlb_range
+		 * which invalidates all possible page sizes in the range.
+		 *
+		 * PWC flush probably is not be required because the core code
+		 * shouldn't free page tables in this path, but accounting
+		 * for the possibility makes us a bit more robust.
+		 *
+		 * need_flush_all is an uncommon case because page table
+		 * teardown should be done with exclusive locks held (but
+		 * after locks are dropped another invalidate could come
+		 * in), it could be optimized further if necessary.
+		 */
+		if (!tlb->need_flush_all)
+			__radix__flush_tlb_range(mm, start, end, true);
+		else
+			radix__flush_all_mm(mm);
+#endif
 	} else if ( (psize = radix_get_mmu_psize(page_size)) == -1) {
 		if (!tlb->need_flush_all)
 			radix__flush_tlb_mm(mm);
 		else
 			radix__flush_all_mm(mm);
 	} else {
-		unsigned long start = tlb->start;
-		unsigned long end = tlb->end;
-
 		if (!tlb->need_flush_all)
 			radix__flush_tlb_range_psize(mm, start, end, psize);
 		else
-- 
2.17.0

^ permalink raw reply related

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
From: Ricardo Neri @ 2018-06-15  2:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu
In-Reply-To: <alpine.DEB.2.21.1806131128570.2280@nanos.tec.linutronix.de>

On Wed, Jun 13, 2018 at 11:40:00AM +0200, Thomas Gleixner wrote:
> On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > @@ -183,6 +184,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
> >  	if (!(hdata->flags & HPET_DEV_PERI_CAP))
> >  		kick_timer(hdata);
> >  
> > +	pr_err("This interrupt should not have happened. Ensure delivery mode is NMI.\n");
> 
> Eeew.

If you don't mind me asking. What is the problem with this error message?
> 
> >  /**
> > + * hardlockup_detector_nmi_handler() - NMI Interrupt handler
> > + * @val:	Attribute associated with the NMI. Not used.
> > + * @regs:	Register values as seen when the NMI was asserted
> > + *
> > + * When an NMI is issued, look for hardlockups. If the timer is not periodic,
> > + * kick it. The interrupt is always handled when if delivered via the
> > + * Front-Side Bus.
> > + *
> > + * Returns:
> > + *
> > + * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
> > + * otherwise.
> > + */
> > +static int hardlockup_detector_nmi_handler(unsigned int val,
> > +					   struct pt_regs *regs)
> > +{
> > +	struct hpet_hld_data *hdata = hld_data;
> > +	unsigned int use_fsb;
> > +
> > +	/*
> > +	 * If FSB delivery mode is used, the timer interrupt is programmed as
> > +	 * edge-triggered and there is no need to check the ISR register.
> > +	 */
> > +	use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
> > +
> > +	if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
> > +		return NMI_DONE;
> 
> So for 'use_fsb == True' every single NMI will fall through into the
> watchdog code below.
> 
> > +	inspect_for_hardlockups(regs);
> > +
> > +	if (!(hdata->flags & HPET_DEV_PERI_CAP))
> > +		kick_timer(hdata);
> 
> And in case that the HPET does not support periodic mode this reprogramms
> the timer on every NMI which means that while perf is running the watchdog
> will never ever detect anything.

Yes. I see that this is wrong. With MSI interrupts, as far as I can
see, there is not a way to make sure that the HPET timer caused the NMI
perhaps the only option is to use an IO APIC interrupt and read the
interrupt status register.

> 
> Aside of that, reading TWO HPET registers for every NMI is insane. HPET
> access is horribly slow, so any high frequency perf monitoring will take a
> massive performance hit.

If an IO APIC interrupt is used, only HPET register (the status register)
would need to be read for every NMI. Would that be more acceptable? Otherwise,
there is no way to determine if the HPET cause the NMI.

Alternatively, there could be a counter that skips reading the HPET status
register (and the detection of hardlockups) for every X NMIs. This would
reduce the overall frequency of HPET register reads.

Is that more acceptable?

Thanks and BR,
Ricardo

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox