LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH V2] KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest
From: Aneesh Kumar K.V @ 2014-05-06 18:01 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm, kvm-ppc, Aneesh Kumar K.V

On recent IBM Power CPUs, while the hashed page table is looked up using
the page size from the segmentation hardware (i.e. the SLB), it is
possible to have the HPT entry indicate a larger page size.  Thus for
example it is possible to put a 16MB page in a 64kB segment, but since
the hash lookup is done using a 64kB page size, it may be necessary to
put multiple entries in the HPT for a single 16MB page.  This
capability is called mixed page-size segment (MPSS).  With MPSS,
there are two relevant page sizes: the base page size, which is the
size used in searching the HPT, and the actual page size, which is the
size indicated in the HPT entry. [ Note that the actual page size is
always >= base page size ].

We use "ibm,segment-page-sizes" device tree node to advertise
the MPSS support to PAPR guest. The penc encoding indicates whether
we support a specific combination of base page size and actual
page size in the same segment. We also use the penc value in the
LP encoding of HPTE entry.

This patch exposes MPSS support to KVM guest by advertising the
feature via "ibm,segment-page-sizes". It also adds the necessary changes
to decode the base page size and the actual page size correctly from the
HPTE entry.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
Changes from V1:
* Update commit message
* Rename variables as per review feedback

 arch/powerpc/include/asm/kvm_book3s_64.h | 146 ++++++++++++++++++++++++++-----
 arch/powerpc/kvm/book3s_hv.c             |   7 ++
 2 files changed, 130 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 51388befeddb..fddb72b48ce9 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -77,34 +77,122 @@ static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
 	return old == 0;
 }
 
+static inline int __hpte_actual_psize(unsigned int lp, int psize)
+{
+	int i, shift;
+	unsigned int mask;
+
+	/* start from 1 ignoring MMU_PAGE_4K */
+	for (i = 1; i < MMU_PAGE_COUNT; i++) {
+
+		/* invalid penc */
+		if (mmu_psize_defs[psize].penc[i] == -1)
+			continue;
+		/*
+		 * encoding bits per actual page size
+		 *        PTE LP     actual page size
+		 *    rrrr rrrz		>=8KB
+		 *    rrrr rrzz		>=16KB
+		 *    rrrr rzzz		>=32KB
+		 *    rrrr zzzz		>=64KB
+		 * .......
+		 */
+		shift = mmu_psize_defs[i].shift - LP_SHIFT;
+		if (shift > LP_BITS)
+			shift = LP_BITS;
+		mask = (1 << shift) - 1;
+		if ((lp & mask) == mmu_psize_defs[psize].penc[i])
+			return i;
+	}
+	return -1;
+}
+
 static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 					     unsigned long pte_index)
 {
-	unsigned long rb, va_low;
+	int b_psize, a_psize;
+	unsigned int penc;
+	unsigned long rb = 0, va_low, sllp;
+	unsigned int lp = (r >> LP_SHIFT) & ((1 << LP_BITS) - 1);
+
+	if (!(v & HPTE_V_LARGE)) {
+		/* both base and actual psize is 4k */
+		b_psize = MMU_PAGE_4K;
+		a_psize = MMU_PAGE_4K;
+	} else {
+		for (b_psize = 0; b_psize < MMU_PAGE_COUNT; b_psize++) {
+
+			/* valid entries have a shift value */
+			if (!mmu_psize_defs[b_psize].shift)
+				continue;
 
+			a_psize = __hpte_actual_psize(lp, b_psize);
+			if (a_psize != -1)
+				break;
+		}
+	}
+	/*
+	 * Ignore the top 14 bits of va
+	 * v have top two bits covering segment size, hence move
+	 * by 16 bits, Also clear the lower HPTE_V_AVPN_SHIFT (7) bits.
+	 * AVA field in v also have the lower 23 bits ignored.
+	 * For base page size 4K we need 14 .. 65 bits (so need to
+	 * collect extra 11 bits)
+	 * For others we need 14..14+i
+	 */
+	/* This covers 14..54 bits of va*/
 	rb = (v & ~0x7fUL) << 16;		/* AVA field */
+	/*
+	 * AVA in v had cleared lower 23 bits. We need to derive
+	 * that from pteg index
+	 */
 	va_low = pte_index >> 3;
 	if (v & HPTE_V_SECONDARY)
 		va_low = ~va_low;
-	/* xor vsid from AVA */
+	/*
+	 * get the vpn bits from va_low using reverse of hashing.
+	 * In v we have va with 23 bits dropped and then left shifted
+	 * HPTE_V_AVPN_SHIFT (7) bits. Now to find vsid we need
+	 * right shift it with (SID_SHIFT - (23 - 7))
+	 */
 	if (!(v & HPTE_V_1TB_SEG))
-		va_low ^= v >> 12;
+		va_low ^= v >> (SID_SHIFT - 16);
 	else
-		va_low ^= v >> 24;
+		va_low ^= v >> (SID_SHIFT_1T - 16);
 	va_low &= 0x7ff;
-	if (v & HPTE_V_LARGE) {
-		rb |= 1;			/* L field */
-		if (cpu_has_feature(CPU_FTR_ARCH_206) &&
-		    (r & 0xff000)) {
-			/* non-16MB large page, must be 64k */
-			/* (masks depend on page size) */
-			rb |= 0x1000;		/* page encoding in LP field */
-			rb |= (va_low & 0x7f) << 16; /* 7b of VA in AVA/LP field */
-			rb |= ((va_low << 4) & 0xf0);	/* AVAL field (P7 doesn't seem to care) */
-		}
-	} else {
-		/* 4kB page */
-		rb |= (va_low & 0x7ff) << 12;	/* remaining 11b of VA */
+
+	switch (b_psize) {
+	case MMU_PAGE_4K:
+		sllp = ((mmu_psize_defs[a_psize].sllp & SLB_VSID_L) >> 6) |
+			((mmu_psize_defs[a_psize].sllp & SLB_VSID_LP) >> 4);
+		rb |= sllp << 5;	/*  AP field */
+		rb |= (va_low & 0x7ff) << 12;	/* remaining 11 bits of AVA */
+		break;
+	default:
+	{
+		int aval_shift;
+		/*
+		 * remaining 7bits of AVA/LP fields
+		 * Also contain the rr bits of LP
+		 */
+		rb |= (va_low & 0x7f) << 16;
+		/*
+		 * Now clear not needed LP bits based on actual psize
+		 */
+		rb &= ~((1ul << mmu_psize_defs[a_psize].shift) - 1);
+		/*
+		 * AVAL field 58..77 - base_page_shift bits of va
+		 * we have space for 58..64 bits, Missing bits should
+		 * be zero filled. +1 is to take care of L bit shift
+		 */
+		aval_shift = 64 - (77 - mmu_psize_defs[b_psize].shift) + 1;
+		rb |= ((va_low << aval_shift) & 0xfe);
+
+		rb |= 1;		/* L field */
+		penc = mmu_psize_defs[b_psize].penc[a_psize];
+		rb |= penc << 12;	/* LP field */
+		break;
+	}
 	}
 	rb |= (v >> 54) & 0x300;		/* B field */
 	return rb;
@@ -112,14 +200,26 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 
 static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
 {
+	int size, a_psize;
+	/* Look at the 8 bit LP value */
+	unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
+
 	/* only handle 4k, 64k and 16M pages for now */
 	if (!(h & HPTE_V_LARGE))
-		return 1ul << 12;		/* 4k page */
-	if ((l & 0xf000) == 0x1000 && cpu_has_feature(CPU_FTR_ARCH_206))
-		return 1ul << 16;		/* 64k page */
-	if ((l & 0xff000) == 0)
-		return 1ul << 24;		/* 16M page */
-	return 0;				/* error */
+		return 1ul << 12;
+	else {
+		for (size = 0; size < MMU_PAGE_COUNT; size++) {
+			/* valid entries have a shift value */
+			if (!mmu_psize_defs[size].shift)
+				continue;
+
+			a_psize = __hpte_actual_psize(lp, size);
+			if (a_psize != -1)
+				return 1ul << mmu_psize_defs[a_psize].shift;
+		}
+
+	}
+	return 0;
 }
 
 static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8227dba5af0f..f5d4b9d77dc9 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1949,6 +1949,13 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
 	 * support pte_enc here
 	 */
 	(*sps)->enc[0].pte_enc = def->penc[linux_psize];
+	/*
+	 * Add 16MB MPSS support if host supports it
+	 */
+	if (linux_psize != MMU_PAGE_16M && def->penc[MMU_PAGE_16M] != -1) {
+		(*sps)->enc[1].page_shift = 24;
+		(*sps)->enc[1].pte_enc = def->penc[MMU_PAGE_16M];
+	}
 	(*sps)++;
 }
 
-- 
1.9.1

^ permalink raw reply related

* RE: [PATCH v2 3/4] KVM: PPC: Alow kvmppc_get_last_inst() to fail
From: mihai.caraman @ 2014-05-06 19:06 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org
In-Reply-To: <53636B71.2020103@suse.de>

> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Friday, May 02, 2014 12:55 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> dev@lists.ozlabs.org
> Subject: Re: [PATCH v2 3/4] KVM: PPC: Alow kvmppc_get_last_inst() to fail
>=20
> On 05/01/2014 02:45 AM, Mihai Caraman wrote:
...
> > diff --git a/arch/powerpc/include/asm/kvm_ppc.h
> b/arch/powerpc/include/asm/kvm_ppc.h
> > index 4096f16..6e7c358 100644
> > --- a/arch/powerpc/include/asm/kvm_ppc.h
> > +++ b/arch/powerpc/include/asm/kvm_ppc.h
> > @@ -72,6 +72,8 @@ extern int kvmppc_sanity_check(struct kvm_vcpu
> *vcpu);
> >   extern int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu);
> >   extern void kvmppc_subarch_vcpu_uninit(struct kvm_vcpu *vcpu);
> >
> > +extern int kvmppc_get_last_inst(struct kvm_vcpu *vcpu, u32 *inst);
>=20
> Phew. Moving this into a separate function sure has some performance
> implications. Was there no way to keep it in a header?
>=20
> You could just move it into its own .h file which we include after
> kvm_ppc.h. That way everything's available. That would also help me a
> lot with the little endian port where I'm also struggling with header
> file inclusion order and kvmppc_need_byteswap().

Great, I will do this.

> > diff --git a/arch/powerpc/kvm/book3s_pr.c
> b/arch/powerpc/kvm/book3s_pr.c
> > index c5c052a..b7fffd1 100644
> > --- a/arch/powerpc/kvm/book3s_pr.c
> > +++ b/arch/powerpc/kvm/book3s_pr.c
> > @@ -608,12 +608,9 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu,
> ulong msr)
> >
> >   static int kvmppc_read_inst(struct kvm_vcpu *vcpu)
> >   {
> > -	ulong srr0 =3D kvmppc_get_pc(vcpu);
> > -	u32 last_inst =3D kvmppc_get_last_inst(vcpu);
> > -	int ret;
> > +	u32 last_inst;
> >
> > -	ret =3D kvmppc_ld(vcpu, &srr0, sizeof(u32), &last_inst, false);
> > -	if (ret =3D=3D -ENOENT) {
> > +	if (kvmppc_get_last_inst(vcpu, &last_inst) =3D=3D -ENOENT) {
>=20
> ENOENT?

You have to tell us :) Why does kvmppc_ld() mix emulation_result
enumeration with generic errors? Do you want to change that and
use EMULATE_FAIL instead?

>=20
> >   		ulong msr =3D vcpu->arch.shared->msr;
> >
> >   		msr =3D kvmppc_set_field(msr, 33, 33, 1);
> > @@ -867,15 +864,18 @@ int kvmppc_handle_exit_pr(struct kvm_run *run,
> struct kvm_vcpu *vcpu,
> >   	{
> >   		enum emulation_result er;
> >   		ulong flags;
> > +		u32 last_inst;
> >
> >   program_interrupt:
> >   		flags =3D vcpu->arch.shadow_srr1 & 0x1f0000ull;
> > +		kvmppc_get_last_inst(vcpu, &last_inst);
>=20
> No check for the return value?

Should we queue a program exception and resume guest?

>=20
> >
> >   		if (vcpu->arch.shared->msr & MSR_PR) {
> >   #ifdef EXIT_DEBUG
> > -			printk(KERN_INFO "Userspace triggered 0x700 exception
> at 0x%lx (0x%x)\n", kvmppc_get_pc(vcpu), kvmppc_get_last_inst(vcpu));
> > +			pr_info("Userspace triggered 0x700 exception at\n"
> > +			    "0x%lx (0x%x)\n", kvmppc_get_pc(vcpu), last_inst);
> >   #endif
> > -			if ((kvmppc_get_last_inst(vcpu) & 0xff0007ff) !=3D
> > +			if ((last_inst & 0xff0007ff) !=3D
> >   			    (INS_DCBZ & 0xfffffff7)) {
> >   				kvmppc_core_queue_program(vcpu, flags);
> >   				r =3D RESUME_GUEST;
> > @@ -894,7 +894,7 @@ program_interrupt:
> >   			break;
> >   		case EMULATE_FAIL:
> >   			printk(KERN_CRIT "%s: emulation at %lx failed
> (%08x)\n",
> > -			       __func__, kvmppc_get_pc(vcpu),
> kvmppc_get_last_inst(vcpu));
> > +			       __func__, kvmppc_get_pc(vcpu), last_inst);
> >   			kvmppc_core_queue_program(vcpu, flags);
> >   			r =3D RESUME_GUEST;
> >   			break;
> > @@ -911,8 +911,12 @@ program_interrupt:
> >   		break;
> >   	}
> >   	case BOOK3S_INTERRUPT_SYSCALL:
> > +	{
> > +		u32 last_sc;
> > +
> > +		kvmppc_get_last_sc(vcpu, &last_sc);
>=20
> No check for the return value?

The existing code does not handle KVM_INST_FETCH_FAILED.=20
How should we continue if papr is enabled and last_sc fails?

>=20
> >   		if (vcpu->arch.papr_enabled &&
> > -		    (kvmppc_get_last_sc(vcpu) =3D=3D 0x44000022) &&
> > +		    (last_sc =3D=3D 0x44000022) &&
> >   		    !(vcpu->arch.shared->msr & MSR_PR)) {
> >   			/* SC 1 papr hypercalls */
> >   			ulong cmd =3D kvmppc_get_gpr(vcpu, 3);
> > @@ -957,6 +961,7 @@ program_interrupt:
> >   			r =3D RESUME_GUEST;
> >   		}
> >   		break;
> > +	}
> >   	case BOOK3S_INTERRUPT_FP_UNAVAIL:
> >   	case BOOK3S_INTERRUPT_ALTIVEC:
> >   	case BOOK3S_INTERRUPT_VSX:
> > @@ -985,15 +990,20 @@ program_interrupt:
> >   		break;
> >   	}
> >   	case BOOK3S_INTERRUPT_ALIGNMENT:
> > +	{
> > +		u32 last_inst;
> > +
> >   		if (kvmppc_read_inst(vcpu) =3D=3D EMULATE_DONE) {
> > -			vcpu->arch.shared->dsisr =3D kvmppc_alignment_dsisr(vcpu,
> > -				kvmppc_get_last_inst(vcpu));
> > -			vcpu->arch.shared->dar =3D kvmppc_alignment_dar(vcpu,
> > -				kvmppc_get_last_inst(vcpu));
> > +			kvmppc_get_last_inst(vcpu, &last_inst);
>=20
> I think with an error returning kvmppc_get_last_inst we can just use
> completely get rid of kvmppc_read_inst() and only use
> kvmppc_get_last_inst() instead.

The only purpose of kvmppc_read_inst() function is to generate an
instruction storage interrupt in case of load failure. It is also called
from kvmppc_check_ext(). Do you suggest to get rid of this logic? Or to
duplicate it in order to avoid kvmppc_get_last_inst() redundant call?

>=20
> > +			vcpu->arch.shared->dsisr =3D
> > +				kvmppc_alignment_dsisr(vcpu, last_inst);
> > +			vcpu->arch.shared->dar =3D
> > +				kvmppc_alignment_dar(vcpu, last_inst);
> >   			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
> >   		}
> >   		r =3D RESUME_GUEST;
> >   		break;
> > +	}
> >   	case BOOK3S_INTERRUPT_MACHINE_CHECK:
> >   	case BOOK3S_INTERRUPT_TRACE:
> >   		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
> > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> > index ab62109..df25db0 100644
> > --- a/arch/powerpc/kvm/booke.c
> > +++ b/arch/powerpc/kvm/booke.c
> > @@ -752,6 +752,9 @@ static int emulation_exit(struct kvm_run *run,
> struct kvm_vcpu *vcpu)
> >   		 * they were actually modified by emulation. */
> >   		return RESUME_GUEST_NV;
> >
> > +	case EMULATE_AGAIN:
> > +		return RESUME_GUEST;
> > +
> >   	case EMULATE_DO_DCR:
> >   		run->exit_reason =3D KVM_EXIT_DCR;
> >   		return RESUME_HOST;
> > @@ -1911,6 +1914,17 @@ void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
> >   	vcpu->kvm->arch.kvm_ops->vcpu_put(vcpu);
> >   }
> >
> > +int kvmppc_get_last_inst(struct kvm_vcpu *vcpu, u32 *inst)
> > +{
> > +	int result =3D EMULATE_DONE;
> > +
> > +	if (vcpu->arch.last_inst =3D=3D KVM_INST_FETCH_FAILED)
> > +		result =3D kvmppc_ld_inst(vcpu, &vcpu->arch.last_inst);
> > +	*inst =3D vcpu->arch.last_inst;
>=20
> This function looks almost identical to the book3s one. Can we share the
> same code path here? Just always return false for
> kvmppc_needs_byteswap() on booke.

Sounds good.

-Mike

^ permalink raw reply

* [PATCH v2] powerpc: Use PFN_PHYS() to avoid truncating the physical address
From: Emil Medve @ 2014-05-06 18:47 UTC (permalink / raw)
  To: benh, linuxppc-dev; +Cc: Emil Medve

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
---

v2: Rebased and updated due to upstream changes since v1

 arch/powerpc/include/asm/io.h          |  2 +-
 arch/powerpc/include/asm/page.h        |  2 +-
 arch/powerpc/include/asm/pgalloc-32.h  |  2 +-
 arch/powerpc/include/asm/rtas.h        |  3 ++-
 arch/powerpc/kernel/crash_dump.c       |  2 +-
 arch/powerpc/kernel/eeh.c              |  4 +---
 arch/powerpc/kernel/io-workarounds.c   |  2 +-
 arch/powerpc/kernel/pci-common.c       |  2 +-
 arch/powerpc/kernel/vdso.c             |  6 +++---
 arch/powerpc/kvm/book3s_64_mmu_host.c  |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |  5 ++---
 arch/powerpc/kvm/book3s_hv.c           | 10 +++++-----
 arch/powerpc/kvm/book3s_hv_rm_mmu.c    |  4 ++--
 arch/powerpc/kvm/e500_mmu_host.c       |  5 ++---
 arch/powerpc/mm/hugepage-hash64.c      |  2 +-
 arch/powerpc/mm/hugetlbpage-book3e.c   |  2 +-
 arch/powerpc/mm/hugetlbpage-hash64.c   |  2 +-
 arch/powerpc/mm/mem.c                  |  9 ++++-----
 arch/powerpc/mm/numa.c                 | 13 +++++++------
 arch/powerpc/platforms/powernv/opal.c  |  2 +-
 arch/powerpc/platforms/pseries/iommu.c |  8 ++++----
 21 files changed, 43 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 97d3869..8f7af05 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -790,7 +790,7 @@ static inline void * phys_to_virt(unsigned long address)
 /*
  * Change "struct page" to physical address.
  */
-#define page_to_phys(page)	((phys_addr_t)page_to_pfn(page) << PAGE_SHIFT)
+#define page_to_phys(page)	PFN_PHYS(page_to_pfn(page))
 
 /*
  * 32 bits still uses virt_to_bus() for it's implementation of DMA
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 32e4e21..7193d45 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -131,7 +131,7 @@ extern long long virt_phys_offset;
 #endif
 
 #define virt_to_page(kaddr)	pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
-#define pfn_to_kaddr(pfn)	__va((pfn) << PAGE_SHIFT)
+#define pfn_to_kaddr(pfn)	__va(PFN_PHYS(pfn))
 #define virt_addr_valid(kaddr)	pfn_valid(__pa(kaddr) >> PAGE_SHIFT)
 
 /*
diff --git a/arch/powerpc/include/asm/pgalloc-32.h b/arch/powerpc/include/asm/pgalloc-32.h
index 842846c..3d19a8e 100644
--- a/arch/powerpc/include/asm/pgalloc-32.h
+++ b/arch/powerpc/include/asm/pgalloc-32.h
@@ -24,7 +24,7 @@ extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
 #define pmd_populate_kernel(mm, pmd, pte)	\
 		(pmd_val(*(pmd)) = __pa(pte) | _PMD_PRESENT)
 #define pmd_populate(mm, pmd, pte)	\
-		(pmd_val(*(pmd)) = (page_to_pfn(pte) << PAGE_SHIFT) | _PMD_PRESENT)
+		(pmd_val(*(pmd)) = PFN_PHYS(page_to_pfn(pte)) | _PMD_PRESENT)
 #define pmd_pgtable(pmd) pmd_page(pmd)
 #else
 #define pmd_populate_kernel(mm, pmd, pte)	\
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index b390f55..c19bd9f 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -3,6 +3,7 @@
 #ifdef __KERNEL__
 
 #include <linux/spinlock.h>
+#include <linux/pfn.h>
 #include <asm/page.h>
 
 /*
@@ -418,7 +419,7 @@ extern void rtas_take_timebase(void);
 #ifdef CONFIG_PPC_RTAS
 static inline int page_is_rtas_user_buf(unsigned long pfn)
 {
-	unsigned long paddr = (pfn << PAGE_SHIFT);
+	unsigned long paddr = PFN_PHYS(pfn);
 	if (paddr >= rtas_rmo_buf && paddr < (rtas_rmo_buf + RTAS_RMOBUF_MAX))
 		return 1;
 	return 0;
diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c
index 7a13f37..a46a9c2 100644
--- a/arch/powerpc/kernel/crash_dump.c
+++ b/arch/powerpc/kernel/crash_dump.c
@@ -104,7 +104,7 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
 		return 0;
 
 	csize = min_t(size_t, csize, PAGE_SIZE);
-	paddr = pfn << PAGE_SHIFT;
+	paddr = PFN_PHYS(pfn);
 
 	if (memblock_is_region_memory(paddr, csize)) {
 		vaddr = __va(paddr);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 3764fb7..7f2ba3d 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -271,7 +271,6 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 static inline unsigned long eeh_token_to_phys(unsigned long token)
 {
 	pte_t *ptep;
-	unsigned long pa;
 	int hugepage_shift;
 
 	/*
@@ -281,9 +280,8 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
 	if (!ptep)
 		return token;
 	WARN_ON(hugepage_shift);
-	pa = pte_pfn(*ptep) << PAGE_SHIFT;
 
-	return pa | (token & (PAGE_SIZE-1));
+	return PFN_PHYS(pte_pfn(*ptep)) | (token & (PAGE_SIZE-1));
 }
 
 /*
diff --git a/arch/powerpc/kernel/io-workarounds.c b/arch/powerpc/kernel/io-workarounds.c
index 24b968f..dd9a4a2 100644
--- a/arch/powerpc/kernel/io-workarounds.c
+++ b/arch/powerpc/kernel/io-workarounds.c
@@ -81,7 +81,7 @@ struct iowa_bus *iowa_mem_find_bus(const PCI_IO_ADDR addr)
 			 * we don't have hugepages backing iomem
 			 */
 			WARN_ON(hugepage_shift);
-			paddr = pte_pfn(*ptep) << PAGE_SHIFT;
+			paddr = PFN_PHYS(pte_pfn(*ptep));
 		}
 		bus = iowa_pci_find(vaddr, paddr);
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index add166a..c26f5a9 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -411,7 +411,7 @@ pgprot_t pci_phys_mem_access_prot(struct file *file,
 {
 	struct pci_dev *pdev = NULL;
 	struct resource *found = NULL;
-	resource_size_t offset = ((resource_size_t)pfn) << PAGE_SHIFT;
+	resource_size_t offset = PFN_PHYS(pfn);
 	int i;
 
 	if (page_is_ram(pfn))
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index ce74c33..d8095ad 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -144,12 +144,12 @@ struct lib64_elfinfo
 #ifdef __DEBUG
 static void dump_one_vdso_page(struct page *pg, struct page *upg)
 {
-	printk("kpg: %p (c:%d,f:%08lx)", __va(page_to_pfn(pg) << PAGE_SHIFT),
+	printk("kpg: %p (c:%d,f:%08lx)", __va(PFN_PHYS(page_to_pfn(pg)));
 	       page_count(pg),
 	       pg->flags);
 	if (upg && !IS_ERR(upg) /* && pg != upg*/) {
-		printk(" upg: %p (c:%d,f:%08lx)", __va(page_to_pfn(upg)
-						       << PAGE_SHIFT),
+		printk(" upg: %p (c:%d,f:%08lx)",
+		       __va(PFN_PHYS(page_to_pfn(upg))),
 		       page_count(upg),
 		       upg->flags);
 	}
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 0d513af..4dbdba6 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -110,7 +110,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
 		r = -EINVAL;
 		goto out;
 	}
-	hpaddr = pfn << PAGE_SHIFT;
+	hpaddr = PFN_PHYS(pfn);
 
 	/* and write the mapping ea -> hpa into the pt */
 	vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index fb25ebc..f01110f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -349,8 +349,7 @@ static long kvmppc_get_guest_page(struct kvm *kvm, unsigned long gfn,
 	spin_lock(&kvm->arch.slot_phys_lock);
 	for (i = 0; i < npages; ++i) {
 		if (!physp[i]) {
-			physp[i] = ((pfn + i) << PAGE_SHIFT) +
-				got + is_io + pgorder;
+			physp[i] = PFN_PHYS(pfn + i) + got + is_io + pgorder;
 			got = 0;
 		}
 	}
@@ -725,7 +724,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	 */
 	if (psize < PAGE_SIZE)
 		psize = PAGE_SIZE;
-	r = (r & ~(HPTE_R_PP0 - psize)) | ((pfn << PAGE_SHIFT) & ~(psize - 1));
+	r = (r & ~(HPTE_R_PP0 - psize)) | (PFN_PHYS(pfn) & ~(psize - 1));
 	if (hpte_is_writable(r) && !write_ok)
 		r = hpte_make_readonly(r);
 	ret = RESUME_GUEST;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8227dba..5c96231 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2232,10 +2232,11 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 			/* POWER7 */
 			lpcr_mask = LPCR_VPM0 | LPCR_VRMA_L | LPCR_RMLS;
 			lpcr = rmls << LPCR_RMLS_SH;
-			kvm->arch.rmor = ri->base_pfn << PAGE_SHIFT;
+			kvm->arch.rmor = PFN_PHYS(ri->base_pfn);
 		}
-		pr_info("KVM: Using RMO at %lx size %lx (LPCR = %lx)\n",
-			ri->base_pfn << PAGE_SHIFT, rma_size, lpcr);
+		pr_info("KVM: Using RMO at %llx size %lx (LPCR = %lx)\n",
+			(unsigned long long)PFN_PHYS(ri->base_pfn),
+			rma_size, lpcr);
 
 		/* Initialize phys addrs of pages in RMO */
 		npages = kvm_rma_pages;
@@ -2246,8 +2247,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 				npages = memslot->npages;
 			spin_lock(&kvm->arch.slot_phys_lock);
 			for (i = 0; i < npages; ++i)
-				physp[i] = ((ri->base_pfn + i) << PAGE_SHIFT) +
-					porder;
+				physp[i] = PFN_PHYS(ri->base_pfn + i) + porder;
 			spin_unlock(&kvm->arch.slot_phys_lock);
 		}
 	}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 1d6c56a..799503c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -31,7 +31,7 @@ static void *real_vmalloc_addr(void *x)
 	if (!p || !pte_present(*p))
 		return NULL;
 	/* assume we don't have huge pages in vmalloc space... */
-	addr = (pte_pfn(*p) << PAGE_SHIFT) | (addr & ~PAGE_MASK);
+	addr = PFN_PHYS(pte_pfn(*p)) | (addr & ~PAGE_MASK);
 	return __va(addr);
 }
 
@@ -239,7 +239,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 				/* make the actual HPTE be read-only */
 				ptel = hpte_make_readonly(ptel);
 			is_io = hpte_cache_bits(pte_val(pte));
-			pa = pte_pfn(pte) << PAGE_SHIFT;
+			pa = PFN_PHYS(pte_pfn(pte));
 			pa |= hva & (pte_size - 1);
 			pa |= gpa & ~PAGE_MASK;
 		}
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index dd2cc03..2368e2c 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -168,8 +168,7 @@ void kvmppc_map_magic(struct kvm_vcpu *vcpu)
 	magic.mas1 = MAS1_VALID | MAS1_TS | MAS1_TID(stid) |
 		     MAS1_TSIZE(BOOK3E_PAGESZ_4K);
 	magic.mas2 = vcpu->arch.magic_page_ea | MAS2_M;
-	magic.mas7_3 = ((u64)pfn << PAGE_SHIFT) |
-		       MAS3_SW | MAS3_SR | MAS3_UW | MAS3_UR;
+	magic.mas7_3 = PFN_PHYS(pfn) | MAS3_SW | MAS3_SR | MAS3_UW | MAS3_UR;
 	magic.mas8 = 0;
 
 	__write_host_tlbe(&magic, MAS0_TLBSEL(1) | MAS0_ESEL(tlbcam_index));
@@ -311,7 +310,7 @@ static void kvmppc_e500_setup_stlbe(
 	/* Force IPROT=0 for all guest mappings. */
 	stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
 	stlbe->mas2 = (gvaddr & MAS2_EPN) | (ref->flags & E500_TLB_MAS2_ATTR);
-	stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
+	stlbe->mas7_3 = PFN_PHYS(pfn) |
 			e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
 
 #ifdef CONFIG_KVM_BOOKE_HV
diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
index 826893f..5004539 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -118,7 +118,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
 		unsigned long hpte_group;
 
 		/* insert new entry */
-		pa = pmd_pfn(__pmd(old_pmd)) << PAGE_SHIFT;
+		pa = PFN_PHYS(pmd_pfn(__pmd(old_pmd)));
 repeat:
 		hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
 
diff --git a/arch/powerpc/mm/hugetlbpage-book3e.c b/arch/powerpc/mm/hugetlbpage-book3e.c
index 5e4ee25..1c94b28 100644
--- a/arch/powerpc/mm/hugetlbpage-book3e.c
+++ b/arch/powerpc/mm/hugetlbpage-book3e.c
@@ -123,7 +123,7 @@ void book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea,
 	mas1 = MAS1_VALID | MAS1_TID(mm->context.id) | MAS1_TSIZE(tsize);
 	mas2 = ea & ~((1UL << shift) - 1);
 	mas2 |= (pte_val(pte) >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
-	mas7_3 = (u64)pte_pfn(pte) << PAGE_SHIFT;
+	mas7_3 = PFN_PHYS(pte_pfn(pte));
 	mas7_3 |= (pte_val(pte) >> PTE_BAP_SHIFT) & MAS3_BAP_MASK;
 	if (!pte_dirty(pte))
 		mas7_3 &= ~(MAS3_SW|MAS3_UW);
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
index a5bcf93..3351ae2 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -88,7 +88,7 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
 	if (likely(!(old_pte & _PAGE_HASHPTE))) {
 		unsigned long hash = hpt_hash(vpn, shift, ssize);
 
-		pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT;
+		pa = PFN_PHYS(pte_pfn(__pte(old_pte)));
 
 		/* clear HPTE slot informations in new PTE */
 #ifdef CONFIG_PPC_64K_PAGES
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 2c8e90f..32202c9 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -82,7 +82,7 @@ int page_is_ram(unsigned long pfn)
 #ifndef CONFIG_PPC64	/* XXX for now */
 	return pfn < max_pfn;
 #else
-	unsigned long paddr = (pfn << PAGE_SHIFT);
+	unsigned long paddr = PFN_PHYS(pfn);
 	struct memblock_region *reg;
 
 	for_each_memblock(memory, reg)
@@ -333,9 +333,8 @@ void __init mem_init(void)
 
 		highmem_mapnr = lowmem_end_addr >> PAGE_SHIFT;
 		for (pfn = highmem_mapnr; pfn < max_mapnr; ++pfn) {
-			phys_addr_t paddr = (phys_addr_t)pfn << PAGE_SHIFT;
 			struct page *page = pfn_to_page(pfn);
-			if (!memblock_is_reserved(paddr))
+			if (!memblock_is_reserved(PFN_PHYS(pfn)))
 				free_highmem_page(page);
 		}
 	}
@@ -417,7 +416,7 @@ void flush_dcache_icache_page(struct page *page)
 	/* On 8xx there is no need to kmap since highmem is not supported */
 	__flush_dcache_icache(page_address(page)); 
 #else
-	__flush_dcache_icache_phys(page_to_pfn(page) << PAGE_SHIFT);
+	__flush_dcache_icache_phys(PFN_PHYS(page_to_pfn(page)));
 #endif
 }
 EXPORT_SYMBOL(flush_dcache_icache_page);
@@ -553,7 +552,7 @@ subsys_initcall(add_system_ram_resources);
  */
 int devmem_is_allowed(unsigned long pfn)
 {
-	if (iomem_is_exclusive(pfn << PAGE_SHIFT))
+	if (iomem_is_exclusive(PFN_PHYS(pfn)))
 		return 0;
 	if (!page_is_ram(pfn))
 		return 1;
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 3b181b2..123e677 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -116,7 +116,7 @@ static int __init fake_numa_create_new_node(unsigned long end_pfn,
 
 	curr_boundary = mem;
 
-	if ((end_pfn << PAGE_SHIFT) > mem) {
+	if (PFN_PHYS(end_pfn) > mem) {
 		/*
 		 * Skip commas and spaces
 		 */
@@ -939,7 +939,7 @@ static void __init *careful_zallocation(int nid, unsigned long size,
 	int new_nid;
 	unsigned long ret_paddr;
 
-	ret_paddr = __memblock_alloc_base(size, align, end_pfn << PAGE_SHIFT);
+	ret_paddr = __memblock_alloc_base(size, align, PFN_PHYS(end_pfn));
 
 	/* retry over all memory */
 	if (!ret_paddr)
@@ -1013,7 +1013,7 @@ static void __init mark_reserved_regions_for_nid(int nid)
 			 * then trim size to active region
 			 */
 			if (end_pfn > node_ar.end_pfn)
-				reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
+				reserve_size = PFN_PHYS(node_ar.end_pfn)
 					- physbase;
 			/*
 			 * Only worry about *this* node, others may not
@@ -1039,7 +1039,7 @@ static void __init mark_reserved_regions_for_nid(int nid)
 			 *   reserved region
 			 */
 			start_pfn = node_ar.end_pfn;
-			physbase = start_pfn << PAGE_SHIFT;
+			physbase = PFN_PHYS(start_pfn);
 			size = size - reserve_size;
 			get_node_active_region(start_pfn, &node_ar);
 		}
@@ -1088,8 +1088,9 @@ void __init do_init_bootmem(void)
 		if (NODE_DATA(nid)->node_spanned_pages == 0)
   			continue;
 
-  		dbg("start_paddr = %lx\n", start_pfn << PAGE_SHIFT);
-  		dbg("end_paddr = %lx\n", end_pfn << PAGE_SHIFT);
+		dbg("start_paddr = %llx\nend_paddr = %llx\n",
+		    (unsigned long long)PFN_PHYS(start_pfn),
+		    (unsigned long long)PFN_PHYS(end_pfn));
 
 		bootmap_pages = bootmem_bootmap_pages(end_pfn - start_pfn);
 		bootmem_vaddr = careful_zallocation(nid,
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 360ad80c..052e423 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -653,7 +653,7 @@ struct opal_sg_list *opal_vmalloc_to_sg_list(void *vmalloc_addr,
 	first = sg;
 
 	while (vmalloc_size > 0) {
-		uint64_t data = vmalloc_to_pfn(vmalloc_addr) << PAGE_SHIFT;
+		uint64_t data = PFN_PHYS(vmalloc_to_pfn(vmalloc_addr));
 		uint64_t length = min(vmalloc_size, PAGE_SIZE);
 
 		sg->entry[i].data = cpu_to_be64(data);
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 33b552f..96c7bb1 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -359,8 +359,8 @@ static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
 
 	tce_shift = be32_to_cpu(maprange->tce_shift);
 	tce_size = 1ULL << tce_shift;
-	next = start_pfn << PAGE_SHIFT;
-	num_tce = num_pfn << PAGE_SHIFT;
+	next = PFN_PHYS(start_pfn);
+	num_tce = PFN_PHYS(num_pfn);
 
 	/* round back to the beginning of the tce page size */
 	num_tce += next & (tce_size - 1);
@@ -415,8 +415,8 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
 	liobn = (u64)be32_to_cpu(maprange->liobn);
 	tce_shift = be32_to_cpu(maprange->tce_shift);
 	tce_size = 1ULL << tce_shift;
-	next = start_pfn << PAGE_SHIFT;
-	num_tce = num_pfn << PAGE_SHIFT;
+	next = PFN_PHYS(start_pfn);
+	num_tce = PFN_PHYS(num_pfn);
 
 	/* round back to the beginning of the tce page size */
 	num_tce += next & (tce_size - 1);
-- 
1.9.2

^ permalink raw reply related

* [PATCH 1/2 v2] bootmem/powerpc: Unify bootmem initialization
From: Emil Medve @ 2014-05-06 18:48 UTC (permalink / raw)
  To: benh, linuxppc-dev; +Cc: Emil Medve

Unify the low/highmem code path from do_init_bootmem() by using (the)
lowmem related variables/parameters even when the low/highmem split
is not needed (64-bit) or configured. In such cases the "lowmem"
variables/parameters continue to observe the definition by referring
to memory directly mapped by the kernel

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
---

v2: Rebased, no changes

 arch/powerpc/mm/mem.c | 36 ++++++++++++++++--------------------
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 32202c9..eaf5d1d8 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -188,27 +188,31 @@ EXPORT_SYMBOL_GPL(walk_system_ram_range);
 void __init do_init_bootmem(void)
 {
 	unsigned long start, bootmap_pages;
-	unsigned long total_pages;
 	struct memblock_region *reg;
 	int boot_mapsize;
+	phys_addr_t _total_lowmem;
+	phys_addr_t _lowmem_end_addr;
 
-	max_low_pfn = max_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
-	total_pages = (memblock_end_of_DRAM() - memstart_addr) >> PAGE_SHIFT;
-#ifdef CONFIG_HIGHMEM
-	total_pages = total_lowmem >> PAGE_SHIFT;
-	max_low_pfn = lowmem_end_addr >> PAGE_SHIFT;
+#ifndef CONFIG_HIGHMEM
+	_lowmem_end_addr = memblock_end_of_DRAM();
+#else
+	_lowmem_end_addr = lowmem_end_addr;
 #endif
 
+	max_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
+	max_low_pfn = _lowmem_end_addr >> PAGE_SHIFT;
+	min_low_pfn = MEMORY_START >> PAGE_SHIFT;
+
 	/*
 	 * Find an area to use for the bootmem bitmap.  Calculate the size of
 	 * bitmap required as (Total Memory) / PAGE_SIZE / BITS_PER_BYTE.
 	 * Add 1 additional page in case the address isn't page-aligned.
 	 */
-	bootmap_pages = bootmem_bootmap_pages(total_pages);
+	_total_lowmem = _lowmem_end_addr - memstart_addr;
+	bootmap_pages = bootmem_bootmap_pages(_total_lowmem >> PAGE_SHIFT);
 
 	start = memblock_alloc(bootmap_pages << PAGE_SHIFT, PAGE_SIZE);
 
-	min_low_pfn = MEMORY_START >> PAGE_SHIFT;
 	boot_mapsize = init_bootmem_node(NODE_DATA(0), start >> PAGE_SHIFT, min_low_pfn, max_low_pfn);
 
 	/* Place all memblock_regions in the same node and merge contiguous
@@ -219,26 +223,18 @@ void __init do_init_bootmem(void)
 	/* Add all physical memory to the bootmem map, mark each area
 	 * present.
 	 */
-#ifdef CONFIG_HIGHMEM
-	free_bootmem_with_active_regions(0, lowmem_end_addr >> PAGE_SHIFT);
+	free_bootmem_with_active_regions(0, max_low_pfn);
 
 	/* reserve the sections we're already using */
 	for_each_memblock(reserved, reg) {
-		unsigned long top = reg->base + reg->size - 1;
-		if (top < lowmem_end_addr)
+		if (reg->base + reg->size - 1 < _lowmem_end_addr)
 			reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
-		else if (reg->base < lowmem_end_addr) {
-			unsigned long trunc_size = lowmem_end_addr - reg->base;
+		else if (reg->base < _lowmem_end_addr) {
+			unsigned long trunc_size = _lowmem_end_addr - reg->base;
 			reserve_bootmem(reg->base, trunc_size, BOOTMEM_DEFAULT);
 		}
 	}
-#else
-	free_bootmem_with_active_regions(0, max_pfn);
 
-	/* reserve the sections we're already using */
-	for_each_memblock(reserved, reg)
-		reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
-#endif
 	/* XXX need to clip this if using highmem? */
 	sparse_memory_present_with_active_regions(0);
 
-- 
1.9.2

^ permalink raw reply related

* [PATCH 2/2 v2] powerpc: Enable NO_BOOTMEM
From: Emil Medve @ 2014-05-06 18:48 UTC (permalink / raw)
  To: benh, linuxppc-dev; +Cc: Emil Medve
In-Reply-To: <1399402084-6325-1-git-send-email-Emilian.Medve@Freescale.com>

Currently bootmem is just a wrapper around memblock. This gets rid of
the wrapper code just as other ARHC(es) did: x86, arm, etc.

For now only cover !NUMA systems/builds

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
---

v2: Acknowledge that NUMA systems/builds are not covered by this patch

 arch/powerpc/Kconfig  | 3 +++
 arch/powerpc/mm/mem.c | 8 ++++++++
 2 files changed, 11 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e099899..07b164b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -475,6 +475,9 @@ config SYS_SUPPORTS_HUGETLBFS
 
 source "mm/Kconfig"
 
+config NO_BOOTMEM
+	def_bool !NUMA
+
 config ARCH_MEMORY_PROBE
 	def_bool y
 	depends on MEMORY_HOTPLUG
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index eaf5d1d8..d3e1d5f 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -187,10 +187,12 @@ EXPORT_SYMBOL_GPL(walk_system_ram_range);
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 void __init do_init_bootmem(void)
 {
+#ifndef CONFIG_NO_BOOTMEM
 	unsigned long start, bootmap_pages;
 	struct memblock_region *reg;
 	int boot_mapsize;
 	phys_addr_t _total_lowmem;
+#endif
 	phys_addr_t _lowmem_end_addr;
 
 #ifndef CONFIG_HIGHMEM
@@ -203,6 +205,7 @@ void __init do_init_bootmem(void)
 	max_low_pfn = _lowmem_end_addr >> PAGE_SHIFT;
 	min_low_pfn = MEMORY_START >> PAGE_SHIFT;
 
+#ifndef CONFIG_NO_BOOTMEM
 	/*
 	 * Find an area to use for the bootmem bitmap.  Calculate the size of
 	 * bitmap required as (Total Memory) / PAGE_SIZE / BITS_PER_BYTE.
@@ -214,12 +217,14 @@ void __init do_init_bootmem(void)
 	start = memblock_alloc(bootmap_pages << PAGE_SHIFT, PAGE_SIZE);
 
 	boot_mapsize = init_bootmem_node(NODE_DATA(0), start >> PAGE_SHIFT, min_low_pfn, max_low_pfn);
+#endif
 
 	/* Place all memblock_regions in the same node and merge contiguous
 	 * memblock_regions
 	 */
 	memblock_set_node(0, (phys_addr_t)ULLONG_MAX, &memblock.memory, 0);
 
+#ifndef CONFIG_NO_BOOTMEM
 	/* Add all physical memory to the bootmem map, mark each area
 	 * present.
 	 */
@@ -234,11 +239,14 @@ void __init do_init_bootmem(void)
 			reserve_bootmem(reg->base, trunc_size, BOOTMEM_DEFAULT);
 		}
 	}
+#endif
 
 	/* XXX need to clip this if using highmem? */
 	sparse_memory_present_with_active_regions(0);
 
+#ifndef CONFIG_NO_BOOTMEM
 	init_bootmem_done = 1;
+#endif
 }
 
 /* mark pages that don't exist as nosave */
-- 
1.9.2

^ permalink raw reply related

* Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
From: Benjamin Herrenschmidt @ 2014-05-06 20:35 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, paulus, Alexander Graf, kvm-ppc, kvm
In-Reply-To: <87ha52ritd.fsf@linux.vnet.ibm.com>

On Tue, 2014-05-06 at 21:38 +0530, Aneesh Kumar K.V wrote:

> >> I updated the commit message as below. Let me know if this is ok.
> >>
> >>      KVM: PPC: BOOK3S: HV: THP support for guest
> >
> > This has nothing to do with THP.
> 
> THP support in guest depend on KVM advertising MPSS feature. We already
> have rest of the changes needed to support transparent huge pages
> upstream. (We do support THP with PowerVM LPAR already). The primary
> motivation of this patch is to enable THP in powerkvm guest. 

I would argue (nit picking, I know ... :-) that the subject should be
"Enable MPSS support for guests", and the description can then explain
that this allows Linux guests to use THP.

Cheers,
Ben.

> >
> >>      
> >>      On recent IBM Power CPUs, while the hashed page table is looked up using
> >>      the page size from the segmentation hardware (i.e. the SLB), it is
> >>      possible to have the HPT entry indicate a larger page size.  Thus for
> >>      example it is possible to put a 16MB page in a 64kB segment, but since
> >>      the hash lookup is done using a 64kB page size, it may be necessary to
> >>      put multiple entries in the HPT for a single 16MB page.  This
> >>      capability is called mixed page-size segment (MPSS).  With MPSS,
> >>      there are two relevant page sizes: the base page size, which is the
> >>      size used in searching the HPT, and the actual page size, which is the
> >>      size indicated in the HPT entry. [ Note that the actual page size is
> >>      always >= base page size ].
> >>      
> >>      We advertise MPSS feature to guest only if the host CPU supports the
> >>      same. We use "ibm,segment-page-sizes" device tree node to advertise
> >>      the MPSS support. The penc encoding indicate whether we support
> >>      a specific combination of base page size and actual page size
> >>      in the same segment. It is also the value used in the L|LP encoding
> >>      of HPTE entry.
> >>      
> >>      In-order to support MPSS in guest, KVM need to handle the below details
> >>      * advertise MPSS via ibm,segment-page-sizes
> >>      * Decode the base and actual page size correctly from the HPTE entry
> >>        so that we know what we are dealing with in H_ENTER and and can do
> >
> > Which code path exactly changes for H_ENTER?
> 
> There is no real code path changes. Any code path that use
> hpte_page_size() is impacted. We return actual page size there. 
> 
> >
> >>        the appropriate TLB invalidation in H_REMOVE and evictions.
> >
> > Apart from the grammar (which is pretty broken for the part that is not 
> > copied from Paul) and the subject line this sounds quite reasonable.
> >
> 
> Wll try to fix.
> 
> -aneesh

^ permalink raw reply

* Re: [PATCH 2/2 v2] powerpc: Enable NO_BOOTMEM
From: Scott Wood @ 2014-05-06 21:49 UTC (permalink / raw)
  To: Emil Medve; +Cc: linuxppc-dev
In-Reply-To: <1399402084-6325-2-git-send-email-Emilian.Medve@Freescale.com>

On Tue, 2014-05-06 at 13:48 -0500, Emil Medve wrote:
> Currently bootmem is just a wrapper around memblock. This gets rid of
> the wrapper code just as other ARHC(es) did: x86, arm, etc.
> 
> For now only cover !NUMA systems/builds
> 
> Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
> ---
> 
> v2: Acknowledge that NUMA systems/builds are not covered by this patch
> 
>  arch/powerpc/Kconfig  | 3 +++
>  arch/powerpc/mm/mem.c | 8 ++++++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index e099899..07b164b 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -475,6 +475,9 @@ config SYS_SUPPORTS_HUGETLBFS
>  
>  source "mm/Kconfig"
>  
> +config NO_BOOTMEM
> +	def_bool !NUMA

This will allow a user to manually turn on CONFIG_NO_BOOTMEM in the
presence of NUMA.  From the changelog it sounds like this is not what
you intended.

What are the issues with NUMA?  As is, you're not getting rid of wrapper
code -- only adding ifdefs.

-Scott

^ permalink raw reply

* Re: [PATCH 2/2 v2] powerpc: Enable NO_BOOTMEM
From: Scott Wood @ 2014-05-06 22:02 UTC (permalink / raw)
  To: Emil Medve; +Cc: linuxppc-dev
In-Reply-To: <1399412988.15726.202.camel@snotra.buserror.net>

On Tue, 2014-05-06 at 16:49 -0500, Scott Wood wrote:
> On Tue, 2014-05-06 at 13:48 -0500, Emil Medve wrote:
> > Currently bootmem is just a wrapper around memblock. This gets rid of
> > the wrapper code just as other ARHC(es) did: x86, arm, etc.
> > 
> > For now only cover !NUMA systems/builds
> > 
> > Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
> > ---
> > 
> > v2: Acknowledge that NUMA systems/builds are not covered by this patch
> > 
> >  arch/powerpc/Kconfig  | 3 +++
> >  arch/powerpc/mm/mem.c | 8 ++++++++
> >  2 files changed, 11 insertions(+)
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index e099899..07b164b 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -475,6 +475,9 @@ config SYS_SUPPORTS_HUGETLBFS
> >  
> >  source "mm/Kconfig"
> >  
> > +config NO_BOOTMEM
> > +	def_bool !NUMA
> 
> This will allow a user to manually turn on CONFIG_NO_BOOTMEM in the
> presence of NUMA.  From the changelog it sounds like this is not what
> you intended.

Ignore this part -- I see it doesn't have an option string for it to
show up to the user.

-Scott

^ permalink raw reply

* Re: [PATCH 2/2 v2] powerpc: Enable NO_BOOTMEM
From: Emil Medve @ 2014-05-07  0:16 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <1399412988.15726.202.camel@snotra.buserror.net>

Hello Scott,

On 05/06/2014 04:49 PM, Scott Wood wrote:
> On Tue, 2014-05-06 at 13:48 -0500, Emil Medve wrote:
>> Currently bootmem is just a wrapper around memblock. This gets rid of
>> the wrapper code just as other ARHC(es) did: x86, arm, etc.
>>
>> For now only cover !NUMA systems/builds
>>
>> Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
>> ---
>>
>> v2: Acknowledge that NUMA systems/builds are not covered by this patch
>>
>>  arch/powerpc/Kconfig  | 3 +++
>>  arch/powerpc/mm/mem.c | 8 ++++++++
>>  2 files changed, 11 insertions(+)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index e099899..07b164b 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -475,6 +475,9 @@ config SYS_SUPPORTS_HUGETLBFS
>>  
>>  source "mm/Kconfig"
>>  
>> +config NO_BOOTMEM
>> +	def_bool !NUMA
> 
> This will allow a user to manually turn on CONFIG_NO_BOOTMEM in the
> presence of NUMA.  From the changelog it sounds like this is not what
> you intended.
> 
> What are the issues with NUMA?

Well, I don't have access to a NUMA box/board. I could enable NUMA for a
!NUMA board but I'd feel better if I could actually test/debug on a
relevant system

> As is, you're not getting rid of wrapper code -- only adding ifdefs.

First, you're talking about the bootmem initialization wrapper code for
powerpc. The actual bootmem code is in include/linux/bootmem.h and
mm/bootmem.c. We can't remove those files as they are still used by
other arches. Also, the word wrapper is somewhat imprecise as in powerpc
land bootmem sort of runs on top of memblock

When NO_BOOTMEM is configured the mm/nobootmem.c is used that is the
bootmem API actually re-implemented with memblock. The bootmem API is
used in various places in the arch independent code

This patch wants to isolate for removal the bootmem initialization code
for powerpc and to exclude mm/bootmem.c from being built. This being the
first step I didn't want to actually remove the code, so it will be easy
to debug if some issues crop up. Also, people that want the use the
bootmem code for some reason can easily do that. Once this change spends
some time in the tree, we can actually remove the bootmem initialization
code

Cheers,

^ permalink raw reply

* Re: [PATCH 2/2 v2] powerpc: Enable NO_BOOTMEM
From: Scott Wood @ 2014-05-07  2:44 UTC (permalink / raw)
  To: Emil Medve; +Cc: linuxppc-dev
In-Reply-To: <53697B5C.6090107@Freescale.com>

On Tue, 2014-05-06 at 19:16 -0500, Emil Medve wrote:
> Hello Scott,
> 
> 
> On 05/06/2014 04:49 PM, Scott Wood wrote:
> > On Tue, 2014-05-06 at 13:48 -0500, Emil Medve wrote:
> >> Currently bootmem is just a wrapper around memblock. This gets rid of
> >> the wrapper code just as other ARHC(es) did: x86, arm, etc.
> >>
> >> For now only cover !NUMA systems/builds
> >>
> >> Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
> >> ---
> >>
> >> v2: Acknowledge that NUMA systems/builds are not covered by this patch
> >>
> >>  arch/powerpc/Kconfig  | 3 +++
> >>  arch/powerpc/mm/mem.c | 8 ++++++++
> >>  2 files changed, 11 insertions(+)
> >>
> >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> >> index e099899..07b164b 100644
> >> --- a/arch/powerpc/Kconfig
> >> +++ b/arch/powerpc/Kconfig
> >> @@ -475,6 +475,9 @@ config SYS_SUPPORTS_HUGETLBFS
> >>  
> >>  source "mm/Kconfig"
> >>  
> >> +config NO_BOOTMEM
> >> +	def_bool !NUMA
> > 
> > This will allow a user to manually turn on CONFIG_NO_BOOTMEM in the
> > presence of NUMA.  From the changelog it sounds like this is not what
> > you intended.
> > 
> > What are the issues with NUMA?
> 
> Well, I don't have access to a NUMA box/board. I could enable NUMA for a
> !NUMA board but I'd feel better if I could actually test/debug on a
> relevant system

You could first test with NUMA on a non-NUMA board, and then if that
works ask the list for help testing on NUMA hardware (and various
non-Freescale non-NUMA hardware, for that matter).

Is there a specific issue that would need to be addressed to make it
work on NUMA?

> > As is, you're not getting rid of wrapper code -- only adding ifdefs.
> 
> First, you're talking about the bootmem initialization wrapper code for
> powerpc. The actual bootmem code is in include/linux/bootmem.h and
> mm/bootmem.c. We can't remove those files as they are still used by
> other arches. Also, the word wrapper is somewhat imprecise as in powerpc
> land bootmem sort of runs on top of memblock

My point was just that the changelog says "This gets rid of wrapper
code" when it actually removes no source code, and adds configuration
complexity.

> When NO_BOOTMEM is configured the mm/nobootmem.c is used that is the
> bootmem API actually re-implemented with memblock. The bootmem API is
> used in various places in the arch independent code
> 
> This patch wants to isolate for removal the bootmem initialization code
> for powerpc and to exclude mm/bootmem.c from being built. This being the
> first step I didn't want to actually remove the code, so it will be easy
> to debug if some issues crop up. Also, people that want the use the
> bootmem code for some reason can easily do that. Once this change spends
> some time in the tree, we can actually remove the bootmem initialization
> code

Is there a plausible reason someone would "want to use the bootmem
code"?

While the "ifdef it for a while" approach is sometimes sensible, usually
it's better to just make the change rather than ifdef it.  Consider what
the code would look like if there were ifdefs for a ton of random
changes, half of which nobody ever bothered to go back and clean up
after the change got widespread testing.  Why is this patch risky enough
to warrant such an approach?  Shouldn't boot-time issues be pretty
obvious?

-Scott

^ permalink raw reply

* Re: [PATCH 4/6] powerpc/corenet: Create the dts components for the DPAA FMan
From: Scott Wood @ 2014-05-07  2:54 UTC (permalink / raw)
  To: Emil Medve; +Cc: devicetree, Shruti Kanetkar, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <53687913.70900@Freescale.com>

On Tue, 2014-05-06 at 00:54 -0500, Emil Medve wrote:
> Hello Scott,
> 
> 
> On 05/05/2014 06:25 PM, Scott Wood wrote:
> > On Sat, 2014-05-03 at 05:02 -0500, Emil Medve wrote:
> >> Hello Scott,
> >>
> >>
> >> On 04/21/2014 05:11 PM, Scott Wood wrote:
> >>> On Fri, 2014-04-18 at 07:21 -0500, Shruti Kanetkar wrote:
> >>>> +fman@400000 {
> >>>> +	mdio@f1000 {
> >>>> +		#address-cells = <1>;
> >>>> +		#size-cells = <0>;
> >>>> +		compatible = "fsl,fman-xmdio";
> >>>> +		reg = <0xf1000 0x1000>;
> >>>> +	};
> >>>> +};
> >>>
> >>> I'd like to see a complete fman binding before we start adding pieces.
> >>
> >> The driver for the FMan 10 Gb/s MDIO has upstreamed a couple of years
> >> ago: '9f35a73 net/fsl: introduce Freescale 10G MDIO driver', granted
> >> without a binding writeup.
> > 
> > Pushing driver code through the netdev tree does not establish device
> > tree ABI.  Binding documents and dts files do.
> 
> Sure, ideally and formally. But upstreaming a driver represents, if
> nothing else, a statement of intent to observe a device tree ABI.

Ideally, yes (or rather, ideally the driver patch should have been
rejected due to lack of a binding document).  But in practice it's way
too easy for bad stuff to slip in via driver code, especially when it
goes via a subsystem maintainer that is different from the one who would
be reviewing the binding.

>  Via the SDK, FSL customers are using the device tree ABI the driver de
> facto establishes.

ABI of any sort established by the SDK or other non-upstream trees is
not binding on upstream.  Yes, it's a pain for customers, which is why
ABI should go upstream ASAP.

> >> This patch series should probably include a
> >> binding blurb. However, let's not gate this patchset on a complete
> >> binding for the FMan
> > 
> > I at least want to see enough of the FMan binding to have confidence
> > that what we're adding now is correct.
> 
> I'm not sure what you're looking for. The nodes we're adding are
> describing a very common CCSR space interface for quite common device blocks

...embedded in a variety of different blocks.

If the mdio can truly stand alone, then maybe just submit the mdio node
without being enclosed in an fman node.

> >> As you know we don't own the FMan work and the FMan work is... not ready
> >> for upstreaming.
> > 
> > I'm not asking for a driver, just a binding that describes hardware.  Is
> > there any reason why the fman node needs to be anywhere near as
> > complicated as it is in the SDK, if we're limiting it to actual hardware
> > description?
> 
> Is this a trick question? :-) Of course it doesn't need to be more
> complicated than actual hardware. But, to repeat myself, said
> description is not... ready and I don't know when it will be. Somebody
> else owns pushing the bulk of FMan upstream and I'd rather not step on
> their turf quite like this

If they want to defend their "turf" they can submit a patch.  Now.  This
has gone on long enough.

I'm tempted to submit a binding myself.  I don't know much about
datapath, so I'll probably screw it up.  Please beat me to it. :-)

> > Do we really need to have nodes for all the sub-blocks?
> 
> Definitely no, and internally I'm pushing to clean that up. However, you
> surely remember we've been pushing from the early days of P4080 and it's
> been, to put it optimistically, slow

Stop pushing them and start pushing patches.

Just do it in the right order. :-)

> >> In an attempt to make some sort of progress we've
> >> decided to upstream the pieces that are less controversial and MDIO is
> >> an obvious candidate
> >>
> >>>> +fman@400000 {
> >>>> +	mdio0: mdio@e1120 {
> >>>> +		#address-cells = <1>;
> >>>> +		#size-cells = <0>;
> >>>> +		compatible = "fsl,fman-mdio";
> >>>> +		reg = <0xe1120 0xee0>;
> >>>> +	};
> >>>> +};
> >>>
> >>> What is the difference between "fsl,fman-mdio" and "fsl,fman-xmdio"?  I
> >>> don't see the latter on the list of compatibles in patch 3/6.
> >>
> >> 'fsl,fman-mdio' is the 1 Gb/s MDIO (Clause 22 only). 'fsl,fman-xmdio' is
> >> the 10 Gb/s MDIO (Clause 45 only). We can respin this patch wi
> >>
> > 
> > "respin this patch wi..."?
> 
> Not sure where the end of that sentence went. I meant we'll re-spin with
> a binding for the 10 Gb/s MDIO block
> 
> >> I believe 'fsl,fman-mdio' (and others on that list) was added
> >> gratuitously as the FMan MDIO is completely compatible with the
> >> eTSEC/gianfar MDIO driver, but we can deal with that later
> > 
> > It's still good to identify the specific device, even if it's believed
> > to be 100% compatible.
> 
> You suggesting we create new compatibles for every instance/integration
> of a hardware block even though is identical with an earlier hardware
> integration?

"100% compatible" is a different statement from actually being identical
logic.  How do you know that absolutely nothing changed?

>  Well, I guess that's been done that and now we have about 8
> different compatibles that convey no real difference at all

So?  Remember that a node can have multiple compatible strings.  After
you identify the exact logic being used, you can list older instances
that are believed to be compatible.

> >>> Within each category, is the exact fman version discoverable from the
> >>> mdio registers?
> >>
> >> No, but that's irrelevant as that's not the difference between the two
> >> compatibles
> > 
> > It's relevant because it means the compatible string should have a block
> > version number in it, or at least some other way in the MDIO node to
> > indicate the block version.
> 
> The 1 Gb/s MDIO block doesn't track a version of its own and from a
> programming interface perspective it has no visible difference since
> eTSEC. The 10 Gb/s MDIO doesn't track a version of its own either and
> across the existing FMan versions is identical from a programming
> interface perspective
> 
> I guess we can append a 'v1.0' to the MDIO compatible(s). However, given
> the SDK we'll have to support the compatibles the (already upstream)
> drivers support. Dealing with all that legacy is going to be so tedious

I'm not going to insist that the drivers stop supporting what they
currently support, but I don't agree with the statement that "given the
SDK we'll have to...".

> >>>> +fman@500000 {
> >>>> +	#address-cells = <1>;
> >>>> +	#size-cells = <1>;
> >>>> +	compatible = "simple-bus";
> >>>
> >>> Why is this simple-bus?
> >>
> >> Because that's the translation type for the FMan sub-nodes.
> > 
> > What do you mean by "translation type"?
> 
> I mean address translation across buses

What translation across buses?

> >> We need it now to get the MDIO nodes probed
> > 
> > No.  "simple-bus" is stating an attribute of the hardware, that the
> > child nodes represent simple memory-mapped devices that can be used
> > without special bus knowledge.  I don't think that applies here.
> 
> Yes it does. The FMan sub-nodes are "simple memory-mapped devices that
> can be used without special bus knowledge". Perhaps you're thinking
> about the PHY devices on the MDIO bus

No, I'm not thinking about that.

What I find particularly disturbing about putting simple-bus on the fman
node is that it applies to whatever other subnodes get added in the
future, and we don't yet have any idea what those nodes will be (or at
least I don't).

> > You can get the MDIO node probed without misusing simple-bus by adding
> > the fman node's compatible to the probe list in the kernel code.
> 
> I think that's gratuitous and it's been done gratuitously in the past
> for CCSR space (sub-)nodes

CCSR is a simple-bus, so I'm not sure what you're referring to.

> > This sort of thing is why I want to see what the rest of the fman
> > binding will look like.
> > 
> >>  and we'll needed later to probe other nodes/devices that will have
> >> standalone drivers: MAC, MURAM. etc. 
> > 
> > How are they truly standalone?
> 
> I meant that they have individual drivers and they are not handled by
> the high-level FMan driver

That's software description, not hardware description.  Surely those
drivers cooperate in some manner.

> > The exist in service to the greater
> > entity that is fman.  They presumably work together in some fashion.
> 
> Some blocks can work independently. 

If any cannot, then simple-bus is wrong.

> The MURAM is an example and it seems the existing CPM/QE MURAM code
> allows it to be used as regular memory.

But it's supposed to be used for CPM/QE/Fman.  If an OS chooses to
ignore that and bind a generic driver to it, that's the OS's choice, but
the device tree shouldn't pretend that this is an unrelated bag of
devices.

>  The MDIO block could handle
> PHY(s) for other MACs in the system.

That doesn't necessarily mean the MDIO is totally independent.  E.g.
some versions of eTSEC have a TBIPA register that affects the operation
of the MDIO controller, but is not in the MDIO register area.

> >>> Where's the compatible?  Why is this file different from all the others?
> >>
> >> The FMan v3 MDIO block (supports both Clause 22/45) is compatible with
> >> the FMan v2 10 Gb/s MDIO (the xgmac-mdio driver). However, the driver
> >> needs a small clean-up patch (still in internal review) that will get it
> >> working for FMan v3 MDIO.
> > 
> > This suggests that it is not 100% backwards compatible.
> 
> It is. The code is just not everything it should be

The code works with one v2, and breaks with v3.  Thus, something is
different.  Whether the difference is enough to prevent claiming
compatibility depends on the detials (e.g. if the difference is
parameterized through another DT property, or if the difference is
within what is allowed by the specs of the original device), but it
suggests they are not literally the same logic.  Unles the difference is
not due to v2 versus v3 but some difference elsewhere in the system?

-Scott

^ permalink raw reply

* [PATCH 1/1] booke/watchdog: refine and clean up the codes
From: Tang Yuantian @ 2014-05-07  3:30 UTC (permalink / raw)
  To: scottwood; +Cc: Tang Yuantian, linuxppc-dev

From: Tang Yuantian <yuantian.tang@freescale.com>

Basically, this patch does the following:
1. Move the codes of parsing boot parameters from setup-common.c
   to driver. In this way, code reader can know directly that
   there are boot parameters that can change the timeout.
2. Make boot parameter 'booke_wdt_period' effective.
   currently, when driver is loaded, default timeout is always
   being used in stead of booke_wdt_period.
3. Wrap up the watchdog timeout in device struct and clean up
   unnecessary codes.

Signed-off-by: Tang Yuantian <yuantian.tang@freescale.com>
---
 arch/powerpc/kernel/setup-common.c | 27 --------------------
 drivers/watchdog/booke_wdt.c       | 51 ++++++++++++++++++++++++--------------
 2 files changed, 33 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index bc76cc6..5874aef 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -715,33 +715,6 @@ static int powerpc_debugfs_init(void)
 arch_initcall(powerpc_debugfs_init);
 #endif
 
-#ifdef CONFIG_BOOKE_WDT
-extern u32 booke_wdt_enabled;
-extern u32 booke_wdt_period;
-
-/* Checks wdt=x and wdt_period=xx command-line option */
-notrace int __init early_parse_wdt(char *p)
-{
-	if (p && strncmp(p, "0", 1) != 0)
-		booke_wdt_enabled = 1;
-
-	return 0;
-}
-early_param("wdt", early_parse_wdt);
-
-int __init early_parse_wdt_period(char *p)
-{
-	unsigned long ret;
-	if (p) {
-		if (!kstrtol(p, 0, &ret))
-			booke_wdt_period = ret;
-	}
-
-	return 0;
-}
-early_param("wdt_period", early_parse_wdt_period);
-#endif	/* CONFIG_BOOKE_WDT */
-
 void ppc_printk_progress(char *s, unsigned short hex)
 {
 	pr_info("%s\n", s);
diff --git a/drivers/watchdog/booke_wdt.c b/drivers/watchdog/booke_wdt.c
index a8dbceb3..08a7853 100644
--- a/drivers/watchdog/booke_wdt.c
+++ b/drivers/watchdog/booke_wdt.c
@@ -41,6 +41,28 @@ u32 booke_wdt_period = CONFIG_BOOKE_WDT_DEFAULT_TIMEOUT;
 #define WDTP_MASK	(TCR_WP_MASK)
 #endif
 
+/* Checks wdt=x and wdt_period=xx command-line option */
+notrace int __init early_parse_wdt(char *p)
+{
+	if (p && strncmp(p, "0", 1) != 0)
+		booke_wdt_enabled = 1;
+
+	return 0;
+}
+early_param("wdt", early_parse_wdt);
+
+int __init early_parse_wdt_period(char *p)
+{
+	unsigned long ret;
+	if (p) {
+		if (!kstrtol(p, 0, &ret))
+			booke_wdt_period = ret;
+	}
+
+	return 0;
+}
+early_param("wdt_period", early_parse_wdt_period);
+
 #ifdef CONFIG_PPC_FSL_BOOK3E
 
 /* For the specified period, determine the number of seconds
@@ -103,17 +125,18 @@ static unsigned int sec_to_period(unsigned int secs)
 static void __booke_wdt_set(void *data)
 {
 	u32 val;
+	struct watchdog_device *wdog = data;
 
 	val = mfspr(SPRN_TCR);
 	val &= ~WDTP_MASK;
-	val |= WDTP(booke_wdt_period);
+	val |= WDTP(sec_to_period(wdog->timeout));
 
 	mtspr(SPRN_TCR, val);
 }
 
-static void booke_wdt_set(void)
+static void booke_wdt_set(void *data)
 {
-	on_each_cpu(__booke_wdt_set, NULL, 0);
+	on_each_cpu(__booke_wdt_set, data, 0);
 }
 
 static void __booke_wdt_ping(void *data)
@@ -131,12 +154,13 @@ static int booke_wdt_ping(struct watchdog_device *wdog)
 static void __booke_wdt_enable(void *data)
 {
 	u32 val;
+	struct watchdog_device *wdog = data;
 
 	/* clear status before enabling watchdog */
 	__booke_wdt_ping(NULL);
 	val = mfspr(SPRN_TCR);
 	val &= ~WDTP_MASK;
-	val |= (TCR_WIE|TCR_WRC(WRC_CHIP)|WDTP(booke_wdt_period));
+	val |= (TCR_WIE|TCR_WRC(WRC_CHIP)|WDTP(sec_to_period(wdog->timeout)));
 
 	mtspr(SPRN_TCR, val);
 }
@@ -162,25 +186,17 @@ static void __booke_wdt_disable(void *data)
 
 }
 
-static void __booke_wdt_start(struct watchdog_device *wdog)
+static int booke_wdt_start(struct watchdog_device *wdog)
 {
-	on_each_cpu(__booke_wdt_enable, NULL, 0);
+	on_each_cpu(__booke_wdt_enable, wdog, 0);
 	pr_debug("watchdog enabled (timeout = %u sec)\n", wdog->timeout);
-}
 
-static int booke_wdt_start(struct watchdog_device *wdog)
-{
-	if (booke_wdt_enabled == 0) {
-		booke_wdt_enabled = 1;
-		__booke_wdt_start(wdog);
-	}
 	return 0;
 }
 
 static int booke_wdt_stop(struct watchdog_device *wdog)
 {
 	on_each_cpu(__booke_wdt_disable, NULL, 0);
-	booke_wdt_enabled = 0;
 	pr_debug("watchdog disabled\n");
 
 	return 0;
@@ -191,9 +207,8 @@ static int booke_wdt_set_timeout(struct watchdog_device *wdt_dev,
 {
 	if (timeout > MAX_WDT_TIMEOUT)
 		return -EINVAL;
-	booke_wdt_period = sec_to_period(timeout);
 	wdt_dev->timeout = timeout;
-	booke_wdt_set();
+	booke_wdt_set(wdt_dev);
 
 	return 0;
 }
@@ -231,10 +246,10 @@ static int __init booke_wdt_init(void)
 	pr_info("powerpc book-e watchdog driver loaded\n");
 	booke_wdt_info.firmware_version = cur_cpu_spec->pvr_value;
 	booke_wdt_set_timeout(&booke_wdt_dev,
-			      period_to_sec(CONFIG_BOOKE_WDT_DEFAULT_TIMEOUT));
+			      period_to_sec(booke_wdt_period));
 	watchdog_set_nowayout(&booke_wdt_dev, nowayout);
 	if (booke_wdt_enabled)
-		__booke_wdt_start(&booke_wdt_dev);
+		booke_wdt_start(&booke_wdt_dev);
 
 	ret = watchdog_register_device(&booke_wdt_dev);
 
-- 
1.8.5

^ permalink raw reply related

* Re: [PATCH] KVM: PPC: BOOK3S: PR: Fix WARN_ON with debug options on
From: Paul Mackerras @ 2014-05-07  5:56 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <1399224368-22122-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Sun, May 04, 2014 at 10:56:08PM +0530, Aneesh Kumar K.V wrote:
> With debug option "sleep inside atomic section checking" enabled we get
> the below WARN_ON during a PR KVM boot. This is because upstream now
> have PREEMPT_COUNT enabled even if we have preempt disabled. Fix the
> warning by adding preempt_disable/enable around floating point and altivec
> enable.

This worries me a bit.  In this code:

>  	if (msr & MSR_FP) {
> +		preempt_disable();
>  		enable_kernel_fp();
>  		load_fp_state(&vcpu->arch.fp);
>  		t->fp_save_area = &vcpu->arch.fp;
> +		preempt_enable();

What would happen if we actually did get preempted at this point?
Wouldn't we lose the FP state we just loaded?

In other words, how come we're not already preempt-disabled at this
point?

Paul.

^ permalink raw reply

* Re: [PATCH] powerpc/fsl: Updated corenet-cf compatible string for corenet1-cf chips
From: Diana Craciun @ 2014-05-07  6:14 UTC (permalink / raw)
  To: Scott Wood; +Cc: devicetree, linuxppc-dev
In-Reply-To: <1399395448.15726.185.camel@snotra.buserror.net>

On 05/06/2014 07:57 PM, Scott Wood wrote:
> On Tue, 2014-05-06 at 17:56 +0300, Diana Craciun wrote:
>> From: Diana Craciun <Diana.Craciun@freescale.com>
>>
>> Updated the device trees according to the corenet-cf
>> binding definition.
>>
>> Signed-off-by: Diana Craciun <Diana.Craciun@freescale.com>
>> ---
>>   arch/powerpc/boot/dts/fsl/p2041si-post.dtsi | 2 +-
>>   arch/powerpc/boot/dts/fsl/p3041si-post.dtsi | 2 +-
>>   arch/powerpc/boot/dts/fsl/p4080si-post.dtsi | 2 +-
>>   arch/powerpc/boot/dts/fsl/p5020si-post.dtsi | 2 +-
>>   4 files changed, 4 insertions(+), 4 deletions(-)
> Oops, I meant to include this in the patch I sent, but forgot to squash
> the two patches together. :-P
>
> Where's p5040?

Ups, somehow I failed to commit that file, respinning....

Diana

^ permalink raw reply

* [PATCH v2] powerpc/fsl: Updated corenet-cf compatible string for corenet1-cf chips
From: Diana Craciun @ 2014-05-07  6:29 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: scottwood, devicetree, Diana Craciun

From: Diana Craciun <Diana.Craciun@freescale.com>

Updated the device trees according to the corenet-cf
binding definition.

Signed-off-by: Diana Craciun <Diana.Craciun@freescale.com>
---
v2:
	Added missing p5040

 arch/powerpc/boot/dts/fsl/p2041si-post.dtsi | 2 +-
 arch/powerpc/boot/dts/fsl/p3041si-post.dtsi | 2 +-
 arch/powerpc/boot/dts/fsl/p4080si-post.dtsi | 2 +-
 arch/powerpc/boot/dts/fsl/p5020si-post.dtsi | 2 +-
 arch/powerpc/boot/dts/fsl/p5040si-post.dtsi | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi b/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi
index e2987a3..b5daa4c 100644
--- a/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi
@@ -246,7 +246,7 @@
 	};
 
 	corenet-cf@18000 {
-		compatible = "fsl,corenet-cf";
+		compatible = "fsl,corenet1-cf", "fsl,corenet-cf";
 		reg = <0x18000 0x1000>;
 		interrupts = <16 2 1 31>;
 		fsl,ccf-num-csdids = <32>;
diff --git a/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi b/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi
index 7af6d45..5abd1fc 100644
--- a/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi
@@ -273,7 +273,7 @@
 	};
 
 	corenet-cf@18000 {
-		compatible = "fsl,corenet-cf";
+		compatible = "fsl,corenet1-cf", "fsl,corenet-cf";
 		reg = <0x18000 0x1000>;
 		interrupts = <16 2 1 31>;
 		fsl,ccf-num-csdids = <32>;
diff --git a/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi b/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi
index 2415e1f..bf0e7c9 100644
--- a/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi
@@ -281,7 +281,7 @@
 	};
 
 	corenet-cf@18000 {
-		compatible = "fsl,corenet-cf";
+		compatible = "fsl,corenet1-cf", "fsl,corenet-cf";
 		reg = <0x18000 0x1000>;
 		interrupts = <16 2 1 31>;
 		fsl,ccf-num-csdids = <32>;
diff --git a/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi b/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi
index 2985de4..f7ca9f4 100644
--- a/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi
@@ -278,7 +278,7 @@
 	};
 
 	corenet-cf@18000 {
-		compatible = "fsl,corenet-cf";
+		compatible = "fsl,corenet1-cf", "fsl,corenet-cf";
 		reg = <0x18000 0x1000>;
 		interrupts = <16 2 1 31>;
 		fsl,ccf-num-csdids = <32>;
diff --git a/arch/powerpc/boot/dts/fsl/p5040si-post.dtsi b/arch/powerpc/boot/dts/fsl/p5040si-post.dtsi
index 546a899..91477b5 100644
--- a/arch/powerpc/boot/dts/fsl/p5040si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p5040si-post.dtsi
@@ -233,7 +233,7 @@
 	};
 
 	corenet-cf@18000 {
-		compatible = "fsl,corenet-cf";
+		compatible = "fsl,corenet1-cf", "fsl,corenet-cf";
 		reg = <0x18000 0x1000>;
 		interrupts = <16 2 1 31>;
 		fsl,ccf-num-csdids = <32>;
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH 2/2 v2] powerpc: Enable NO_BOOTMEM
From: Aneesh Kumar K.V @ 2014-05-07  6:35 UTC (permalink / raw)
  To: Emil Medve, benh, linuxppc-dev; +Cc: Emil Medve
In-Reply-To: <1399402084-6325-2-git-send-email-Emilian.Medve@Freescale.com>

Emil Medve <Emilian.Medve@Freescale.com> writes:

> Currently bootmem is just a wrapper around memblock. This gets rid of
> the wrapper code just as other ARHC(es) did: x86, arm, etc.
>
> For now only cover !NUMA systems/builds
>
> Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
> ---
>
> v2: Acknowledge that NUMA systems/builds are not covered by this patch
>
>  arch/powerpc/Kconfig  | 3 +++
>  arch/powerpc/mm/mem.c | 8 ++++++++
>  2 files changed, 11 insertions(+)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index e099899..07b164b 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -475,6 +475,9 @@ config SYS_SUPPORTS_HUGETLBFS
>  
>  source "mm/Kconfig"
>  
> +config NO_BOOTMEM
> +	def_bool !NUMA


There is actually one in  mm/Kconfig

So I guess you should make the platform that you are interested
/tested just do select No_BOOTMEM like we do in arch/arm/Kconfig etc ?

-aneesh

^ permalink raw reply

* Re: [PATCH] KVM: PPC: BOOK3S: PR: Fix WARN_ON with debug options on
From: Aneesh Kumar K.V @ 2014-05-07  7:58 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <20140507055626.GA26650@iris.ozlabs.ibm.com>

Paul Mackerras <paulus@samba.org> writes:

> On Sun, May 04, 2014 at 10:56:08PM +0530, Aneesh Kumar K.V wrote:
>> With debug option "sleep inside atomic section checking" enabled we get
>> the below WARN_ON during a PR KVM boot. This is because upstream now
>> have PREEMPT_COUNT enabled even if we have preempt disabled. Fix the
>> warning by adding preempt_disable/enable around floating point and altivec
>> enable.
>
> This worries me a bit.  In this code:
>
>>  	if (msr & MSR_FP) {
>> +		preempt_disable();
>>  		enable_kernel_fp();
>>  		load_fp_state(&vcpu->arch.fp);
>>  		t->fp_save_area = &vcpu->arch.fp;
>> +		preempt_enable();
>
> What would happen if we actually did get preempted at this point?
> Wouldn't we lose the FP state we just loaded?

I was not sure we have got CONFIG_PREEMPT working with kvm. So i was
not looking at preempted case. But yes, if we have PREEMPT enabled it
would break as per the current code. 

>
> In other words, how come we're not already preempt-disabled at this
> point?

I don't see us disabling preempt in the code path. Are you suggesting
that we should be preempt disabled for the whole duration of
kvmppc_handle_ext ? 

-aneesh

^ permalink raw reply

* Re: [PATCH v4 8/8] DMA: Freescale: add suspend resume functions for DMA driver
From: Shevchenko, Andriy @ 2014-05-07  8:31 UTC (permalink / raw)
  To: Hongbo Zhang
  Cc: leo.li@freescale.com, Koul, Vinod, linux-kernel@vger.kernel.org,
	scottwood@freescale.com, vkoul@infradead.org,
	dmaengine@vger.kernel.org, Williams, Dan J,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <536614E7.9050301@freescale.com>

T24gU3VuLCAyMDE0LTA1LTA0IGF0IDE4OjIyICswODAwLCBIb25nYm8gWmhhbmcgd3JvdGU6DQo+
IE9uIDA1LzAzLzIwMTQgMTI6NDYgQU0sIFZpbm9kIEtvdWwgd3JvdGU6DQo+ID4gT24gRnJpLCBB
cHIgMTgsIDIwMTQgYXQgMDQ6MTc6NTFQTSArMDgwMCwgaG9uZ2JvLnpoYW5nQGZyZWVzY2FsZS5j
b20gd3JvdGU6DQo+ID4+IEZyb206IEhvbmdibyBaaGFuZyA8aG9uZ2JvLnpoYW5nQGZyZWVzY2Fs
ZS5jb20+DQo+ID4+DQo+ID4+IFRoaXMgcGF0Y2ggYWRkcyBzdXNwZW5kIHJlc3VtZSBmdW5jdGlv
bnMgZm9yIEZyZWVzY2FsZSBETUEgZHJpdmVyLg0KPiA+PiAucHJlcGFyZSBjYWxsYmFjayBpcyB1
c2VkIHRvIHN0b3AgZnVydGhlciBkZXNjcmlwdG9ycyBmcm9tIGJlaW5nIGFkZGVkIGludG8gdGhl
DQo+ID4+IHBlbmRpbmcgcXVldWUsIGFuZCBhbHNvIGlzc3VlIHBlbmRpbmcgcXVldWVzIGludG8g
ZXhlY3V0aW9uIGlmIHRoZXJlIGlzIGFueS4NCj4gPj4gLnN1c3BlbmQgY2FsbGJhY2sgbWFrZXMg
c3VyZSBhbGwgdGhlIHBlbmRpbmcgam9icyBhcmUgY2xlYW5lZCB1cCBhbmQgYWxsIHRoZQ0KPiA+
PiBjaGFubmVscyBhcmUgaWRsZSwgYW5kIHNhdmUgdGhlIG1vZGUgcmVnaXN0ZXJzLg0KPiA+PiAu
cmVzdW1lIGNhbGxiYWNrIHJlLWluaXRpYWxpemVzIHRoZSBjaGFubmVscyBieSByZXN0b3JlIHRo
ZSBtb2RlIHJlZ2lzdGVycy4NCj4gPj4NCj4gPj4gKw0KPiA+PiArc3RhdGljIGNvbnN0IHN0cnVj
dCBkZXZfcG1fb3BzIGZzbGRtYV9wbV9vcHMgPSB7DQo+ID4+ICsJLnByZXBhcmUJPSBmc2xkbWFf
cHJlcGFyZSwNCj4gPj4gKwkuc3VzcGVuZAk9IGZzbGRtYV9zdXNwZW5kLA0KPiA+PiArCS5yZXN1
bWUJCT0gZnNsZG1hX3Jlc3VtZSwNCj4gPj4gK307DQo+ID4gSSB0aGluayB0aGlzIGlzIG5vdCBj
b3JyZWN0LiBXZSBkaXNjdXNzZWQgdGhpcyBzb21ldGltZSBiYWNrIG9uIGxpc3QuIFRoZQ0KPiA+
IERNQWVuZ2luZSBkcml2ZXJzIHNob3VsZCB1c2UgbGF0ZSByZXN1bWUgYW5kIGVhcmx5IHN1c3Bl
bmQgdG8gZW5zdXJlIHRoZXkgZ2V0DQo+ID4gc3VzcGVuZGVkIGFmdGVyIGNsaWVudHMgKHdobyBz
aG91bGQgdXNlIG5vcm1hbCBvbmVzKSBhbmQgcmVzdW1lIGJlZm9yZSB0aGVtDQo+ID4NCj4gT0ss
IHdpbGwgdXBkYXRlIGl0IGxpa2UgdGhpczoNCj4gdXNlIC5zdXNwZW5kIHRvIHRha2UgcGxhY2Ug
b2YgY3VycmVudCAucHJlcGFyZQ0KDQpDb3VsZCB5b3UgcmVtb3ZlIHRoaXMgYXQgYWxsPw0KDQpB
bnN3ZXJpbmcgdG8geW91ciBwcmV2aW91cyBzdGF0ZW1lbnRzIEkgY291bGQgc2F5IHRoYXQuDQpE
ZXZpY2UgZHJpdmVycyAoRE1BYyB1c2VycykgdGhhdCBkb24ndCBpbXBsZW1lbnQgLnN1c3BlbmQg
Y2FsbGJhY2sgYXJlDQpvbiB0aGVpciBvd24gd2l0aCB0cm91YmxlcywgeW91IGhhdmUgbm90IHRv
IGNhcmUgYWJvdXQgdGhlbSBpbiB0aGUgRE1BDQpkcml2ZXIuDQoNCj4gdXNlIC5zdXNwZW5kX2xh
dGUgdG8gdGFrZSBwbGFjZSBvZiBjdXJyZW50IC5zdXNwZW5kDQo+IHVzZSAucmVzdW1lX2Vhcmx5
IHRvIHRha2UgcGxhY2Ugb2YgY3VycmVudCAucmVzdW1lDQo+IA0KPiANCj4gDQo+IC0tDQo+IFRv
IHVuc3Vic2NyaWJlIGZyb20gdGhpcyBsaXN0OiBzZW5kIHRoZSBsaW5lICJ1bnN1YnNjcmliZSBk
bWFlbmdpbmUiIGluDQo+IHRoZSBib2R5IG9mIGEgbWVzc2FnZSB0byBtYWpvcmRvbW9Admdlci5r
ZXJuZWwub3JnDQo+IE1vcmUgbWFqb3Jkb21vIGluZm8gYXQgIGh0dHA6Ly92Z2VyLmtlcm5lbC5v
cmcvbWFqb3Jkb21vLWluZm8uaHRtbA0KDQoNCi0tIA0KQW5keSBTaGV2Y2hlbmtvIDxhbmRyaXku
c2hldmNoZW5rb0BpbnRlbC5jb20+DQpJbnRlbCBGaW5sYW5kIE95DQotLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0KSW50
ZWwgRmlubGFuZCBPeQpSZWdpc3RlcmVkIEFkZHJlc3M6IFBMIDI4MSwgMDAxODEgSGVsc2lua2kg
CkJ1c2luZXNzIElkZW50aXR5IENvZGU6IDAzNTc2MDYgLSA0IApEb21pY2lsZWQgaW4gSGVsc2lu
a2kgCgpUaGlzIGUtbWFpbCBhbmQgYW55IGF0dGFjaG1lbnRzIG1heSBjb250YWluIGNvbmZpZGVu
dGlhbCBtYXRlcmlhbCBmb3IKdGhlIHNvbGUgdXNlIG9mIHRoZSBpbnRlbmRlZCByZWNpcGllbnQo
cykuIEFueSByZXZpZXcgb3IgZGlzdHJpYnV0aW9uCmJ5IG90aGVycyBpcyBzdHJpY3RseSBwcm9o
aWJpdGVkLiBJZiB5b3UgYXJlIG5vdCB0aGUgaW50ZW5kZWQKcmVjaXBpZW50LCBwbGVhc2UgY29u
dGFjdCB0aGUgc2VuZGVyIGFuZCBkZWxldGUgYWxsIGNvcGllcy4K

^ permalink raw reply

* Re: [PATCH] KVM: PPC: BOOK3S: PR: Fix WARN_ON with debug options on
From: Alexander Graf @ 2014-05-07 11:37 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Aneesh Kumar K.V, kvm-ppc, kvm
In-Reply-To: <20140507055626.GA26650@iris.ozlabs.ibm.com>

On 05/07/2014 07:56 AM, Paul Mackerras wrote:
> On Sun, May 04, 2014 at 10:56:08PM +0530, Aneesh Kumar K.V wrote:
>> With debug option "sleep inside atomic section checking" enabled we get
>> the below WARN_ON during a PR KVM boot. This is because upstream now
>> have PREEMPT_COUNT enabled even if we have preempt disabled. Fix the
>> warning by adding preempt_disable/enable around floating point and altivec
>> enable.
> This worries me a bit.  In this code:
>
>>   	if (msr & MSR_FP) {
>> +		preempt_disable();
>>   		enable_kernel_fp();
>>   		load_fp_state(&vcpu->arch.fp);
>>   		t->fp_save_area = &vcpu->arch.fp;
>> +		preempt_enable();
> What would happen if we actually did get preempted at this point?
> Wouldn't we lose the FP state we just loaded?
>
> In other words, how come we're not already preempt-disabled at this
> point?

This is probably because we're trying to confuse Linux :). The entry 
path happens with interrupts hard disabled, but preempt enabled so that 
Linux doesn't consider the guest time as non-preemptible. That's the 
only call I could find where preempt is logically enabled (though it 
really isn't).


Alex

^ permalink raw reply

* [RFT PATCH -next ] [BUGFIX] kprobes: Fix "Failed to find blacklist" error on ia64 and ppc64
From: Masami Hiramatsu @ 2014-05-07 11:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Tony Luck
  Cc: Jeremy Fitzhardinge, linux-ia64, sparse, H. Peter Anvin,
	Thomas Gleixner, linux-tip-commits, anil.s.keshavamurthy,
	Ingo Molnar, Fenghua Yu, Arnd Bergmann, Rusty Russell,
	Chris Wright, yrl.pp-manager.tt, akataria, Tony Luck, Kevin Hao,
	Linus Torvalds, rdunlap, Linux Kernel Mailing List, dl9pf,
	Andrew Morton, linuxppc-dev, David S. Miller
In-Reply-To: <536A16A6.5040709@hitachi.com>

On ia64 and ppc64, the function pointer does not point the
entry address of the function, but the address of function
discriptor (which contains the entry address and misc
data.) Since the kprobes passes the function pointer stored
by NOKPROBE_SYMBOL() to kallsyms_lookup_size_offset() for
initalizing its blacklist, it fails and reports many errors
as below.

  Failed to find blacklist 0001013168300000
  Failed to find blacklist 0001013000f0a000
  Failed to find blacklist 000101315f70a000
  Failed to find blacklist 000101324c80a000
  Failed to find blacklist 0001013063f0a000
  Failed to find blacklist 000101327800a000
  Failed to find blacklist 0001013277f0a000
  Failed to find blacklist 000101315a70a000
  Failed to find blacklist 0001013277e0a000
  Failed to find blacklist 000101305a20a000
  Failed to find blacklist 0001013277d0a000
  Failed to find blacklist 00010130bdc0a000
  Failed to find blacklist 00010130dc20a000
  Failed to find blacklist 000101309a00a000
  Failed to find blacklist 0001013277c0a000
  Failed to find blacklist 0001013277b0a000
  Failed to find blacklist 0001013277a0a000
  Failed to find blacklist 000101327790a000
  Failed to find blacklist 000101303140a000
  Failed to find blacklist 0001013a3280a000

To fix this bug, this introduces function_entry() macro to
retrieve the entry address from the given function pointer,
and uses it in NOKPROBE_SYMBOL().


Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reported-by: Tony Luck <tony.luck@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: linux-ia64@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/ia64/include/asm/types.h    |    2 ++
 arch/powerpc/include/asm/types.h |   11 +++++++++++
 include/linux/kprobes.h          |    3 ++-
 include/linux/types.h            |    4 ++++
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/types.h b/arch/ia64/include/asm/types.h
index 4c351b1..6ab7b6c 100644
--- a/arch/ia64/include/asm/types.h
+++ b/arch/ia64/include/asm/types.h
@@ -27,5 +27,7 @@ struct fnptr {
 	unsigned long gp;
 };
 
+#define constant_function_entry(fn) (((struct fnptr *)(fn))->ip)
+
 #endif /* !__ASSEMBLY__ */
 #endif /* _ASM_IA64_TYPES_H */
diff --git a/arch/powerpc/include/asm/types.h b/arch/powerpc/include/asm/types.h
index bfb6ded..fd297b8 100644
--- a/arch/powerpc/include/asm/types.h
+++ b/arch/powerpc/include/asm/types.h
@@ -25,6 +25,17 @@ typedef struct {
 	unsigned long env;
 } func_descr_t;
 
+#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF == 1)
+/*
+ * On PPC64 ABIv1 the function pointer actually points to the
+ * function's descriptor. The first entry in the descriptor is the
+ * address of the function text.
+ */
+#define constant_function_entry(fn)	(((func_descr_t *)(fn))->entry)
+#else
+#define constant_function_entry(fn)	((unsigned long)(fn))
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_TYPES_H */
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index e059507..637eafe 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -40,6 +40,7 @@
 #include <linux/rcupdate.h>
 #include <linux/mutex.h>
 #include <linux/ftrace.h>
+#include <linux/types.h>
 
 #ifdef CONFIG_KPROBES
 #include <asm/kprobes.h>
@@ -485,7 +486,7 @@ static inline int enable_jprobe(struct jprobe *jp)
 #define __NOKPROBE_SYMBOL(fname)			\
 static unsigned long __used				\
 	__attribute__((section("_kprobe_blacklist")))	\
-	_kbl_addr_##fname = (unsigned long)fname;
+	_kbl_addr_##fname = constant_function_entry(fname);
 #define NOKPROBE_SYMBOL(fname)	__NOKPROBE_SYMBOL(fname)
 #else
 #define NOKPROBE_SYMBOL(fname)
diff --git a/include/linux/types.h b/include/linux/types.h
index 4d118ba..78e2d7d 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -212,5 +212,9 @@ struct callback_head {
 };
 #define rcu_head callback_head
 
+#ifndef constant_function_entry
+#define constant_function_entry(fn)	((unsigned long)(fn))
+#endif
+
 #endif /*  __ASSEMBLY__ */
 #endif /* _LINUX_TYPES_H */

^ permalink raw reply related

* Re: [RFT PATCH -next ] [BUGFIX] kprobes: Fix "Failed to find blacklist" error on ia64 and ppc64
From: Masami Hiramatsu @ 2014-05-07 11:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Tony Luck
  Cc: Jeremy Fitzhardinge, linux-ia64, sparse, H. Peter Anvin,
	Thomas Gleixner, linux-tip-commits, anil.s.keshavamurthy,
	Ingo Molnar, Fenghua Yu, Arnd Bergmann, Rusty Russell,
	Chris Wright, yrl.pp-manager.tt, akataria, Tony Luck, Kevin Hao,
	Linus Torvalds, rdunlap, Linux Kernel Mailing List, dl9pf,
	Andrew Morton, linuxppc-dev, David S. Miller
In-Reply-To: <20140507115551.22259.70581.stgit@ltc230.yrl.intra.hitachi.co.jp>

Hi Tony, Benjamin and Paul,

I've tried to fix this bug, but since I don't have either ppc64 nor ia64,
this patch is not tested on those archs. Please review and test it on
those machines.

Thank you,

(2014/05/07 20:55), Masami Hiramatsu wrote:
> On ia64 and ppc64, the function pointer does not point the
> entry address of the function, but the address of function
> discriptor (which contains the entry address and misc
> data.) Since the kprobes passes the function pointer stored
> by NOKPROBE_SYMBOL() to kallsyms_lookup_size_offset() for
> initalizing its blacklist, it fails and reports many errors
> as below.
> 
>   Failed to find blacklist 0001013168300000
>   Failed to find blacklist 0001013000f0a000
>   Failed to find blacklist 000101315f70a000
>   Failed to find blacklist 000101324c80a000
>   Failed to find blacklist 0001013063f0a000
>   Failed to find blacklist 000101327800a000
>   Failed to find blacklist 0001013277f0a000
>   Failed to find blacklist 000101315a70a000
>   Failed to find blacklist 0001013277e0a000
>   Failed to find blacklist 000101305a20a000
>   Failed to find blacklist 0001013277d0a000
>   Failed to find blacklist 00010130bdc0a000
>   Failed to find blacklist 00010130dc20a000
>   Failed to find blacklist 000101309a00a000
>   Failed to find blacklist 0001013277c0a000
>   Failed to find blacklist 0001013277b0a000
>   Failed to find blacklist 0001013277a0a000
>   Failed to find blacklist 000101327790a000
>   Failed to find blacklist 000101303140a000
>   Failed to find blacklist 0001013a3280a000
> 
> To fix this bug, this introduces function_entry() macro to
> retrieve the entry address from the given function pointer,
> and uses it in NOKPROBE_SYMBOL().
> 
> 
> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Reported-by: Tony Luck <tony.luck@gmail.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> Cc: Kevin Hao <haokexin@gmail.com>
> Cc: linux-ia64@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> ---
>  arch/ia64/include/asm/types.h    |    2 ++
>  arch/powerpc/include/asm/types.h |   11 +++++++++++
>  include/linux/kprobes.h          |    3 ++-
>  include/linux/types.h            |    4 ++++
>  4 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/ia64/include/asm/types.h b/arch/ia64/include/asm/types.h
> index 4c351b1..6ab7b6c 100644
> --- a/arch/ia64/include/asm/types.h
> +++ b/arch/ia64/include/asm/types.h
> @@ -27,5 +27,7 @@ struct fnptr {
>  	unsigned long gp;
>  };
>  
> +#define constant_function_entry(fn) (((struct fnptr *)(fn))->ip)
> +
>  #endif /* !__ASSEMBLY__ */
>  #endif /* _ASM_IA64_TYPES_H */
> diff --git a/arch/powerpc/include/asm/types.h b/arch/powerpc/include/asm/types.h
> index bfb6ded..fd297b8 100644
> --- a/arch/powerpc/include/asm/types.h
> +++ b/arch/powerpc/include/asm/types.h
> @@ -25,6 +25,17 @@ typedef struct {
>  	unsigned long env;
>  } func_descr_t;
>  
> +#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF == 1)
> +/*
> + * On PPC64 ABIv1 the function pointer actually points to the
> + * function's descriptor. The first entry in the descriptor is the
> + * address of the function text.
> + */
> +#define constant_function_entry(fn)	(((func_descr_t *)(fn))->entry)
> +#else
> +#define constant_function_entry(fn)	((unsigned long)(fn))
> +#endif
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* _ASM_POWERPC_TYPES_H */
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index e059507..637eafe 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -40,6 +40,7 @@
>  #include <linux/rcupdate.h>
>  #include <linux/mutex.h>
>  #include <linux/ftrace.h>
> +#include <linux/types.h>
>  
>  #ifdef CONFIG_KPROBES
>  #include <asm/kprobes.h>
> @@ -485,7 +486,7 @@ static inline int enable_jprobe(struct jprobe *jp)
>  #define __NOKPROBE_SYMBOL(fname)			\
>  static unsigned long __used				\
>  	__attribute__((section("_kprobe_blacklist")))	\
> -	_kbl_addr_##fname = (unsigned long)fname;
> +	_kbl_addr_##fname = constant_function_entry(fname);
>  #define NOKPROBE_SYMBOL(fname)	__NOKPROBE_SYMBOL(fname)
>  #else
>  #define NOKPROBE_SYMBOL(fname)
> diff --git a/include/linux/types.h b/include/linux/types.h
> index 4d118ba..78e2d7d 100644
> --- a/include/linux/types.h
> +++ b/include/linux/types.h
> @@ -212,5 +212,9 @@ struct callback_head {
>  };
>  #define rcu_head callback_head
>  
> +#ifndef constant_function_entry
> +#define constant_function_entry(fn)	((unsigned long)(fn))
> +#endif
> +
>  #endif /*  __ASSEMBLY__ */
>  #endif /* _LINUX_TYPES_H */
> 
> 
> 


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply

* Re: [PATCH 2/2 v2] powerpc: Enable NO_BOOTMEM
From: Emil Medve @ 2014-05-07 18:37 UTC (permalink / raw)
  To: Aneesh Kumar K.V, benh, linuxppc-dev
In-Reply-To: <87lhueumdp.fsf@linux.vnet.ibm.com>

Hello Aneesh,


On 05/07/2014 01:35 AM, Aneesh Kumar K.V wrote:
> Emil Medve <Emilian.Medve@Freescale.com> writes:
> 
>> Currently bootmem is just a wrapper around memblock. This gets rid of
>> the wrapper code just as other ARHC(es) did: x86, arm, etc.
>>
>> For now only cover !NUMA systems/builds
>>
>> Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
>> ---
>>
>> v2: Acknowledge that NUMA systems/builds are not covered by this patch
>>
>>  arch/powerpc/Kconfig  | 3 +++
>>  arch/powerpc/mm/mem.c | 8 ++++++++
>>  2 files changed, 11 insertions(+)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index e099899..07b164b 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -475,6 +475,9 @@ config SYS_SUPPORTS_HUGETLBFS
>>  
>>  source "mm/Kconfig"
>>  
>> +config NO_BOOTMEM
>> +	def_bool !NUMA
> 
> 
> There is actually one in  mm/Kconfig
> 
> So I guess you should make the platform that you are interested
> /tested just do select No_BOOTMEM like we do in arch/arm/Kconfig etc ?

I had a look at what was done for x86 and I missed why the mm/Kconfig
symbol was added. Will include your suggestion in the next version


Cheers,

^ permalink raw reply

* Re: [PATCH 00/38] MMC updates, plus CuBox-i WiFi support
From: Tim Kryger @ 2014-05-07 20:49 UTC (permalink / raw)
  To: Chris Ball
  Cc: Mark Rutland, Ulf Hansson, Linux Doc List, Seungwon Jeon,
	Thierry Reding, Russell King - ARM Linux, Anton Vorontsov,
	Michal Simek, Jaehoon Chung, ARM Kernel List, Device Tree List,
	Pawel Moll, Stephen Warren, spear-devel, Rob Herring, Ben Dooks,
	linux-tegra, Ian Campbell, Shawn Guo, Barry Song, Randy Dunlap,
	linux-mmc@vger.kernel.org, Viresh Kumar, Sascha Hauer, Kumar Gala,
	linuxppc-dev
In-Reply-To: <86mwf5tn0v.fsf@void.printf.net>

On Mon, Apr 28, 2014 at 9:52 AM, Chris Ball <chris@printf.net> wrote:
> Hi,
>
> On Mon, Apr 28 2014, Stephen Warren wrote:
>> The series,
>> Tested-by: Stephen Warren <swarren@nvidia.com>
>>
>> (On an NVIDIA Tegra "Jetson TK1" board, with the patches applied on top
>> of next-20140428, also with Andrew Bresticker's Tegra SDHCI patches
>> "mmc: tegra: disable UHS modes" and "mmc: tegra: fix reporting of base
>> clock frequency" applied)
>
> Thanks very much Stephen, Russell for the series, and Ulf.
>
> This seems like a good time to merge to mmc-next and call for testing
> in linux-next -- I'm happy to merge a v2 PR when ready.
>
> - Chris.

Hi Russell,

Do you have a rough idea about when you will post your v2 pull request?

I have a pending sdhci patch that I would like to rebase on top of
your latest work once it is available.

Thanks,
Tim Kryger

^ permalink raw reply

* Re: [PATCH 2/2 v2] powerpc: Enable NO_BOOTMEM
From: Emil Medve @ 2014-05-07 21:26 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <1399430673.15726.249.camel@snotra.buserror.net>

Hello Scott,


On 05/06/2014 09:44 PM, Scott Wood wrote:
> On Tue, 2014-05-06 at 19:16 -0500, Emil Medve wrote:
>> Hello Scott,
>>
>>
>> On 05/06/2014 04:49 PM, Scott Wood wrote:
>>> On Tue, 2014-05-06 at 13:48 -0500, Emil Medve wrote:
>>>> Currently bootmem is just a wrapper around memblock. This gets rid of
>>>> the wrapper code just as other ARHC(es) did: x86, arm, etc.
>>>>
>>>> For now only cover !NUMA systems/builds
>>>>
>>>> Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
>>>> ---
>>>>
>>>> v2: Acknowledge that NUMA systems/builds are not covered by this patch
>>>>
>>>>  arch/powerpc/Kconfig  | 3 +++
>>>>  arch/powerpc/mm/mem.c | 8 ++++++++
>>>>  2 files changed, 11 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>>> index e099899..07b164b 100644
>>>> --- a/arch/powerpc/Kconfig
>>>> +++ b/arch/powerpc/Kconfig
>>>> @@ -475,6 +475,9 @@ config SYS_SUPPORTS_HUGETLBFS
>>>>  
>>>>  source "mm/Kconfig"
>>>>  
>>>> +config NO_BOOTMEM
>>>> +	def_bool !NUMA
>>>
>>> This will allow a user to manually turn on CONFIG_NO_BOOTMEM in the
>>> presence of NUMA.  From the changelog it sounds like this is not what
>>> you intended.
>>>
>>> What are the issues with NUMA?
>>
>> Well, I don't have access to a NUMA box/board. I could enable NUMA for a
>> !NUMA board but I'd feel better if I could actually test/debug on a
>> relevant system
> 
> You could first test with NUMA on a non-NUMA board, and then if that
> works ask the list for help testing on NUMA hardware (and various
> non-Freescale non-NUMA hardware, for that matter).
> 
> Is there a specific issue that would need to be addressed to make it
> work on NUMA?

Nope. Just different code/file(s)... with NUMA specific details...

>>> As is, you're not getting rid of wrapper code -- only adding ifdefs.
>>
>> First, you're talking about the bootmem initialization wrapper code for
>> powerpc. The actual bootmem code is in include/linux/bootmem.h and
>> mm/bootmem.c. We can't remove those files as they are still used by
>> other arches. Also, the word wrapper is somewhat imprecise as in powerpc
>> land bootmem sort of runs on top of memblock
> 
> My point was just that the changelog says "This gets rid of wrapper
> code" when it actually removes no source code, and adds configuration
> complexity.

The patch gets rid of the wrapper code, bootmem itself and arch specific
initialization, from the build/kernel image. I guess I'll re-word that
so it doesn't sound so literal

>> When NO_BOOTMEM is configured the mm/nobootmem.c is used that is the
>> bootmem API actually re-implemented with memblock. The bootmem API is
>> used in various places in the arch independent code
>>
>> This patch wants to isolate for removal the bootmem initialization code
>> for powerpc and to exclude mm/bootmem.c from being built. This being the
>> first step I didn't want to actually remove the code, so it will be easy
>> to debug if some issues crop up. Also, people that want the use the
>> bootmem code for some reason can easily do that. Once this change spends
>> some time in the tree, we can actually remove the bootmem initialization
>> code
> 
> Is there a plausible reason someone would "want to use the bootmem
> code"?

Don't know, but as before it wasn't even possible to make a build with
NO_BOOTMEM I decided to err on the side of caution

> While the "ifdef it for a while" approach is sometimes sensible, usually
> it's better to just make the change rather than ifdef it.

I felt it was sensible in this case

> Consider what
> the code would look like if there were ifdefs for a ton of random
> changes, half of which nobody ever bothered to go back and clean up
> after the change got widespread testing.

Well, this is not a random change, but I certainly don't disagree with
the principle of what you're saying

> Why is this patch risky enough to warrant such an approach?

I don't think it's risky and we've been using it for months on various
SoC(s)/boards. Just didn't want to be very abrupt in removing the option
of using bootmem

> Shouldn't boot-time issues be pretty obvious?

Gross errors should be obvious. But the devil is in the details, and
even tough I've debugged/compared the memory layout/allocations with
bootmem and memblock only, I'm prepared to admit I might have missed
something (especially in the first patch of the sequence)


Cheers,

^ permalink raw reply

* Re: [PATCH 5/6] powerpc/corenet: Add DPAA FMan support to the SoC device tree(s)
From: Scott Wood @ 2014-05-07 23:14 UTC (permalink / raw)
  To: Emil Medve
  Cc: devicetree, Kanetkar Shruti-B44454, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <5368811A.3060609@Freescale.com>

On Tue, 2014-05-06 at 01:28 -0500, Emil Medve wrote:
> Hello Scott,
> 
> 
> On 05/05/2014 06:34 PM, Scott Wood wrote:
> > On Sun, 2014-05-04 at 05:59 -0500, Emil Medve wrote:
> >>  Anyway, most days PHYs can be discovered so they don't use/need
> >> compatible properties. That's I guess part of the reason we don't have
> >> bindings for them PHY nodes
> > 
> > I don't see why there couldn't be a compatible that describes the
> > standard programming interface.
> 
> Because it can be detected at runtime and I guess stuff like that should
> stay out of the device tree. I'm using PCI as an analogy here

But in this case aren't you using a standardized component of the
programming model itself to probe the specific PHY type?  I think a
better analogy is the "cfi-flash" compatible.

-Scott

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox