LinuxPPC-Dev Archive on lore.kernel.org

* Re: [PATCH 3/3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Li Yang @ 2011-12-19 11:05 UTC (permalink / raw)
  To: Scott Wood
  Cc: Artem.Bityutskiy, dedekind1, linuxppc-dev, LiuShuo, linux-kernel,
	shuo.liu, linux-mtd, akpm, dwmw2
In-Reply-To: <4EEB8704.8030201@freescale.com>

On Sat, Dec 17, 2011 at 1:59 AM, Scott Wood <scottwood@freescale.com> wrote:
> On 12/15/2011 08:44 PM, LiuShuo wrote:
>> hi Artem,
>> Could this patch be applied now and we make a independent patch for  bad
>> block information
>> migration later?
>
> This patch is not safe to use without migration.

Hi Scott,

We agree it's not entirely safe without migrating the bad block flag.
But let's consider two sides of the situation.

Firstly, it's only unsafe when there is a need to rebuild the Bad
Block Table from scratch (old BBT broken).  But currently there is no
easy way to do that (rebuild the BBT on demand), which means it's not a
common problem that we can easily address now.

Secondly, even if the aforementioned problem happens (BBT broken), we
can still recover all the data if we overrule the bad block flag.
The only drawback is that the card is no longer good for reuse; however,
it can still be used if we take the risk of losing data from errors that
ECC can't notice (low possibility too).

Finally, I don't think this is a blocker issue but a nice-to-have enhancement.

- Leo

^ permalink raw reply

* RE: RapidIO Direct I/O Support?
From: Bounine, Alexandre @ 2011-12-19 14:51 UTC (permalink / raw)
  To: Daniel Ng, linuxppc-dev
In-Reply-To: <CAJTBoNnMfegZid7bf_ZxunwVmk8QEpeTn8Bcfz7g3sneL76s9Q@mail.gmail.com>

On Monday, December 19, 2011 1:39 AM, Daniel Ng wrote:

>Is there RapidIO Direct Memory I/O Support in the latest kernel?
>
>I've seen these patches from Freescale, but it seems they were never integrated-
>http://kerneltrap.org/mailarchive/linux-netdev/2009/5/12/5686954
>
>Does anyone know why these weren't integrated?
>
>What is the latest state of these patches? Do they work?

I am in the process of submitting a set of patches that add DMA Engine
support to the RapidIO subsystem. One of these patches brings back an
upper-level interface for inbound memory mapping from the referenced
thread. It does not include the HW-specific mapping for fsl_rio, though.

I used an inbound mapping on an 8548-based platform during my testing, and
that part did not take much time to get working.

The v2 set of my DMA patches will be published as soon as the DMAengine
maintainers release an update for the dma_slave API.
For outbound SRIO requests, the new patches rely on the DMA capabilities
of the SRIO controller. These patches add a DMA channel driver for the
Tsi721 PCIe-to-SRIO bridge.

Alex.

^ permalink raw reply

* Re: [PATCH v3 04/14] KVM: PPC: Keep page physical addresses in per-slot arrays
From: Alexander Graf @ 2011-12-19 15:10 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <20111212222821.GE18868@bloggs.ozlabs.ibm.com>


On 12.12.2011, at 23:28, Paul Mackerras wrote:

> This allocates an array for each memory slot that is added to store
> the physical addresses of the pages in the slot.  This array is
> vmalloc'd and accessed in kvmppc_h_enter using real_vmalloc_addr().
> This allows us to remove the ram_pginfo field from the kvm_arch
> struct, and removes the 64GB guest RAM limit that we had.
>
> We use the low-order bits of the array entries to store a flag
> indicating that we have done get_page on the corresponding page,
> and therefore need to call put_page when we are finished with the
> page.  Currently this is set for all pages except those in our
> special RMO regions.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
> arch/powerpc/include/asm/kvm_host.h |    9 ++-
> arch/powerpc/kvm/book3s_64_mmu_hv.c |   18 +++---
> arch/powerpc/kvm/book3s_hv.c        |  114 +++++++++++++++++------------------
> arch/powerpc/kvm/book3s_hv_rm_mmu.c |   41 +++++++++++-
> 4 files changed, 107 insertions(+), 75 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 629df2e..7a17ab5 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -38,6 +38,7 @@
> #define KVM_MEMORY_SLOTS 32
> /* memory slots that does not exposed to userspace */
> #define KVM_PRIVATE_MEM_SLOTS 4
> +#define KVM_MEM_SLOTS_NUM (KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS)
>
> #ifdef CONFIG_KVM_MMIO
> #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
> @@ -175,25 +176,27 @@ struct revmap_entry {
> 	unsigned long guest_rpte;
> };
>
> +/* Low-order bits in kvm->arch.slot_phys[][] */
> +#define KVMPPC_GOT_PAGE		0x80
> +
> struct kvm_arch {
> #ifdef CONFIG_KVM_BOOK3S_64_HV
> 	unsigned long hpt_virt;
> 	struct revmap_entry *revmap;
> -	unsigned long ram_npages;
> 	unsigned long ram_psize;
> 	unsigned long ram_porder;
> -	struct kvmppc_pginfo *ram_pginfo;
> 	unsigned int lpid;
> 	unsigned int host_lpid;
> 	unsigned long host_lpcr;
> 	unsigned long sdr1;
> 	unsigned long host_sdr1;
> 	int tlbie_lock;
> -	int n_rma_pages;
> 	unsigned long lpcr;
> 	unsigned long rmor;
> 	struct kvmppc_rma_info *rma;
> 	struct list_head spapr_tce_tables;
> +	unsigned long *slot_phys[KVM_MEM_SLOTS_NUM];
> +	int slot_npages[KVM_MEM_SLOTS_NUM];
> 	unsigned short last_vcpu[NR_CPUS];
> 	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
> #endif /* CONFIG_KVM_BOOK3S_64_HV */
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 80ece8d..e4c6069 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -98,16 +98,16 @@ void kvmppc_free_hpt(struct kvm *kvm)
> void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem)
> {
> 	unsigned long i;
> -	unsigned long npages = kvm->arch.ram_npages;
> -	unsigned long pfn;
> +	unsigned long npages;
> +	unsigned long pa;
> 	unsigned long *hpte;
> 	unsigned long hash;
> 	unsigned long porder = kvm->arch.ram_porder;
> 	struct revmap_entry *rev;
> -	struct kvmppc_pginfo *pginfo = kvm->arch.ram_pginfo;
> +	unsigned long *physp;
>
> -	if (!pginfo)
> -		return;
> +	physp = kvm->arch.slot_phys[mem->slot];
> +	npages = kvm->arch.slot_npages[mem->slot];
>
> 	/* VRMA can't be > 1TB */
> 	if (npages > 1ul << (40 - porder))
> @@ -117,9 +117,10 @@ void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem)
> 		npages = HPT_NPTEG;
>
> 	for (i = 0; i < npages; ++i) {
> -		pfn = pginfo[i].pfn;
> -		if (!pfn)
> +		pa = physp[i];
> +		if (!pa)
> 			break;
> +		pa &= PAGE_MASK;
> 		/* can't use hpt_hash since va > 64 bits */
> 		hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & HPT_HASH_MASK;
> 		/*
> @@ -131,8 +132,7 @@ void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem)
> 		hash = (hash << 3) + 7;
> 		hpte = (unsigned long *) (kvm->arch.hpt_virt + (hash << 4));
> 		/* HPTE low word - RPN, protection, etc. */
> -		hpte[1] = (pfn << PAGE_SHIFT) | HPTE_R_R | HPTE_R_C |
> -			HPTE_R_M | PP_RWXX;
> +		hpte[1] = pa | HPTE_R_R | HPTE_R_C | HPTE_R_M | PP_RWXX;
> 		smp_wmb();
> 		hpte[0] = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
> 			(i << (VRMA_PAGE_ORDER - 16)) | HPTE_V_BOLTED |
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index da7db14..86d3e4b 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -50,14 +50,6 @@
> #include <linux/vmalloc.h>
> #include <linux/highmem.h>
>
> -/*
> - * For now, limit memory to 64GB and require it to be large pages.
> - * This value is chosen because it makes the ram_pginfo array be
> - * 64kB in size, which is about as large as we want to be trying
> - * to allocate with kmalloc.
> - */
> -#define MAX_MEM_ORDER		36
> -
> #define LARGE_PAGE_ORDER	24	/* 16MB pages */
>
> /* #define EXIT_DEBUG */
> @@ -147,10 +139,12 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu,
> 				       unsigned long vcpuid, unsigned long vpa)
> {
> 	struct kvm *kvm = vcpu->kvm;
> -	unsigned long pg_index, ra, len;
> +	unsigned long gfn, pg_index, ra, len;
> 	unsigned long pg_offset;
> 	void *va;
> 	struct kvm_vcpu *tvcpu;
> +	struct kvm_memory_slot *memslot;
> +	unsigned long *physp;
>
> 	tvcpu = kvmppc_find_vcpu(kvm, vcpuid);
> 	if (!tvcpu)
> @@ -164,14 +158,20 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu,
> 		if (vpa & 0x7f)
> 			return H_PARAMETER;
> 		/* registering new area; convert logical addr to real */
> -		pg_index = vpa >> kvm->arch.ram_porder;
> -		pg_offset = vpa & (kvm->arch.ram_psize - 1);
> -		if (pg_index >= kvm->arch.ram_npages)
> +		gfn = vpa >> PAGE_SHIFT;
> +		memslot = gfn_to_memslot(kvm, gfn);
> +		if (!memslot || !(memslot->flags & KVM_MEMSLOT_INVALID))
> +			return H_PARAMETER;
> +		physp = kvm->arch.slot_phys[memslot->id];
> +		if (!physp)
> 			return H_PARAMETER;
> -		if (kvm->arch.ram_pginfo[pg_index].pfn == 0)
> +		pg_index = (gfn - memslot->base_gfn) >>
> +			(kvm->arch.ram_porder - PAGE_SHIFT);
> +		pg_offset = vpa & (kvm->arch.ram_psize - 1);
> +		ra = physp[pg_index];
> +		if (!ra)
> 			return H_PARAMETER;
> -		ra = kvm->arch.ram_pginfo[pg_index].pfn << PAGE_SHIFT;
> -		ra |= pg_offset;
> +		ra = (ra & PAGE_MASK) | pg_offset;
> 		va = __va(ra);
> 		if (flags <= 1)
> 			len = *(unsigned short *)(va + 4);
> @@ -1108,12 +1108,11 @@ int kvmppc_core_prepare_memory_region(struct kvm *kvm,
> 				struct kvm_userspace_memory_region *mem)
> {
> 	unsigned long psize, porder;
> -	unsigned long i, npages, totalpages;
> -	unsigned long pg_ix;
> -	struct kvmppc_pginfo *pginfo;
> +	unsigned long i, npages;
> 	unsigned long hva;
> 	struct kvmppc_rma_info *ri = NULL;
> 	struct page *page;
> +	unsigned long *phys;
>
> 	/* For now, only allow 16MB pages */
> 	porder = LARGE_PAGE_ORDER;
> @@ -1125,20 +1124,21 @@ int kvmppc_core_prepare_memory_region(struct kvm *kvm,
> 		return -EINVAL;
> 	}
>
> +	/* Allocate a slot_phys array */
> 	npages = mem->memory_size >> porder;
> -	totalpages = (mem->guest_phys_addr + mem->memory_size) >> porder;
> -
> -	/* More memory than we have space to track? */
> -	if (totalpages > (1ul << (MAX_MEM_ORDER - LARGE_PAGE_ORDER)))
> -		return -EINVAL;
> +	phys = kvm->arch.slot_phys[mem->slot];
> +	if (!phys) {
> +		phys = vzalloc(npages * sizeof(unsigned long));
> +		if (!phys)
> +			return -ENOMEM;
> +		kvm->arch.slot_phys[mem->slot] = phys;
> +		kvm->arch.slot_npages[mem->slot] = npages;
> +	}
>
> 	/* Do we already have an RMA registered? */
> 	if (mem->guest_phys_addr == 0 && kvm->arch.rma)
> 		return -EINVAL;
>
> -	if (totalpages > kvm->arch.ram_npages)
> -		kvm->arch.ram_npages = totalpages;
> -
> 	/* Is this one of our preallocated RMAs? */
> 	if (mem->guest_phys_addr == 0) {
> 		struct vm_area_struct *vma;
> @@ -1171,7 +1171,6 @@ int kvmppc_core_prepare_memory_region(struct kvm *kvm,
> 		}
> 		atomic_inc(&ri->use_count);
> 		kvm->arch.rma = ri;
> -		kvm->arch.n_rma_pages = rma_size >> porder;
>
> 		/* Update LPCR and RMOR */
> 		lpcr = kvm->arch.lpcr;
> @@ -1195,12 +1194,9 @@ int kvmppc_core_prepare_memory_region(struct kvm *kvm,
> 			ri->base_pfn << PAGE_SHIFT, rma_size, lpcr);
> 	}
>
> -	pg_ix = mem->guest_phys_addr >> porder;
> -	pginfo = kvm->arch.ram_pginfo + pg_ix;
> -	for (i = 0; i < npages; ++i, ++pg_ix) {
> -		if (ri && pg_ix < kvm->arch.n_rma_pages) {
> -			pginfo[i].pfn = ri->base_pfn +
> -				(pg_ix << (porder - PAGE_SHIFT));
> +	for (i = 0; i < npages; ++i) {
> +		if (ri && i < ri->npages) {
> +			phys[i] = (ri->base_pfn << PAGE_SHIFT) + (i << porder);
> 			continue;
> 		}
> 		hva = mem->userspace_addr + (i << porder);
> @@ -1216,7 +1212,7 @@ int kvmppc_core_prepare_memory_region(struct kvm *kvm,
> 			       hva, compound_order(page));
> 			goto err;
> 		}
> -		pginfo[i].pfn = page_to_pfn(page);
> +		phys[i] = (page_to_pfn(page) << PAGE_SHIFT) | KVMPPC_GOT_PAGE;
> 	}
>
> 	return 0;
> @@ -1225,6 +1221,28 @@ int kvmppc_core_prepare_memory_region(struct kvm *kvm,
> 	return -EINVAL;
> }
>
> +static void unpin_slot(struct kvm *kvm, int slot_id)
> +{
> +	unsigned long *physp;
> +	unsigned long j, npages, pfn;
> +	struct page *page;
> +
> +	physp = kvm->arch.slot_phys[slot_id];
> +	npages = kvm->arch.slot_npages[slot_id];
> +	if (physp) {
> +		for (j = 0; j < npages; j++) {
> +			if (!(physp[j] & KVMPPC_GOT_PAGE))
> +				continue;
> +			pfn = physp[j] >> PAGE_SHIFT;
> +			page = pfn_to_page(pfn);
> +			SetPageDirty(page);
> +			put_page(page);
> +		}
> +		vfree(physp);
> +		kvm->arch.slot_phys[slot_id] = NULL;
> +	}
> +}
> +
> void kvmppc_core_commit_memory_region(struct kvm *kvm,
> 				struct kvm_userspace_memory_region *mem)
> {
> @@ -1236,8 +1254,6 @@ void kvmppc_core_commit_memory_region(struct kvm *kvm,
> int kvmppc_core_init_vm(struct kvm *kvm)
> {
> 	long r;
> -	unsigned long npages = 1ul << (MAX_MEM_ORDER - LARGE_PAGE_ORDER);
> -	long err = -ENOMEM;
> 	unsigned long lpcr;
>
> 	/* Allocate hashed page table */
> @@ -1247,19 +1263,9 @@ int kvmppc_core_init_vm(struct kvm *kvm)
>
> 	INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
>
> -	kvm->arch.ram_pginfo = kzalloc(npages * sizeof(struct kvmppc_pginfo),
> -				       GFP_KERNEL);
> -	if (!kvm->arch.ram_pginfo) {
> -		pr_err("kvmppc_core_init_vm: couldn't alloc %lu bytes\n",
> -		       npages * sizeof(struct kvmppc_pginfo));
> -		goto out_free;
> -	}
> -
> -	kvm->arch.ram_npages = 0;
> 	kvm->arch.ram_psize = 1ul << LARGE_PAGE_ORDER;
> 	kvm->arch.ram_porder = LARGE_PAGE_ORDER;
> 	kvm->arch.rma = NULL;
> -	kvm->arch.n_rma_pages = 0;
>
> 	kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);
>
> @@ -1282,25 +1288,15 @@ int kvmppc_core_init_vm(struct kvm *kvm)
> 	kvm->arch.lpcr = lpcr;
>
> 	return 0;
> -
> - out_free:
> -	kvmppc_free_hpt(kvm);
> -	return err;
> }
>
> void kvmppc_core_destroy_vm(struct kvm *kvm)
> {
> -	struct kvmppc_pginfo *pginfo;
> 	unsigned long i;
>
> -	if (kvm->arch.ram_pginfo) {
> -		pginfo = kvm->arch.ram_pginfo;
> -		kvm->arch.ram_pginfo = NULL;
> -		for (i = kvm->arch.n_rma_pages; i < kvm->arch.ram_npages; ++i)
> -			if (pginfo[i].pfn)
> -				put_page(pfn_to_page(pginfo[i].pfn));
> -		kfree(pginfo);
> -	}
> +	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
> +		unpin_slot(kvm, i);
> +
> 	if (kvm->arch.rma) {
> 		kvm_release_rma(kvm->arch.rma);
> 		kvm->arch.rma = NULL;
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 6148493..84dae82 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -20,6 +20,25 @@
> #include <asm/synch.h>
> #include <asm/ppc-opcode.h>
>
> +/*
> + * Since this file is built in even if KVM is a module, we need
> + * a local copy of this function for the case where kvm_main.c is
> + * modular.
> + */
> +static struct kvm_memory_slot *builtin_gfn_to_memslot(struct kvm *kvm,
> +						gfn_t gfn)
> +{

Shouldn't this rather be in a header file then? I'd rather not have this
code duplicated. Please follow up with a patch to merge this copy and
the real one into something in a header file.


Alex

^ permalink raw reply
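
The low-order-bit flag described in the quoted commit message can be sketched in isolation as follows. This is only an illustration, not the kernel code: the 4kB page size, the SKETCH_* macros and the slot_entry_* helpers are assumptions made for the example; only the 0x80 flag value and the idea of masking entries with a page mask come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: 4kB pages. The flag value mirrors KVMPPC_GOT_PAGE. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_MASK  (~((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1))
#define SKETCH_GOT_PAGE   UINT64_C(0x80)

/* Build an array entry from a page frame number, optionally marking it pinned. */
static uint64_t slot_entry_make(uint64_t pfn, bool got_page)
{
	uint64_t pa = pfn << SKETCH_PAGE_SHIFT;

	/* A page-aligned address has zero low bits, so the flag cannot collide. */
	assert((pa & ~SKETCH_PAGE_MASK) == 0);
	return got_page ? (pa | SKETCH_GOT_PAGE) : pa;
}

/* Physical address with the flag bits stripped (the "pa &= PAGE_MASK" step). */
static uint64_t slot_entry_pa(uint64_t entry)
{
	return entry & SKETCH_PAGE_MASK;
}

/* Page frame number; the shift discards the flag bits as a side effect. */
static uint64_t slot_entry_pfn(uint64_t entry)
{
	return entry >> SKETCH_PAGE_SHIFT;
}

/* True if the entry records a reference that must be dropped later. */
static bool slot_entry_needs_put(uint64_t entry)
{
	return (entry & SKETCH_GOT_PAGE) != 0;
}
```

Because every stored address is page aligned, the low bits are free to carry the flag, and a teardown loop such as unpin_slot() in the patch only has to test one bit per entry to decide whether put_page() is needed.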

* Re: [PATCH 3/3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Scott Wood @ 2011-12-19 16:47 UTC (permalink / raw)
  To: Li Yang
  Cc: Artem.Bityutskiy, dedekind1, dwmw2, LiuShuo, linux-kernel,
	shuo.liu, linux-mtd, akpm, linuxppc-dev
In-Reply-To: <CADRPPNT-akj5KBQKdRaFrA2XpLU6-xtuRduzDYEWjv_dzVmTAA@mail.gmail.com>

On 12/19/2011 05:05 AM, Li Yang wrote:
> On Sat, Dec 17, 2011 at 1:59 AM, Scott Wood <scottwood@freescale.com> wrote:
>> On 12/15/2011 08:44 PM, LiuShuo wrote:
>>> hi Artem,
>>> Could this patch be applied now and we make a independent patch for  bad
>>> block information
>>> migration later?
>>
>> This patch is not safe to use without migration.
> 
> Hi Scott,
> 
> We agree it's not entirely safe without migrating the bad block flag.
> But let's consider two sides of the situation.
> 
> Firstly, it's only unsafe when there is a need to re-built the Bad
> Block Table from scratch(old BBT broken).

No, it's unsafe in the presence of bad blocks.

The BBT erasure issue relates to how we mark the flash as migrated, not
whether we migrate in the first place.

>  But currently there is no
> easy way to do that(re-build BBT on demand),

You scrub the blocks with U-Boot.  It's not supposed to be *easy*, it's
a developer recovery mechanism.

> Secondly, even if the previous said problem happens(BBT broken).  We
> can still recover all the data if we overrule the bad block flag.

How so?  The bad block markers -- including ones legitimately written to
the BBT after the fact -- are used for block skipping with certain types
of writes.  Without the knowledge of which blocks were marked bad, how
do we know which blocks were skipped?

> Only the card is not so good to be used again,

That's a pretty crappy thing to happen every time you hit a bug during
development.

But again, that's irrelevant to whether this patch should be applied
as-is, because we currently don't have any bad block migration at all.

> however, it can be used
> if we take the risk of losing data from errors that ECC can't
> notice(low possibility too).

Can you quantify "low possibility" here?

Note that any block that *was* marked bad will have a multi-bit error
from the marker itself, since it will be embedded in the main data area.

> Finally, I don't think this is a blocker issue but a better to have enhancement.

No, it is not an enhancement.  Processing bad block markers correctly is
a fundamental requirement.  And if anyone *does* start using it right
away, then we'll have to deal with their complaints if we start checking
for a migration marker later.

Why is it so critical that it be merged now, and not in a few weeks (or
next merge window) when I have a chance to do the migration code
(assuming nobody else does it first) and add a suitable check for the
migration marker in the Linux driver?

-Scott

^ permalink raw reply

* Re: [PATCH v3 12/14] KVM: Add barriers to allow mmu_notifier_retry to be used locklessly
From: Alexander Graf @ 2011-12-19 17:18 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, kvm list, kvm-ppc, Avi Kivity
In-Reply-To: <20111212223720.GM18868@bloggs.ozlabs.ibm.com>


On 12.12.2011, at 23:37, Paul Mackerras wrote:

> This adds an smp_wmb in kvm_mmu_notifier_invalidate_range_end() and an
> smp_rmb in mmu_notifier_retry() so that mmu_notifier_retry() will give
> the correct answer when called without kvm->mmu_lock being held.
> PowerPC Book3S HV KVM wants to use a bitlock per guest page rather than
> a single global spinlock in order to improve the scalability of updates
> to the guest MMU hashed page table, and so needs this.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>

Avi, mind to ack?


Alex

> ---
> include/linux/kvm_host.h |   14 +++++++++-----
> virt/kvm/kvm_main.c      |    6 +++---
> 2 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 8c5c303..ec79a45 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -700,12 +700,16 @@ static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_se
> 	if (unlikely(vcpu->kvm->mmu_notifier_count))
> 		return 1;
> 	/*
> -	 * Both reads happen under the mmu_lock and both values are
> -	 * modified under mmu_lock, so there's no need of smb_rmb()
> -	 * here in between, otherwise mmu_notifier_count should be
> -	 * read before mmu_notifier_seq, see
> -	 * mmu_notifier_invalidate_range_end write side.
> +	 * Ensure the read of mmu_notifier_count happens before the read
> +	 * of mmu_notifier_seq.  This interacts with the smp_wmb() in
> +	 * mmu_notifier_invalidate_range_end to make sure that the caller
> +	 * either sees the old (non-zero) value of mmu_notifier_count or
> +	 * the new (incremented) value of mmu_notifier_seq.
> +	 * PowerPC Book3s HV KVM calls this under a per-page lock
> +	 * rather than under kvm->mmu_lock, for scalability, so
> +	 * can't rely on kvm->mmu_lock to keep things ordered.
> 	 */
> +	smp_rmb();
> 	if (vcpu->kvm->mmu_notifier_seq != mmu_seq)
> 		return 1;
> 	return 0;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e289486..c144132 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -357,11 +357,11 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
> 	 * been freed.
> 	 */
> 	kvm->mmu_notifier_seq++;
> +	smp_wmb();
> 	/*
> 	 * The above sequence increase must be visible before the
> -	 * below count decrease but both values are read by the kvm
> -	 * page fault under mmu_lock spinlock so we don't need to add
> -	 * a smb_wmb() here in between the two.
> +	 * below count decrease, which is ensured by the smp_wmb above
> +	 * in conjunction with the smp_rmb in mmu_notifier_retry().
> 	 */
> 	kvm->mmu_notifier_count--;
> 	spin_unlock(&kvm->mmu_lock);
> --
> 1.7.7.3
>

^ permalink raw reply
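
The ordering argument in the quoted patch can be modelled in user space with C11 atomics. This is a rough sketch under stated assumptions: struct notifier_state and the functions below are hypothetical stand-ins, and the C11 release and acquire fences only approximate the kernel's smp_wmb() and smp_rmb(). The kernel's real locking is not reproduced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two struct kvm fields the patch orders. */
struct notifier_state {
	atomic_ulong seq;   /* plays the role of mmu_notifier_seq */
	atomic_long count;  /* plays the role of mmu_notifier_count */
};

/* An invalidation begins: readers must retry while count is non-zero. */
static void invalidate_range_start(struct notifier_state *s)
{
	atomic_fetch_add_explicit(&s->count, 1, memory_order_relaxed);
}

/* Writer side, modelled on kvm_mmu_notifier_invalidate_range_end(). */
static void invalidate_range_end(struct notifier_state *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
	/* Stand-in for smp_wmb(): the seq bump is visible before the count drop. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_sub_explicit(&s->count, 1, memory_order_relaxed);
}

/* Reader side, modelled on mmu_notifier_retry(); true means "retry". */
static bool notifier_retry(struct notifier_state *s, unsigned long seen_seq)
{
	if (atomic_load_explicit(&s->count, memory_order_relaxed))
		return true;
	/* Stand-in for smp_rmb(): count is read before seq. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->seq, memory_order_relaxed) != seen_seq;
}

/* Single-threaded walk through the three states a reader can observe. */
static void notifier_demo(void)
{
	struct notifier_state s;
	unsigned long snap;

	atomic_init(&s.seq, 0);
	atomic_init(&s.count, 0);
	snap = atomic_load(&s.seq);

	assert(!notifier_retry(&s, snap));  /* quiescent: no retry needed */
	invalidate_range_start(&s);
	assert(notifier_retry(&s, snap));   /* invalidation in flight */
	invalidate_range_end(&s);
	assert(notifier_retry(&s, snap));   /* finished, but seq has moved on */
}
```

The point of the pairing is that a lockless reader which sees count already back at zero is then guaranteed to see the incremented seq, so it cannot miss an invalidation that completed between its snapshot and its check.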

* Re: [PATCH v3 12/14] KVM: Add barriers to allow mmu_notifier_retry to be used locklessly
From: Avi Kivity @ 2011-12-19 17:21 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, Paul Mackerras, kvm list, kvm-ppc
In-Reply-To: <29064998-0526-435B-A8BA-DB6BF9CDED46@suse.de>

On 12/19/2011 07:18 PM, Alexander Graf wrote:
> On 12.12.2011, at 23:37, Paul Mackerras wrote:
>
> > This adds an smp_wmb in kvm_mmu_notifier_invalidate_range_end() and an
> > smp_rmb in mmu_notifier_retry() so that mmu_notifier_retry() will give
> > the correct answer when called without kvm->mmu_lock being held.
> > PowerPC Book3S HV KVM wants to use a bitlock per guest page rather than
> > a single global spinlock in order to improve the scalability of updates
> > to the guest MMU hashed page table, and so needs this.
> > 
> > Signed-off-by: Paul Mackerras <paulus@samba.org>
>
> Avi, mind to ack?
>

Acked-by: Avi Kivity <avi@redhat.com>

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH v3 00/14] KVM: PPC: Update Book3S HV memory handling
From: Alexander Graf @ 2011-12-19 17:39 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <20111212222347.GA18868@bloggs.ozlabs.ibm.com>


On 12.12.2011, at 23:23, Paul Mackerras wrote:

> This series of patches updates the Book3S-HV KVM code that manages the
> guest hashed page table (HPT) to enable several things:
> 
> * MMIO emulation and MMIO pass-through
> 
> * Use of small pages (4kB or 64kB, depending on config) to back the
>  guest memory
> 
> * Pageable guest memory - i.e. backing pages can be removed from the
>  guest and reinstated on demand, using the MMU notifier mechanism
> 
> * Guests can be given read-only access to pages even though they think
>  they have mapped them read/write.  When they try to write to them
>  their access is upgraded to read/write.  This allows KSM to share
>  pages between guests.
> 
> On PPC970 we have no way to get DSIs and ISIs to come to the
> hypervisor, so we can't do MMIO emulation or pageable guest memory.
> On POWER7 we set the VPM1 bit in the LPCR to make all DSIs and ISIs
> come to the hypervisor (host) as HDSIs or HISIs.
> 
> This code is working well in my tests.  The sporadic crashes that I
> was seeing earlier are fixed by the second patch in the series.
> Somewhat to my surprise, when I implemented the last patch in the
> series I started to see KSM coalescing pages without any further
> effort on my part -- my tests were on a machine with Fedora 16
> installed, and it has ksmtuned running by default.
> 
> This series is on top of Alex Graf's kvm-ppc-next branch.  The first
> patch in my series fixes a bug in one of the patches in that branch
> ("KVM: PPC: booke: Improve timer register emulation").
> 
> These patches only touch arch/powerpc except for patch 12, which adds
> a couple of barriers to allow mmu_notifier_retry() to be used outside
> of the kvm->mmu_lock.

Thanks, applied all to kvm-ppc-next, awaiting the one follow-up patch though.


Alex

^ permalink raw reply

* Re: [PATCH 3/3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Scott Wood @ 2011-12-19 18:38 UTC (permalink / raw)
  To: dedekind1
  Cc: Artem.Bityutskiy, linuxppc-dev, linux-kernel, shuo.liu, linux-mtd,
	akpm, dwmw2
In-Reply-To: <1324132520.4240.26.camel@sauron.fi.intel.com>

On 12/17/2011 08:35 AM, Artem Bityutskiy wrote:
> On Mon, 2011-12-12 at 15:30 -0600, Scott Wood wrote:
>> On 12/12/2011 03:19 PM, Artem Bityutskiy wrote:
>>> On Mon, 2011-12-12 at 15:15 -0600, Scott Wood wrote:
>>>> NAND chips come from the factory with bad blocks marked at a certain
>>>> offset into each page.  This offset is normally in the OOB area, but
>>>> since we change the layout from "4k data, 128 byte oob" to "2k data, 64
>>>> byte oob, 2k data, 64 byte oob" the marker is no longer in the oob.  On
>>>> first use we need to migrate the markers so that they are still in the oob.
>>>
>>> Ah, I see, thanks. Are you planning to implement in-kernel migration or
>>> use a user-space tool?
>>
>> That's the kind of answer I was hoping to get from Shuo. :-)
>>
>> Most likely is a firmware-based tool, but I'd like there to be some way
>> for the tool to mark that this has happened, so that the Linux driver
>> can refuse to do non-raw accesses to a chip that isn't marked as having
>> been migrated (or at least yell loudly in the log).
>>
>> Speaking of raw accesses, these are currently broken in the eLBC
>> driver... we need some way for the generic layer to tell us what kind of
>> access it is before the transaction starts, not once it wants to read
>> out the buffer (unless we add more hacks to delay the start of a read
>> transaction until first buffer access...).  We'd be better off with a
>> high-level "read page/write page" function that does the whole thing
>> (not just buffer access, but command issuance as well).
> 
> It looks like currently you can re-define chip->read_page, so I guess
> you should rework MTD and make chip->write_page re-definable?

Unless something has changed very recently, there is no chip->read_page
or chip->write_page.  There is chip->ecc.read_page and
chip->ecc.write_page, but they are too low-level.  What we'd need to
replace is a portion of nand_do_read_ops()/nand_do_write_ops().

-Scott

^ permalink raw reply

* Re: [PATCH 3/3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Scott Wood @ 2011-12-19 18:42 UTC (permalink / raw)
  To: dedekind1
  Cc: Artem.Bityutskiy, dwmw2, linux-kernel, shuo.liu, linux-mtd, akpm,
	linuxppc-dev
In-Reply-To: <4EEF849B.7030804@freescale.com>

On 12/19/2011 12:38 PM, Scott Wood wrote:
> On 12/17/2011 08:35 AM, Artem Bityutskiy wrote:
>> It looks like currently you can re-define chip->read_page, so I guess
>> you should rework MTD and make chip->write_page re-definable?
> 
> Unless something has changed very recently, there is no chip->read_page
> or chip->write_page.  There is chip->ecc.read_page and
> chip->ecc.write_page, but they are too low-level.  What we'd need to
> replace is a portion of nand_do_read_ops()/nand_do_write_ops().

Sorry, chip->write_page does exist -- it's chip->read_page that would
need to be made similarly redefinable.

-Scott

^ permalink raw reply

* linux-next: manual merge of the cputime tree with the powerpc tree
From: Stephen Rothwell @ 2011-12-20  5:11 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: linux-kernel, linux-next, Andreas Schwab, Paul Mackerras,
	linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1017 bytes --]

Hi Martin,

Today's linux-next merge of the cputime tree got a conflict in
arch/powerpc/include/asm/cputime.h between commit 9f5072d4f63f ("powerpc:
Fix wrong divisor in usecs_to_cputime") from the powerpc tree and commit
648616343cdb ("[S390] cputime: add sparse checking and cleanup") from the
cputime tree.

I fixed it up (see below) and can carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc arch/powerpc/include/asm/cputime.h
index 33a3580,e94935c..0000000
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@@ -130,7 -114,7 +114,7 @@@ extern u64 __cputime_usec_factor
  
  static inline unsigned long cputime_to_usecs(const cputime_t ct)
  {
- 	return mulhdu(ct, __cputime_usec_factor);
 -	return mulhdu((__force u64) ct, __cputime_msec_factor) * USEC_PER_MSEC;
++	return mulhdu((__force u64) ct, __cputime_usec_factor);
  }
  
  static inline cputime_t usecs_to_cputime(const unsigned long us)

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply
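
The resolution keeps a single high multiply by the microsecond factor. As a rough illustration of what that computes, here is a portable sketch of a 64x64-bit high multiply standing in for mulhdu. The 512 MHz timebase and the factor 2^55 (that is, 2^64 * 10^6 / 512,000,000) are assumed values chosen so the arithmetic is exact, and none of these helper names exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 64 bits of the 128-bit product a * b, built from 32-bit limbs. */
static uint64_t mulhi_u64(uint64_t a, uint64_t b)
{
	uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
	uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;
	uint64_t p0 = a_lo * b_lo;
	uint64_t p1 = a_lo * b_hi;
	uint64_t p2 = a_hi * b_lo;
	uint64_t p3 = a_hi * b_hi;
	/* Carry out of the middle 32-bit column. */
	uint64_t mid = (p0 >> 32) + (p1 & 0xffffffffu) + (p2 & 0xffffffffu);

	return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Timebase ticks to microseconds, assuming usec_factor is scaled by 2^64. */
static uint64_t cputime_to_usecs_sketch(uint64_t ct, uint64_t usec_factor)
{
	return mulhi_u64(ct, usec_factor);
}

/* With the assumed 512 MHz timebase, one second of ticks is 10^6 us. */
static void cputime_sketch_selfcheck(void)
{
	assert(cputime_to_usecs_sketch(512000000u, UINT64_C(1) << 55) == 1000000u);
}
```

Scaling by a microsecond factor directly, as the merged line does, avoids the extra multiply by USEC_PER_MSEC that the millisecond-based variant needed.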

* RE: [PATCH 1/2] mtd/nand: fixup for fmr initialization of Freescale NAND controller
From: Liu Shengzhou-B36685 @ 2011-12-20  6:40 UTC (permalink / raw)
  To: dedekind1@gmail.com
  Cc: Wood Scott-B07421, Gala Kumar-B11780,
	linuxppc-dev@lists.ozlabs.org, dwmw2@infradead.org,
	linux-mtd@lists.infradead.org
In-Reply-To: <1324133097.4240.32.camel@sauron.fi.intel.com>


> -----Original Message-----
> From: Artem Bityutskiy [mailto:dedekind1@gmail.com]
> Sent: Saturday, December 17, 2011 10:45 PM
> To: Liu Shengzhou-B36685
> Cc: linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421;
> dwmw2@infradead.org; Gala Kumar-B11780; linux-mtd@lists.infradead.org
> Subject: Re: [PATCH 1/2] mtd/nand: fixup for fmr initialization of
> Freescale NAND controller
>
> On Mon, 2011-12-12 at 17:40 +0800, Shengzhou Liu wrote:
> > There was a bug for fmr initialization, which lead to  fmr was always
> > 0x100 in fsl_elbc_chip_init() and caused FCM command timeout before
> > calling fsl_elbc_chip_init_tail(), now we initialize CWTO to maximum
> > timeout value and not relying on the setting of bootloader.
> >
> > Signed-off-by: Shengzhou Liu <Shengzhou.Liu@freescale.com>
>
> Pushed both to l2-mtd-2.6.git, thanks!
>
> --
> Best Regards,
> Artem Bityutskiy

I noted it had been applied in linux-next.git tree.
Does it still need to l2-mtd-2.6.git?
Thanks.

Best Regards,
Shengzhou Liu

^ permalink raw reply

* [PATCH] powerpc/mpc85xx: 32bit address support for p1022ds
From: r66093 @ 2011-12-20  6:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Jerry Huang

From: Jerry Huang <Chang-Ming.Huang@freescale.com>

All P1022DS features work with 32-bit addressing; 36-bit addressing is only
optional. Make PHYS_64BIT optional by removing the 'select PHYS_64BIT' line
from the Kconfig file, so that the P1022DS platform can use 32-bit addresses.

Signed-off-by: Jerry Huang <Chang-Ming.Huang@freescale.com>
---
 arch/powerpc/platforms/85xx/Kconfig |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index 8f0543f..d58987f 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -80,7 +80,6 @@ config P1010_RDB
 config P1022_DS
 	bool "Freescale P1022 DS"
 	select DEFAULT_UIMAGE
-	select PHYS_64BIT	# The DTS has 36-bit addresses
 	select SWIOTLB
 	help
 	  This option enables support for the Freescale P1022DS reference board.
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH-v1]Copy machine descriptor after probe succeed
From: bill4carson @ 2011-12-20  9:08 UTC (permalink / raw)
  To: bill4carson; +Cc: linuxppc-dev

This patch fixes a minor issue in the machine probe process:
it is better to copy the machine descriptor only after the probe
succeeds.

arch/powerpc/kernel/setup-common.c |    4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

^ permalink raw reply

* [PATCH] Copy machine descriptor after probe succeed
From: bill4carson @ 2011-12-20  9:08 UTC (permalink / raw)
  To: bill4carson; +Cc: linuxppc-dev
In-Reply-To: <1324372127-8552-1-git-send-email-bill4carson@gmail.com>

From: Bill Carson <bill4carson@gmail.com>

It makes more sense to copy the machine descriptor AFTER the machine probe
returns success.

Signed-off-by: Bill Carson <bill4carson@gmail.com>
---
 arch/powerpc/kernel/setup-common.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index d426b1d..3362097 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -574,9 +574,9 @@ void probe_machine(void)
 	     machine_id < &__machine_desc_end;
 	     machine_id++) {
 		DBG("  %s ...", machine_id->name);
-		memcpy(&ppc_md, machine_id, sizeof(struct machdep_calls));
-		if (ppc_md.probe()) {
+		if (machine_id->probe()) {
 			DBG(" match !\n");
+			memcpy(&ppc_md, machine_id, sizeof(struct machdep_calls));
 			break;
 		}
 		DBG("\n");
-- 
1.6.3.1

^ permalink raw reply related

* Re: [PATCH 3/3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Li Yang @ 2011-12-20  9:08 UTC (permalink / raw)
  To: Scott Wood
  Cc: Artem.Bityutskiy, dedekind1, dwmw2, LiuShuo, linux-kernel,
	shuo.liu, linux-mtd, akpm, linuxppc-dev
In-Reply-To: <4EEF6AAA.3030806@freescale.com>

On Tue, Dec 20, 2011 at 12:47 AM, Scott Wood <scottwood@freescale.com> wrote:
> On 12/19/2011 05:05 AM, Li Yang wrote:
>> On Sat, Dec 17, 2011 at 1:59 AM, Scott Wood <scottwood@freescale.com> wrote:
>>> On 12/15/2011 08:44 PM, LiuShuo wrote:
>>>> hi Artem,
>>>> Could this patch be applied now and we make a independent patch for bad
>>>> block information
>>>> migration later?
>>>
>>> This patch is not safe to use without migration.
>>
>> Hi Scott,
>>
>> We agree it's not entirely safe without migrating the bad block flag.
>> But let's consider two sides of the situation.
>>
>> Firstly, it's only unsafe when there is a need to re-built the Bad
>> Block Table from scratch(old BBT broken).
>
> No, it's unsafe in the presence of bad blocks.
>

Instead of migrating the factory bad block markers, I proposed
modifying the BBT-building code to handle 4K pages differently, so
that the default BBT correctly covers the factory bad blocks.  It
is the easiest way, with nearly no harm to functionality.

Look at nand_default_block_markbad() in the current Linux MTD
implementation.  If the NAND_BBT_USE_FLASH option is set, which we do,
the bad block information is only updated in the BBT, not in the OOB
area of the first two pages of the bad block.  That means we currently
rely only on the BBT for bad blocks.  Once the BBT is created, the
factory bad block markers can be ignored, IMO.

> The BBT erasure issue relates to how me mark the flash as migrated, not
> whether we migrate in the first place.

It is connected to whether we do the migration at all.  I mentioned in
an earlier mail that if we do the migration, we need to make sure
it only happens once.  It also needs to be done before the
flash is used for the first time and before the BBT is created.  If we
can't guarantee these conditions, the migration ends up marking good
blocks as bad, which is even worse than doing nothing.

>
>>  But currently there is no
>> easy way to do that(re-build BBT on demand),
>
> You scrub the blocks with U-Boot.  It's not supposed to be *easy*, it's
> a developer recovery mechanism.

Scrub also clears the factory bad block markers.  The result after a
scrub is the same whether or not we migrated the factory bad block
markers.

>
>> Secondly, even if the previous said problem happens(BBT broken).  We
>> can still recover all the data if we overrule the bad block flag.
>
> How so?  The bad block markers -- including ones legitimately written to
> the BBT after the fact -- are used for block skipping with certain types
> of writes.  Without the knowledge of which blocks were marked bad, how
> do we know which blocks were skipped?

This is not supposed to be *easy*.  We might get more information at
the file system level, or we can check the content of the blocks.

>
>> Only the card is not so good to be used again,
>
> That's a pretty crappy thing to happen every time you hit a bug during
> development.
>
> But again, that's irrelevant to whether this patch should be applied
> as-is, because we currently don't have any bad block migration at all.
>
>> however, it can be used
>> if we take the risk of losing data from errors that ECC can't
>> notice(low possibility too).
>
> Can you quantify "low possibility" here?
>
> Note that any block that *was* marked bad will have a multi-bit error
> from the marker itself, since it will be embedded in the main data area.

I found this definition of a bad block in one NAND chip manual: Bad
Blocks are blocks that contain one or more invalid bits whose
reliability is not guaranteed.

It does not say that a bad block has to have a multi-bit error.
Factory bad blocks might have worse errors than worn-out bad blocks,
but that is not something I can tell.
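
To make the "embedded in the main data area" point concrete, here is a
rough sketch of where a raw byte of the chip page lands.  It assumes the
workaround accesses a 4K page as two chunks of 2048 data bytes plus 64
OOB bytes each, and that the factory marker sits at raw column 4096 (the
first spare byte of a 4K page).  Both are assumptions for illustration;
the names locate_raw_byte() and struct raw_loc are made up and not taken
from the driver.

```c
/* Assumed layout: each chunk is 2048 data bytes followed by 64 OOB bytes. */
#define CHUNK_DATA 2048u
#define CHUNK_OOB  64u
#define CHUNK_RAW  (CHUNK_DATA + CHUNK_OOB)

/* Hypothetical result type: where a raw column lands in the chunked view. */
struct raw_loc {
	unsigned int chunk;	/* which 2K+64 chunk */
	unsigned int offset;	/* offset inside that chunk's data or OOB */
	int in_oob;		/* nonzero if the byte is in the chunk's OOB */
};

static struct raw_loc locate_raw_byte(unsigned int column)
{
	struct raw_loc loc;

	loc.chunk = column / CHUNK_RAW;
	loc.offset = column % CHUNK_RAW;
	loc.in_oob = loc.offset >= CHUNK_DATA;
	if (loc.in_oob)
		loc.offset -= CHUNK_DATA;
	return loc;
}
```

Under these assumptions, column 4096 falls in chunk 1 at data offset 1984,
so the marker byte is seen as ordinary payload by the ECC of that chunk.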

>
>> Finally, I don't think this is a blocker issue but a better to have enhancement.
>
> No, it is not an enhancement.  Processing bad block markers correctly is
> a fundamental requirement.  And if anyone *does* start using it right
> away, then we'll have to deal with their complaints if we start checking
> for a migration marker later.

I agree to some extent.  I suggested having code that creates a
correct BBT for 4K pages on first use, without doing the migration.
Given the code we have right now, we take no more risk than
before and lose no functionality.

>
> Why is it so critical that it be merged now, and not in a few weeks (or
> next merge window) when I have a chance to do the migration code
> (assuming nobody else does it first) and add a suitable check for the
> migration marker in the Linux driver?

A few weeks might be OK.  But I fear that the merge could be delayed
further and might finally go nowhere.  And as I argued above, I'm
not sure migrating is necessary in the first place.

In general, we are not trying to get unqualified code merged.  But I
also don't agree that everything must be perfect before any of the
code can be merged.  My understanding is that even if certain code is
incomplete in features or has certain drawbacks, if the current chunk
provides some useful features and the drawbacks are acceptable, we
should merge it and add more enhancements incrementally in the
future.  Some people don't have the luck to work on one thing for a
long time, and can't possibly finish all the enhancements in one go.
It's beneficial to merge part of the whole picture if it is acceptable
rather than wait an uncertain time for everything to be finished.

- Leo

^ permalink raw reply

* [PATCH] KVM: Move gfn_to_memslot() to kvm_host.h
From: Paul Mackerras @ 2011-12-20  9:21 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm, kvm-ppc

This moves gfn_to_memslot(), and the functions it calls, that is,
search_memslots() and __gfn_to_memslot(), from kvm_main.c to kvm_host.h
so that gfn_to_memslot() can be called from non-modular code even
when KVM is a module.  On powerpc, the Book3S HV style of KVM has
code that is called from real mode which needs to call gfn_to_memslot()
and thus needs this.  (Module code is allocated in the vmalloc region,
which can't be accessed in real mode.)

With this, we can remove builtin_gfn_to_memslot() from book3s_hv_rm_mmu.c
and thus eliminate a little bit of duplication.
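
As an illustration only (not part of the patch), the lookup being moved
is a plain linear range search, which is why it is cheap to make it a
static inline in a header: every file that includes the header gets its
own copy and does not depend on a symbol exported by the module.  The
types and names below (struct memslot, find_memslot, demo_slots) are
simplified stand-ins, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Simplified stand-in for struct kvm_memory_slot. */
struct memslot {
	gfn_t base_gfn;		/* first guest frame number covered */
	unsigned long npages;	/* number of pages covered */
	int id;
};

/* Simplified stand-in for struct kvm_memslots. */
struct memslots {
	size_t nslots;
	struct memslot slots[4];
};

/* Return the slot containing gfn, or NULL if no slot covers it. */
static inline struct memslot *find_memslot(struct memslots *s, gfn_t gfn)
{
	size_t i;

	for (i = 0; i < s->nslots; i++) {
		struct memslot *m = &s->slots[i];

		if (gfn >= m->base_gfn && gfn < m->base_gfn + m->npages)
			return m;
	}
	return NULL;
}

/* Example table: two slots with a gap between them. */
static struct memslots demo_slots = {
	.nslots = 2,
	.slots = {
		{ .base_gfn = 0x100,  .npages = 16, .id = 0 },
		{ .base_gfn = 0x1000, .npages = 32, .id = 1 },
	},
};
```

Callers still have to handle the NULL case, as the real-mode code in
this patch does for a gfn that falls outside every slot.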

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |   23 ++---------------------
 include/linux/kvm_host.h            |   25 ++++++++++++++++++++++++-
 virt/kvm/kvm_main.c                 |   25 -------------------------
 3 files changed, 26 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index d3e36fc..063b00c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -21,25 +21,6 @@
 #include <asm/synch.h>
 #include <asm/ppc-opcode.h>
 
-/*
- * Since this file is built in even if KVM is a module, we need
- * a local copy of this function for the case where kvm_main.c is
- * modular.
- */
-static struct kvm_memory_slot *builtin_gfn_to_memslot(struct kvm *kvm,
-						gfn_t gfn)
-{
-	struct kvm_memslots *slots;
-	struct kvm_memory_slot *memslot;
-
-	slots = kvm_memslots(kvm);
-	kvm_for_each_memslot(memslot, slots)
-		if (gfn >= memslot->base_gfn &&
-		      gfn < memslot->base_gfn + memslot->npages)
-			return memslot;
-	return NULL;
-}
-
 /* Translate address of a vmalloc'd thing to a linear map address */
 static void *real_vmalloc_addr(void *x)
 {
@@ -97,7 +78,7 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 	ptel = rev->guest_rpte;
 	gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
-	memslot = builtin_gfn_to_memslot(kvm, gfn);
+	memslot = gfn_to_memslot(kvm, gfn);
 	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
 		return;
 
@@ -171,7 +152,7 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 	/* Find the memslot (if any) for this address */
 	gpa = (ptel & HPTE_R_RPN) & ~(psize - 1);
 	gfn = gpa >> PAGE_SHIFT;
-	memslot = builtin_gfn_to_memslot(kvm, gfn);
+	memslot = gfn_to_memslot(kvm, gfn);
 	pa = 0;
 	is_io = ~0ul;
 	rmap = NULL;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ec79a45..109828f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -429,7 +429,6 @@ int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 			      gpa_t gpa);
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
-struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
@@ -649,6 +648,30 @@ static inline void kvm_guest_exit(void)
 	current->flags &= ~PF_VCPU;
 }
 
+static inline struct kvm_memory_slot *
+search_memslots(struct kvm_memslots *slots, gfn_t gfn)
+{
+	struct kvm_memory_slot *memslot;
+
+	kvm_for_each_memslot(memslot, slots)
+		if (gfn >= memslot->base_gfn &&
+		      gfn < memslot->base_gfn + memslot->npages)
+			return memslot;
+
+	return NULL;
+}
+
+static inline struct kvm_memory_slot *
+__gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn)
+{
+	return search_memslots(slots, gfn);
+}
+
+static inline struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
+{
+	return __gfn_to_memslot(kvm_memslots(kvm), gfn);
+}
+
 static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
 {
 	return gfn_to_memslot(kvm, gfn)->id;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c144132..ef11529 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -640,19 +640,6 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
 }
 #endif /* !CONFIG_S390 */
 
-static struct kvm_memory_slot *
-search_memslots(struct kvm_memslots *slots, gfn_t gfn)
-{
-	struct kvm_memory_slot *memslot;
-
-	kvm_for_each_memslot(memslot, slots)
-		if (gfn >= memslot->base_gfn &&
-		      gfn < memslot->base_gfn + memslot->npages)
-			return memslot;
-
-	return NULL;
-}
-
 static int cmp_memslot(const void *slot1, const void *slot2)
 {
 	struct kvm_memory_slot *s1, *s2;
@@ -1031,18 +1018,6 @@ int kvm_is_error_hva(unsigned long addr)
 }
 EXPORT_SYMBOL_GPL(kvm_is_error_hva);
 
-static struct kvm_memory_slot *__gfn_to_memslot(struct kvm_memslots *slots,
-						gfn_t gfn)
-{
-	return search_memslots(slots, gfn);
-}
-
-struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
-{
-	return __gfn_to_memslot(kvm_memslots(kvm), gfn);
-}
-EXPORT_SYMBOL_GPL(gfn_to_memslot);
-
 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
 {
 	struct kvm_memory_slot *memslot = gfn_to_memslot(kvm, gfn);
-- 
1.7.5.4

^ permalink raw reply related

* Re: [PATCH] Copy machine descriptor after probe succeed
From: Stephen Rothwell @ 2011-12-20 10:17 UTC (permalink / raw)
  To: bill4carson; +Cc: linuxppc-dev
In-Reply-To: <1324372127-8552-2-git-send-email-bill4carson@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 476 bytes --]

Hi Bill,

On Tue, 20 Dec 2011 17:08:47 +0800 bill4carson@gmail.com wrote:
>
> From: Bill Carson <bill4carson@gmail.com>
> 
> It make more sense to copy machine descriptor AFTER machine probe return
> succeed.

Some of the platforms' probe routines modify the ppc_md structure and so
assume that it has been populated before the probe routine is called.
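
A minimal sketch of the problem, using a cut-down structure and a made-up
board (board_probe, board_restart and board_desc are hypothetical, and
struct machdep_calls here is only a stand-in for the real one):

```c
#include <stddef.h>
#include <string.h>

/* Cut-down stand-in for the kernel's struct machdep_calls. */
struct machdep_calls {
	const char *name;
	int (*probe)(void);
	void (*restart)(void);
};

/* Global descriptor in use, playing the role of ppc_md. */
static struct machdep_calls ppc_md;

static void board_restart(void)
{
}

/* Hypothetical probe routine that patches ppc_md while probing. */
static int board_probe(void)
{
	ppc_md.restart = board_restart;
	return 1;	/* this board matches */
}

static const struct machdep_calls board_desc = {
	.name = "example-board",
	.probe = board_probe,
	.restart = NULL,
};

/* Current order: copy first, then probe through the copy. */
static int probe_copy_first(void)
{
	memcpy(&ppc_md, &board_desc, sizeof(ppc_md));
	return ppc_md.probe();
}

/* Proposed order: probe first, copy only on success. */
static int probe_copy_after(void)
{
	memset(&ppc_md, 0, sizeof(ppc_md));
	if (!board_desc.probe())
		return 0;
	/* This copy overwrites whatever the probe stored in ppc_md. */
	memcpy(&ppc_md, &board_desc, sizeof(ppc_md));
	return 1;
}
```

With the current order the restart hook set by the probe survives; with
the proposed order the later memcpy() silently discards it.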

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* [RFC PATCH 0/2] KVM: PPC: Book3S HV: Report stolen time to guests
From: Paul Mackerras @ 2011-12-20 10:21 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc

Under pHyp, recent kernels use the dispatch trace log (DTL) to measure
stolen time.  The DTL is a ring buffer containing 48-byte entries,
where the hypervisor creates an entry each time a virtual cpu is
dispatched.  The entries contain a couple of fields that the kernel
interprets as stolen time, measured in timebase ticks.
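
For readers unfamiliar with the DTL, here is a rough user-space model of
the ring.  The field names follow the kernel's struct dtl_entry as far
as I recall it, so treat the exact layout as an assumption; struct
dtl_ring, dtl_log() and the demo helpers are invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One DTL entry; the layout is assumed, the 48-byte size is from the text. */
struct dtl_entry {
	uint8_t  dispatch_reason;
	uint8_t  preempt_reason;
	uint16_t processor_id;
	uint32_t enqueue_to_dispatch_time;
	uint32_t ready_to_enqueue_time;
	uint32_t waiting_to_ready_time;
	uint64_t timebase;
	uint64_t fault_addr;
	uint64_t srr0;
	uint64_t srr1;
};

_Static_assert(sizeof(struct dtl_entry) == 48, "DTL entries are 48 bytes");

/* Hypothetical bookkeeping for the ring buffer. */
struct dtl_ring {
	struct dtl_entry *start;
	struct dtl_entry *end;		/* one past the last entry */
	struct dtl_entry *next;		/* next entry to fill */
	uint64_t logged;		/* total entries written so far */
};

static void dtl_ring_init(struct dtl_ring *r, struct dtl_entry *buf, size_t n)
{
	r->start = buf;
	r->end = buf + n;
	r->next = buf;
	r->logged = 0;
}

/* Fill one entry on dispatch and advance, wrapping at the end. */
static void dtl_log(struct dtl_ring *r, uint64_t timebase, uint32_t stolen)
{
	struct dtl_entry *dt = r->next;

	memset(dt, 0, sizeof(*dt));
	dt->timebase = timebase;
	dt->enqueue_to_dispatch_time = stolen;
	if (++dt == r->end)
		dt = r->start;
	r->next = dt;
	r->logged++;
}

static struct dtl_entry demo_buf[3];
static struct dtl_ring demo_ring;

/* Log four dispatches into a three-entry ring; return the next index. */
static int dtl_demo(void)
{
	uint64_t t;

	dtl_ring_init(&demo_ring, demo_buf, 3);
	for (t = 1; t <= 4; t++)
		dtl_log(&demo_ring, t * 100, (uint32_t)t);
	return (int)(demo_ring.next - demo_ring.start);
}
```

The fourth dispatch overwrites the oldest entry, which is why the guest
needs a running count of logged entries and not just the buffer contents.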

Although this is not an ideal interface, it is one that our guest
kernels already support.  So this series of patches adds code to
Book3S HV KVM to measure stolen time and report it to the guest via a
dispatch trace log.

Stolen time is measured per virtual core (set of 4 vcpus, on POWER7)
as being all the time when no vcpu thread is executing inside
kvmppc_run_core(), or when a vcpu thread is running the virtual core
but is preempted.

The first patch fixes some potential races with the registration and
unregistration of the DTL and the other per-virtual-processor areas,
since the guest can (un)register a per-virtual-processor area for one
vcpu in a call to the H_REGISTER_VPA hypercall on another vcpu, and
hence potentially while KVM is using a previously-registered area.

The second patch adds the machinery for measuring stolen time and for
creating DTL entries.

Paul.

^ permalink raw reply

* [RFC PATCH 1/2] KVM: PPC: Book3S HV: Make virtual processor area registration more robust
From: Paul Mackerras @ 2011-12-20 10:22 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc
In-Reply-To: <20111220102142.GB5626@bloggs.ozlabs.ibm.com>

The PAPR API allows three sorts of per-virtual-processor areas to be
registered (VPA, SLB shadow buffer, and dispatch trace log), and
furthermore, these can be registered and unregistered for another
virtual CPU.  Currently we just update the vcpu fields pointing to
these areas at the time of registration or unregistration.  If this
is done on another vcpu, there is the possibility that the target vcpu
is using those fields at the time and could end up using a bogus
pointer and corrupting memory.

This fixes the race by making the target cpu itself do the update, so
we can be sure that the update happens at a time when the fields aren't
being used.  These are updated from a set of 'next_*' fields, which
are protected by a spinlock.  (We could have just taken the spinlock
when using the vpa, slb_shadow or dtl fields, but that would mean
taking the spinlock on every guest entry and exit.)

The code in do_h_register_vpa now takes the spinlock and updates the
'next_*' fields.  There is also a set of '*_pending' flags to indicate
that an update is pending.
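
The scheme can be modelled outside the kernel roughly as follows.  This
is a sketch under assumptions: a pthread mutex stands in for the
spinlock, only the VPA pointer is shown, and struct vcpu_area,
stage_vpa() and apply_pending_vpa() are names made up for the example.

```c
#include <pthread.h>
#include <stddef.h>

struct vcpu_area {
	void *vpa;		/* live pointer, used by the owner without the lock */
	void *next_vpa;		/* staged value, protected by lock */
	int vpa_pending;	/* nonzero when next_vpa should be applied */
	pthread_mutex_t lock;
};

/*
 * May be called on behalf of any vcpu: stage a new area.  Returns the
 * previously staged area (if any) so the caller can release it.
 */
static void *stage_vpa(struct vcpu_area *a, void *new_vpa)
{
	void *displaced;

	pthread_mutex_lock(&a->lock);
	displaced = a->next_vpa;
	a->next_vpa = new_vpa;
	a->vpa_pending = 1;
	pthread_mutex_unlock(&a->lock);
	return displaced;
}

/*
 * Called only by the owning vcpu at a point where it is not using vpa.
 * Returns the old live area (if any) so the caller can release it.
 */
static void *apply_pending_vpa(struct vcpu_area *a)
{
	void *old = NULL;

	pthread_mutex_lock(&a->lock);
	if (a->vpa_pending) {
		old = a->vpa;
		a->vpa = a->next_vpa;
		a->next_vpa = NULL;
		a->vpa_pending = 0;
	}
	pthread_mutex_unlock(&a->lock);
	return old;
}

/* Example state: nothing registered yet, two candidate buffers. */
static int buf_a, buf_b;
static struct vcpu_area demo_area = {
	NULL, NULL, 0, PTHREAD_MUTEX_INITIALIZER
};
```

The live pointer only ever changes in apply_pending_vpa(), so the owner
can keep dereferencing it without taking the lock on every entry and exit.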

This also changes 'struct dtl' (which was undefined) to 'struct dtl_entry',
which is what the rest of the kernel uses.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |   15 +++-
 arch/powerpc/kvm/book3s_hv.c        |  167 +++++++++++++++++++++++++----------
 2 files changed, 131 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 1cb6e52..b1126c1 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -82,7 +82,7 @@ struct kvm_vcpu;
 
 struct lppaca;
 struct slb_shadow;
-struct dtl;
+struct dtl_entry;
 
 struct kvm_vm_stat {
 	u32 remote_tlb_flush;
@@ -449,9 +449,18 @@ struct kvm_vcpu_arch {
 	u32 last_inst;
 
 	struct lppaca *vpa;
+	struct lppaca *next_vpa;
 	struct slb_shadow *slb_shadow;
-	struct dtl *dtl;
-	struct dtl *dtl_end;
+	struct slb_shadow *next_slb_shadow;
+	struct dtl_entry *dtl;
+	struct dtl_entry *dtl_end;
+	struct dtl_entry *dtl_ptr;
+	struct dtl_entry *next_dtl;
+	struct dtl_entry *next_dtl_end;
+	u8 vpa_pending;
+	u8 slb_shadow_pending;
+	u8 dtl_pending;
+	spinlock_t vpa_update_lock;
 
 	wait_queue_head_t *wqp;
 	struct kvmppc_vcore *vcore;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c11d960..6f6e88d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -140,7 +140,7 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu,
 {
 	struct kvm *kvm = vcpu->kvm;
 	unsigned long len, nb;
-	void *va;
+	void *va, *free_va, *tvpa, *dtl, *ss;
 	struct kvm_vcpu *tvcpu;
 	int err = H_PARAMETER;
 
@@ -152,6 +152,8 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu,
 	flags &= 7;
 	if (flags == 0 || flags == 4)
 		return H_PARAMETER;
+	free_va = va = NULL;
+	len = 0;
 	if (flags < 4) {
 		if (vpa & 0x7f)
 			return H_PARAMETER;
@@ -165,65 +167,122 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu,
 			len = *(unsigned short *)(va + 4);
 		else
 			len = *(unsigned int *)(va + 4);
+		free_va = va;
 		if (len > nb)
 			goto out_unpin;
-		switch (flags) {
-		case 1:		/* register VPA */
-			if (len < 640)
-				goto out_unpin;
-			if (tvcpu->arch.vpa)
-				kvmppc_unpin_guest_page(kvm, vcpu->arch.vpa);
-			tvcpu->arch.vpa = va;
-			init_vpa(vcpu, va);
-			break;
-		case 2:		/* register DTL */
-			if (len < 48)
-				goto out_unpin;
-			len -= len % 48;
-			if (tvcpu->arch.dtl)
-				kvmppc_unpin_guest_page(kvm, vcpu->arch.dtl);
-			tvcpu->arch.dtl = va;
-			tvcpu->arch.dtl_end = va + len;
+	}
+
+	spin_lock(&tvcpu->arch.vpa_update_lock);
+
+	switch (flags) {
+	case 1:		/* register VPA */
+		if (len < 640)
 			break;
-		case 3:		/* register SLB shadow buffer */
-			if (len < 16)
-				goto out_unpin;
-			if (tvcpu->arch.slb_shadow)
-				kvmppc_unpin_guest_page(kvm, vcpu->arch.slb_shadow);
-			tvcpu->arch.slb_shadow = va;
+		free_va = tvcpu->arch.next_vpa;
+		tvcpu->arch.next_vpa = va;
+		tvcpu->arch.vpa_pending = 1;
+		init_vpa(tvcpu, va);
+		err = 0;
+		break;
+	case 2:		/* register DTL */
+		if (len < 48)
 			break;
+		len -= len % 48;
+		tvpa = tvcpu->arch.vpa;
+		if (tvcpu->arch.vpa_pending)
+			tvpa = tvcpu->arch.next_vpa;
+		err = H_RESOURCE;
+		if (tvpa) {
+			free_va = tvcpu->arch.next_dtl;
+			tvcpu->arch.next_dtl = va;
+			tvcpu->arch.next_dtl_end = va + len;
+			tvcpu->arch.dtl_pending = 1;
+			err = 0;
 		}
-	} else {
-		switch (flags) {
-		case 5:		/* unregister VPA */
-			if (tvcpu->arch.slb_shadow || tvcpu->arch.dtl)
-				return H_RESOURCE;
-			if (!tvcpu->arch.vpa)
-				break;
-			kvmppc_unpin_guest_page(kvm, tvcpu->arch.vpa);
-			tvcpu->arch.vpa = NULL;
-			break;
-		case 6:		/* unregister DTL */
-			if (!tvcpu->arch.dtl)
-				break;
-			kvmppc_unpin_guest_page(kvm, tvcpu->arch.dtl);
-			tvcpu->arch.dtl = NULL;
-			break;
-		case 7:		/* unregister SLB shadow buffer */
-			if (!tvcpu->arch.slb_shadow)
-				break;
-			kvmppc_unpin_guest_page(kvm, tvcpu->arch.slb_shadow);
-			tvcpu->arch.slb_shadow = NULL;
+		break;
+	case 3:		/* register SLB shadow buffer */
+		if (len < 16)
 			break;
+		tvpa = tvcpu->arch.vpa;
+		if (tvcpu->arch.vpa_pending)
+			tvpa = tvcpu->arch.next_vpa;
+		err = H_RESOURCE;
+		if (tvpa) {
+			free_va = tvcpu->arch.next_slb_shadow;
+			tvcpu->arch.next_slb_shadow = va;
+			tvcpu->arch.slb_shadow_pending = 1;
+			err = 0;
+		}
+		break;
+
+	case 5:		/* unregister VPA */
+		dtl = tvcpu->arch.dtl;
+		if (tvcpu->arch.dtl_pending)
+			dtl = tvcpu->arch.next_dtl;
+		ss = tvcpu->arch.slb_shadow;
+		if (tvcpu->arch.slb_shadow_pending)
+			ss = tvcpu->arch.next_slb_shadow;
+		err = H_RESOURCE;
+		if (!dtl && !ss) {
+			free_va = tvcpu->arch.next_vpa;
+			tvcpu->arch.next_vpa = NULL;
+			tvcpu->arch.vpa_pending = 1;
+			err = 0;
 		}
+		break;
+	case 6:		/* unregister DTL */
+		free_va = tvcpu->arch.next_dtl;
+		tvcpu->arch.next_dtl = NULL;
+		tvcpu->arch.dtl_pending = 1;
+		err = 0;
+		break;
+	case 7:		/* unregister SLB shadow buffer */
+		free_va = tvcpu->arch.next_slb_shadow;
+		tvcpu->arch.next_slb_shadow = NULL;
+		tvcpu->arch.slb_shadow_pending = 1;
+		err = 0;
+		break;
 	}
-	return H_SUCCESS;
 
+	spin_unlock(&tvcpu->arch.vpa_update_lock);
  out_unpin:
-	kvmppc_unpin_guest_page(kvm, va);
+	if (free_va)
+		kvmppc_unpin_guest_page(kvm, free_va);
 	return err;
 }
 
+static void kvmppc_update_vpas(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+
+	spin_lock(&vcpu->arch.vpa_update_lock);
+	if (vcpu->arch.vpa_pending) {
+		if (vcpu->arch.vpa)
+			kvmppc_unpin_guest_page(kvm, vcpu->arch.vpa);
+		vcpu->arch.vpa = vcpu->arch.next_vpa;
+		vcpu->arch.next_vpa = NULL;
+		vcpu->arch.vpa_pending = 0;
+	}
+	if (vcpu->arch.slb_shadow_pending) {
+		if (vcpu->arch.slb_shadow)
+			kvmppc_unpin_guest_page(kvm, vcpu->arch.slb_shadow);
+		vcpu->arch.slb_shadow = vcpu->arch.next_slb_shadow;
+		vcpu->arch.next_slb_shadow = NULL;
+		vcpu->arch.slb_shadow_pending = 0;
+	}
+	if (vcpu->arch.dtl_pending) {
+		if (vcpu->arch.dtl)
+			kvmppc_unpin_guest_page(kvm, vcpu->arch.dtl);
+		vcpu->arch.dtl = vcpu->arch.dtl_ptr = vcpu->arch.next_dtl;
+		vcpu->arch.dtl_end = vcpu->arch.next_dtl_end;
+		vcpu->arch.next_dtl = NULL;
+		vcpu->arch.dtl_pending = 0;
+		if (vcpu->arch.vpa)	/* (should always be non-NULL) */
+			vcpu->arch.vpa->dtl_idx = 0;
+	}
+	spin_unlock(&vcpu->arch.vpa_update_lock);
+}
+
 int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 {
 	unsigned long req = kvmppc_get_gpr(vcpu, 3);
@@ -509,12 +568,20 @@ out:
 
 void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
 {
+	spin_lock(&vcpu->arch.vpa_update_lock);
 	if (vcpu->arch.dtl)
 		kvmppc_unpin_guest_page(vcpu->kvm, vcpu->arch.dtl);
+	if (vcpu->arch.dtl_pending && vcpu->arch.next_dtl)
+		kvmppc_unpin_guest_page(vcpu->kvm, vcpu->arch.next_dtl);
 	if (vcpu->arch.slb_shadow)
 		kvmppc_unpin_guest_page(vcpu->kvm, vcpu->arch.slb_shadow);
+	if (vcpu->arch.slb_shadow_pending && vcpu->arch.next_slb_shadow)
+		kvmppc_unpin_guest_page(vcpu->kvm, vcpu->arch.next_slb_shadow);
 	if (vcpu->arch.vpa)
 		kvmppc_unpin_guest_page(vcpu->kvm, vcpu->arch.vpa);
+	if (vcpu->arch.vpa_pending && vcpu->arch.next_vpa)
+		kvmppc_unpin_guest_page(vcpu->kvm, vcpu->arch.next_vpa);
+	spin_unlock(&vcpu->arch.vpa_update_lock);
 	kvm_vcpu_uninit(vcpu);
 	kfree(vcpu);
 }
@@ -681,8 +748,12 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 	vc->in_guest = 0;
 	vc->pcpu = smp_processor_id();
 	vc->napping_threads = 0;
-	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
+	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
 		kvmppc_start_thread(vcpu);
+		if (vcpu->arch.vpa_pending || vcpu->arch.slb_shadow_pending ||
+		    vcpu->arch.dtl_pending)
+			kvmppc_update_vpas(vcpu);
+	}
 
 	preempt_disable();
 	spin_unlock(&vc->lock);
-- 
1.7.7.3

^ permalink raw reply related

* [KVM PATCH 2/2] KVM: PPC: Book3S HV: Report stolen time to guest through dispatch trace log
From: Paul Mackerras @ 2011-12-20 10:37 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc
In-Reply-To: <20111220102142.GB5626@bloggs.ozlabs.ibm.com>

This adds code to measure "stolen" time per virtual core in units of
timebase ticks, and to report the stolen time to the guest using the
dispatch trace log (DTL).  The guest can register an area of memory
for the DTL for a given vcpu.  The DTL is a ring buffer where KVM
fills in one entry every time it enters the guest for that vcpu.

Stolen time is measured as time when the virtual core is not running,
either because the vcore is not runnable (e.g. some of its vcpus are
executing elsewhere in the kernel or in userspace), or when the vcpu
thread that is running the vcore is preempted.  This includes time
when all the vcpus are idle (i.e. have executed the H_CEDE hypercall),
which is OK because the guest accounts stolen time while idle as idle
time.

Each vcpu keeps a record of how much stolen time has been reported to
the guest for that vcpu so far.  When we are about to enter the guest,
we create a new DTL entry (if the guest vcpu has a DTL) and report the
difference between total stolen time for the vcore and stolen time
reported so far for the vcpu as the "enqueue to dispatch" time in the
DTL entry.
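
The bookkeeping described above boils down to the following sketch.  It
is a user-space model with the timebase passed in explicitly; struct
vcore, struct vcpu and the helper names are simplified stand-ins chosen
for the example, not the kernel's definitions.

```c
#include <stdint.h>

struct vcore {
	uint64_t stolen_tb;	/* total stolen ticks for the virtual core */
	uint64_t preempt_tb;	/* timebase when the vcore last stopped running */
};

struct vcpu {
	uint64_t stolen_logged;	/* stolen ticks already reported to this vcpu */
};

/* The vcore stops running (becomes inactive or is preempted). */
static void vcore_stop(struct vcore *vc, uint64_t now)
{
	vc->preempt_tb = now;
}

/* The vcore starts running again: the gap counts as stolen time. */
static void vcore_start(struct vcore *vc, uint64_t now)
{
	vc->stolen_tb += now - vc->preempt_tb;
}

/* On guest entry: report what has accumulated since the last report. */
static uint64_t vcpu_report_stolen(struct vcpu *v, const struct vcore *vc)
{
	uint64_t delta = vc->stolen_tb - v->stolen_logged;

	v->stolen_logged = vc->stolen_tb;
	return delta;
}

static uint64_t demo_out[3];

/* Two stop/start gaps (30 and 50 ticks) observed by two vcpus. */
static int stolen_demo(void)
{
	struct vcore vc = { 0, 0 };
	struct vcpu a = { 0 }, b = { 0 };

	vcore_stop(&vc, 100);
	vcore_start(&vc, 130);
	demo_out[0] = vcpu_report_stolen(&a, &vc);
	vcore_stop(&vc, 200);
	vcore_start(&vc, 250);
	demo_out[1] = vcpu_report_stolen(&a, &vc);
	demo_out[2] = vcpu_report_stolen(&b, &vc);
	return 3;
}
```

Because each vcpu remembers how much it has already been told, a vcpu
that enters the guest less often simply gets a larger delta next time.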

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |    4 +++
 arch/powerpc/kvm/book3s_hv.c        |   43 ++++++++++++++++++++++++++++++++++-
 2 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index b1126c1..3c5ec79 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -258,6 +258,9 @@ struct kvmppc_vcore {
 	struct list_head runnable_threads;
 	spinlock_t lock;
 	wait_queue_head_t wq;
+	u64 stolen_tb;
+	u64 preempt_tb;
+	struct kvm_vcpu *runner;
 };
 
 #define VCORE_ENTRY_COUNT(vc)	((vc)->entry_exit_count & 0xff)
@@ -461,6 +464,7 @@ struct kvm_vcpu_arch {
 	u8 slb_shadow_pending;
 	u8 dtl_pending;
 	spinlock_t vpa_update_lock;
+	u64 stolen_logged;
 
 	wait_queue_head_t *wqp;
 	struct kvmppc_vcore *vcore;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6f6e88d..b835df7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -60,12 +60,20 @@ static int kvmppc_hv_setup_rma(struct kvm_vcpu *vcpu);
 
 void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+	struct kvmppc_vcore *vc = vcpu->arch.vcore;
+
 	local_paca->kvm_hstate.kvm_vcpu = vcpu;
-	local_paca->kvm_hstate.kvm_vcore = vcpu->arch.vcore;
+	local_paca->kvm_hstate.kvm_vcore = vc;
+	if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)
+		vc->stolen_tb += mftb() - vc->preempt_tb;
 }
 
 void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	struct kvmppc_vcore *vc = vcpu->arch.vcore;
+
+	if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)
+		vc->preempt_tb = mftb();
 }
 
 void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
@@ -283,6 +291,31 @@ static void kvmppc_update_vpas(struct kvm_vcpu *vcpu)
 	spin_unlock(&vcpu->arch.vpa_update_lock);
 }
 
+static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
+				    struct kvmppc_vcore *vc)
+{
+	struct dtl_entry *dt;
+	struct lppaca *vpa;
+
+	dt = vcpu->arch.dtl_ptr;
+	vpa = vcpu->arch.vpa;
+	if (!dt || !vpa)
+		return;
+	memset(dt, 0, sizeof(struct dtl_entry));
+	dt->dispatch_reason = 7;
+	dt->processor_id = vc->pcpu + vcpu->arch.ptid;
+	dt->timebase = mftb();
+	dt->enqueue_to_dispatch_time = vc->stolen_tb - vcpu->arch.stolen_logged;
+	dt->srr0 = kvmppc_get_pc(vcpu);
+	dt->srr1 = vcpu->arch.shregs.msr;
+	vcpu->arch.stolen_logged = vc->stolen_tb;
+	++dt;
+	if (dt == vcpu->arch.dtl_end)
+		dt = vcpu->arch.dtl;
+	vcpu->arch.dtl_ptr = dt;
+	++vpa->dtl_idx;
+}
+
 int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 {
 	unsigned long req = kvmppc_get_gpr(vcpu, 3);
@@ -542,6 +575,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 			INIT_LIST_HEAD(&vcore->runnable_threads);
 			spin_lock_init(&vcore->lock);
 			init_waitqueue_head(&vcore->wq);
+			vcore->preempt_tb = mftb();
 		}
 		kvm->arch.vcores[core] = vcore;
 	}
@@ -554,6 +588,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 	++vcore->num_threads;
 	spin_unlock(&vcore->lock);
 	vcpu->arch.vcore = vcore;
+	vcpu->arch.stolen_logged = vcore->stolen_tb;
 
 	vcpu->arch.cpu_type = KVM_CPU_3S_64;
 	kvmppc_sanity_check(vcpu);
@@ -745,6 +780,7 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 	vc->nap_count = 0;
 	vc->entry_exit_count = 0;
 	vc->vcore_state = VCORE_RUNNING;
+	vc->stolen_tb += mftb() - vc->preempt_tb;
 	vc->in_guest = 0;
 	vc->pcpu = smp_processor_id();
 	vc->napping_threads = 0;
@@ -753,6 +789,8 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 		if (vcpu->arch.vpa_pending || vcpu->arch.slb_shadow_pending ||
 		    vcpu->arch.dtl_pending)
 			kvmppc_update_vpas(vcpu);
+		if (vcpu->arch.dtl_ptr)
+			kvmppc_create_dtl_entry(vcpu, vc);
 	}
 
 	preempt_disable();
@@ -805,6 +843,7 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
 	spin_lock(&vc->lock);
  out:
 	vc->vcore_state = VCORE_INACTIVE;
+	vc->preempt_tb = mftb();
 	list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
 				 arch.run_list) {
 		if (vcpu->arch.ret != RESUME_GUEST) {
@@ -903,6 +942,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 			spin_lock(&vc->lock);
 			continue;
 		}
+		vc->runner = vcpu;
 		n_ceded = 0;
 		list_for_each_entry(v, &vc->runnable_threads, arch.run_list)
 			n_ceded += v->arch.ceded;
@@ -922,6 +962,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 				wake_up(&v->arch.cpu_run);
 			}
 		}
+		vc->runner = NULL;
 	}
 
 	if (signal_pending(current)) {
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH 1/3] powerpc/44x: The bug fixed support for APM821xx SoC and Bluestone board
From: Vinh Nguyen Huu Tuong @ 2011-12-20 12:43 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Josh Boyer, Matt Porter,
	Kumar Gala, Paul Gortmaker, Anton Blanchard, Dave Kleikamp,
	Grant Likely, Tony Breeds, Rob Herring, Jiri Kosina,
	Lucas De Marchi, Ayman El-Khashab, linuxppc-dev, linux-kernel
  Cc: Vinh Nguyen Huu Tuong

This patch consists of:
- Fix the pvr mask used for checking the pvr in cputable.c
- Fix the cpu name to be consistent with the name described in the dts file
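
To show why the mask matters, here is a small sketch of the usual
masked comparison, assuming a table entry matches when
(pvr & pvr_mask) == pvr_value.  The struct is a cut-down stand-in for
struct cpu_spec, and pvr_matches() and the two table entries are names
made up for the example.

```c
#include <stdint.h>

/* Cut-down stand-in for the kernel's struct cpu_spec. */
struct cpu_spec {
	uint32_t pvr_mask;
	uint32_t pvr_value;
	const char *cpu_name;
};

/* Assumed matching rule: compare only the bits selected by the mask. */
static int pvr_matches(const struct cpu_spec *s, uint32_t pvr)
{
	return (pvr & s->pvr_mask) == s->pvr_value;
}

/* Entry before and after the mask change in this patch. */
static const struct cpu_spec apm821xx_old = {
	0xffffff00, 0x12C41C80, "APM821XX"
};
static const struct cpu_spec apm821xx_new = {
	0xfffffff0, 0x12C41C80, "APM821XX"
};
```

With the old mask the low byte of the PVR is cleared before the
comparison, so the 0x80 in pvr_value can never be matched and the entry
is never selected; the new mask keeps that bit in the comparison.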

Signed-off-by: Vinh Nguyen Huu Tuong <vhtnguyen@apm.com>
---
 arch/powerpc/kernel/cputable.c             |    2 +-
 arch/powerpc/platforms/44x/ppc44x_simple.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index edae5bb..6a5a9a8 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -1803,7 +1803,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.platform		= "ppc440",
 	},
 	{ /* 464 in APM821xx */
-		.pvr_mask		= 0xffffff00,
+		.pvr_mask		= 0xfffffff0,
 		.pvr_value		= 0x12C41C80,
 		.cpu_name		= "APM821XX",
 		.cpu_features		= CPU_FTRS_44X,
diff --git a/arch/powerpc/platforms/44x/ppc44x_simple.c b/arch/powerpc/platforms/44x/ppc44x_simple.c
index 8d22027..3ffb915 100644
--- a/arch/powerpc/platforms/44x/ppc44x_simple.c
+++ b/arch/powerpc/platforms/44x/ppc44x_simple.c
@@ -52,7 +52,7 @@ machine_device_initcall(ppc44x_simple, ppc44x_device_probe);
 static char *board[] __initdata = {
 	"amcc,arches",
 	"amcc,bamboo",
-	"amcc,bluestone",
+	"apm,bluestone",
 	"amcc,glacier",
 	"ibm,ebony",
 	"amcc,eiger",
-- 
1.7.2.5

^ permalink raw reply related

* [PATCH 2/3] powerpc/44x: Add additional support for APM821xx SoC and Bluestone board
From: Vinh Nguyen Huu Tuong @ 2011-12-20 12:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Josh Boyer, Matt Porter,
	Kumar Gala, Paul Gortmaker, Anton Blanchard, Dave Kleikamp,
	Grant Likely, Tony Breeds, Rob Herring, Jiri Kosina,
	Lucas De Marchi, Ayman El-Khashab, linuxppc-dev, linux-kernel
  Cc: Vinh Nguyen Huu Tuong

This patch updates the dts file for the bluestone board with support for:
- UART1
- L2 cache
- NAND with NDFC
- PCI-E

Signed-off-by: Vinh Nguyen Huu Tuong <vhtnguyen@apm.com>
---
 arch/powerpc/boot/dts/bluestone.dts |  127 ++++++++++++++++++++++++++++++++++-
 1 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/dts/bluestone.dts b/arch/powerpc/boot/dts/bluestone.dts
index 2a56a0d..cfa23bf 100644
--- a/arch/powerpc/boot/dts/bluestone.dts
+++ b/arch/powerpc/boot/dts/bluestone.dts
@@ -33,7 +33,7 @@
 	aliases {
 		ethernet0 = &EMAC0;
 		serial0 = &UART0;
-		//serial1 = &UART1; --gcl missing UART1 label
+		serial1 = &UART1;
 	};
 
 	cpus {
@@ -52,7 +52,7 @@
 			d-cache-size = <32768>;
 			dcr-controller;
 			dcr-access-method = "native";
-			//next-level-cache = <&L2C0>; --gcl missing L2C0 label
+			next-level-cache = <&L2C0>;
 		};
 	};
 
@@ -117,6 +117,16 @@
 		dcr-reg = <0x00c 0x002>;
 	};
 
+	L2C0: l2c {
+		compatible = "ibm,l2-cache-apm82181", "ibm,l2-cache";
+		dcr-reg = <0x020 0x008
+			   0x030 0x008>;
+		cache-line-size = <32>;
+		cache-size = <262144>;
+		interrupt-parent = <&UIC1>;
+		interrupts = <11 1>;
+	};
+
 	plb {
 		compatible = "ibm,plb4";
 		#address-cells = <2>;
@@ -182,6 +192,53 @@
 						reg = <0x001a0000 0x00060000>;
 					};
 				};
+
+				ndfc@1,0 {
+					compatible = "ibm,ndfc";
+					reg = <0x00000003 0x00000000 0x00002000>;
+					ccr = <0x00001000>;
+					bank-settings = <0x80002222>;
+					#address-cells = <1>;
+					#size-cells = <1>;
+					/* 2Gb Nand Flash */
+					nand {
+						#address-cells = <1>;
+						#size-cells = <1>;
+
+						partition@0 {
+							label = "firmware";
+							reg   = <0x00000000 0x00C00000>;
+						};
+						partition@c00000 {
+							label = "environment";
+							reg   = <0x00C00000 0x00B00000>;
+						};
+						partition@1700000 {
+							label = "kernel";
+							reg   = <0x01700000 0x00E00000>;
+						};
+						partition@2500000 {
+							label = "root";
+							reg   = <0x02500000 0x08200000>;
+						};
+						partition@a700000 {
+							label = "device-tree";
+							reg   = <0x0A700000 0x00B00000>;
+						};
+						partition@b200000 {
+							label = "config";
+							reg   = <0x0B200000 0x00D00000>;
+						};
+						partition@bf00000 {
+							label = "diag";
+							reg   = <0x0BF00000 0x00C00000>;
+						};
+						partition@cb00000 {
+							label = "vendor";
+							reg   = <0x0CB00000 0x3500000>;
+						};
+					};
+				};
 			};
 
 			UART0: serial@ef600300 {
@@ -195,11 +252,36 @@
 				interrupts = <0x1 0x4>;
 			};
 
+			UART1: serial@ef600400 {
+				device_type = "serial";
+				compatible = "ns16550";
+				reg = <0xef600400 0x00000008>;
+				virtual-reg = <0xef600400>;
+				clock-frequency = <0>; /* Filled in by U-Boot */
+				current-speed = <0>; /* Filled in by U-Boot */
+				interrupt-parent = <&UIC0>;
+				interrupts = <0x1 0x4>;
+			};
+
 			IIC0: i2c@ef600700 {
 				compatible = "ibm,iic";
 				reg = <0xef600700 0x00000014>;
 				interrupt-parent = <&UIC0>;
 				interrupts = <0x2 0x4>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+				rtc@68 {
+					compatible = "stm,m41t80";
+					reg = <0x68>;
+					interrupt-parent = <&UIC0>;
+					interrupts = <0x9 0x8>;
+				};
+				sttm@4C {
+					compatible = "adm,adm1032";
+					reg = <0x4C>;
+					interrupt-parent = <&UIC1>;
+					interrupts = <0x1E 0x8>; /* CPU_THERNAL_L */
+				};
 			};
 
 			IIC1: i2c@ef600800 {
@@ -250,5 +332,46 @@
 			};
 		};
 
+		PCIE0: pciex@d00000000 {
+			device_type = "pci";
+			#interrupt-cells = <1>;
+			#size-cells = <2>;
+			#address-cells = <3>;
+			compatible = "ibm,plb-pciex-apm821xx", "ibm,plb-pciex";
+			primary;
+			port = <0x0>; /* port number */
+			reg = <0x0000000d 0x00000000 0x20000000	/* Config space access */
+			       0x0000000c 0x08010000 0x00001000>;	/* Registers */
+			dcr-reg = <0x100 0x020>;
+			sdr-base = <0x300>;
+
+			/* Outbound ranges, one memory and one IO,
+			 * later cannot be changed
+			 */
+			ranges = <0x02000000 0x00000000 0x80000000 0x0000000e 0x00000000 0x00000000 0x80000000
+				  0x02000000 0x00000000 0x00000000 0x0000000f 0x00000000 0x00000000 0x00100000
+				  0x01000000 0x00000000 0x00000000 0x0000000f 0x80000000 0x00000000 0x00010000>;
+
+			/* Inbound 2GB range starting at 0 */
+			dma-ranges = <0x42000000 0x0 0x0 0x0 0x0 0x0 0x80000000>;
+
+			/* This drives busses 40 to 0x7f */
+			bus-range = <0x40 0x7f>;
+
+			/* Legacy interrupts (note the weird polarity, the bridge seems
+			 * to invert PCIe legacy interrupts).
+			 * We are de-swizzling here because the numbers are actually for
+			 * port of the root complex virtual P2P bridge. But I want
+			 * to avoid putting a node for it in the tree, so the numbers
+			 * below are basically de-swizzled numbers.
+			 * The real slot is on idsel 0, so the swizzling is 1:1
+			 */
+			interrupt-map-mask = <0x0 0x0 0x0 0x7>;
+			interrupt-map = <
+				0x0 0x0 0x0 0x1 &UIC3 0xc 0x4 /* swizzled int A */
+				0x0 0x0 0x0 0x2 &UIC3 0xd 0x4 /* swizzled int B */
+				0x0 0x0 0x0 0x3 &UIC3 0xe 0x4 /* swizzled int C */
+				0x0 0x0 0x0 0x4 &UIC3 0xf 0x4 /* swizzled int D */>;
+		};
 	};
 };
-- 
1.7.2.5
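
Editor's note on the NAND partition table: the eight reg entries can be checked for gaps and overlaps by walking them in order. The sketch below copies the offsets and sizes from the dts hunk above; the struct, the array name and layout_end() are hypothetical helpers written for this check, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one "partition@..." node: label plus reg pair. */
struct part {
	const char *label;
	uint32_t offset;
	uint32_t size;
};

/* Values copied from the ndfc@1,0 node in the patch. */
static const struct part bluestone_parts[] = {
	{ "firmware",    0x00000000u, 0x00C00000u },
	{ "environment", 0x00C00000u, 0x00B00000u },
	{ "kernel",      0x01700000u, 0x00E00000u },
	{ "root",        0x02500000u, 0x08200000u },
	{ "device-tree", 0x0A700000u, 0x00B00000u },
	{ "config",      0x0B200000u, 0x00D00000u },
	{ "diag",        0x0BF00000u, 0x00C00000u },
	{ "vendor",      0x0CB00000u, 0x03500000u },
};

#define BLUESTONE_NPARTS \
	(sizeof(bluestone_parts) / sizeof(bluestone_parts[0]))

/*
 * Return the end offset of the layout if every partition starts exactly
 * where the previous one ended (beginning at 0), or 0 on a gap or overlap.
 */
static uint32_t layout_end(const struct part *p, size_t n)
{
	uint32_t next = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (p[i].offset != next)
			return 0;
		next += p[i].size;
	}
	return next;
}
```

For this table layout_end() returns 0x10000000, that is 256 MiB, which agrees with the "2Gb Nand Flash" comment read as 2 Gbit.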

^ permalink raw reply related

* [PATCH 3/3] powerpc/44x: Add support PCI-E for APM821xx SoC and Bluestone board
From: Vinh Nguyen Huu Tuong @ 2011-12-20 12:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Josh Boyer, Matt Porter,
	Kumar Gala, Paul Gortmaker, Anton Blanchard, Dave Kleikamp,
	Grant Likely, Tony Breeds, Rob Herring, Jiri Kosina,
	Lucas De Marchi, Ayman El-Khashab, linuxppc-dev, linux-kernel
  Cc: Vinh Nguyen Huu Tuong

This patch extends PCI-E driver to support PCI-E for APM821xx SoC on Bluestone board.

Signed-off-by: Vinh Nguyen Huu Tuong <vhtnguyen@apm.com>
---
 arch/powerpc/platforms/44x/Kconfig |    1 +
 arch/powerpc/sysdev/ppc4xx_pci.c   |  109 +++++++++++++++++++++++++++++++++++-
 arch/powerpc/sysdev/ppc4xx_pci.h   |    4 +
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
index 762322c..cd62377 100644
--- a/arch/powerpc/platforms/44x/Kconfig
+++ b/arch/powerpc/platforms/44x/Kconfig
@@ -23,6 +23,7 @@ config BLUESTONE
 	default n
 	select PPC44x_SIMPLE
 	select APM821xx
+	select PPC4xx_PCI_EXPRESS
 	select IBM_EMAC_RGMII
 	help
 	  This option enables support for the APM APM821xx Evaluation board.
diff --git a/arch/powerpc/sysdev/ppc4xx_pci.c b/arch/powerpc/sysdev/ppc4xx_pci.c
index 862f11b..4e866a5 100644
--- a/arch/powerpc/sysdev/ppc4xx_pci.c
+++ b/arch/powerpc/sysdev/ppc4xx_pci.c
@@ -1040,6 +1040,109 @@ static struct ppc4xx_pciex_hwops ppc460ex_pcie_hwops __initdata =
 	.check_link	= ppc4xx_pciex_check_link_sdr,
 };
 
+static int __init apm821xx_pciex_core_init(struct device_node *np)
+{
+	/* Return the number of pcie port */
+	return 1;
+}
+
+static int apm821xx_pciex_init_port_hw(struct ppc4xx_pciex_port *port)
+{
+	u32 val;
+	u32 utlset1;
+	u32 timeout;
+
+	/*
+	 * Do a software reset on PCIe ports.
+	 * This code is to fix the issue that pci drivers doesn't re-assign
+	 * bus number for PCIE devices after Uboot
+	 * scanned and configured all the buses (eg. PCIE NIC IntelPro/1000
+	 * PT quad port, SAS LSI 1064E)
+	 */
+
+	mtdcri(SDR0, PESDR0_460EX_PHY_CTL_RST + (port->index * 0x55), 0x0);
+	mdelay(10);
+
+	if (port->endpoint)
+		val = PTYPE_LEGACY_ENDPOINT << 20;
+	else
+		val = PTYPE_ROOT_PORT << 20;
+
+	if (port->index == 0) {
+		val |= LNKW_X1 << 12;
+		utlset1 = 0x00000000;
+	} else {
+		val |= LNKW_X4 << 12;
+		utlset1 = 0x20101101;
+	}
+
+	mtdcri(SDR0, port->sdr_base + PESDRn_DLPSET, val);
+	mtdcri(SDR0, port->sdr_base + PESDRn_UTLSET1, utlset1);
+	mtdcri(SDR0, port->sdr_base + PESDRn_UTLSET2, 0x01010000);
+
+	switch (port->index) {
+	case 0:
+		mtdcri(SDR0, PESDR0_460EX_L0CDRCTL, 0x00003230);
+		mtdcri(SDR0, PESDR0_460EX_L0DRV, 0x00000130);
+		mtdcri(SDR0, PESDR0_460EX_L0CLK, 0x00000006);
+
+		mtdcri(SDR0, PESDR0_460EX_PHY_CTL_RST, 0x10000000);
+		mdelay(50);
+		mtdcri(SDR0, PESDR0_460EX_PHY_CTL_RST, 0x30000000);
+		break;
+
+	case 1:
+		mtdcri(SDR0, PESDR1_460EX_L0CDRCTL, 0x00003230);
+		mtdcri(SDR0, PESDR1_460EX_L1CDRCTL, 0x00003230);
+		mtdcri(SDR0, PESDR1_460EX_L2CDRCTL, 0x00003230);
+		mtdcri(SDR0, PESDR1_460EX_L3CDRCTL, 0x00003230);
+		mtdcri(SDR0, PESDR1_460EX_L0DRV, 0x00000130);
+		mtdcri(SDR0, PESDR1_460EX_L1DRV, 0x00000130);
+		mtdcri(SDR0, PESDR1_460EX_L2DRV, 0x00000130);
+		mtdcri(SDR0, PESDR1_460EX_L3DRV, 0x00000130);
+		mtdcri(SDR0, PESDR1_460EX_L0CLK, 0x00000006);
+		mtdcri(SDR0, PESDR1_460EX_L1CLK, 0x00000006);
+		mtdcri(SDR0, PESDR1_460EX_L2CLK, 0x00000006);
+		mtdcri(SDR0, PESDR1_460EX_L3CLK, 0x00000006);
+
+		mtdcri(SDR0, PESDR1_460EX_PHY_CTL_RST, 0x10000000);
+		break;
+	}
+
+	mtdcri(SDR0, port->sdr_base + PESDRn_RCSSET,
+		mfdcri(SDR0, port->sdr_base + PESDRn_RCSSET) |
+		(PESDRx_RCSSET_RSTGU | PESDRx_RCSSET_RSTPYN));
+
+	/* Poll for PHY reset */
+	timeout = 0;
+	while ((!(mfdcri(SDR0, PESDR0_460EX_RSTSTA +
+			(port->index * 0x55)) & 0x1)) &&
+		 (timeout < PCIE_PHY_RESET_TIMEOUT)) {
+		udelay(10);
+		timeout++;
+	}
+
+	if (timeout < PCIE_PHY_RESET_TIMEOUT) {
+		mtdcri(SDR0, port->sdr_base + PESDRn_RCSSET,
+			(mfdcri(SDR0, port->sdr_base + PESDRn_RCSSET) &
+			~(PESDRx_RCSSET_RSTGU | PESDRx_RCSSET_RSTDL)) |
+			PESDRx_RCSSET_RSTPYN);
+
+		port->has_ibpre = 1;
+
+		return 0;
+	} else {
+		printk(KERN_INFO "PCIE: Can't reset PHY\n");
+		return -1;
+	}
+}
+
+static struct ppc4xx_pciex_hwops apm821xx_pcie_hwops __initdata = {
+	.core_init	= apm821xx_pciex_core_init,
+	.port_init_hw	= apm821xx_pciex_init_port_hw,
+	.setup_utl	= ppc460ex_pciex_init_utl,
+};
+
 static int __init ppc460sx_pciex_core_init(struct device_node *np)
 {
 	/* HSS drive amplitude */
@@ -1304,6 +1407,8 @@ static int __init ppc4xx_pciex_check_core_init(struct device_node *np)
 		ppc4xx_pciex_hwops = &ppc460ex_pcie_hwops;
 	if (of_device_is_compatible(np, "ibm,plb-pciex-460sx"))
 		ppc4xx_pciex_hwops = &ppc460sx_pcie_hwops;
+	if (of_device_is_compatible(np, "ibm,plb-pciex-apm821xx"))
+		ppc4xx_pciex_hwops = &apm821xx_pcie_hwops;
 #endif /* CONFIG_44x    */
 #ifdef CONFIG_40x
 	if (of_device_is_compatible(np, "ibm,plb-pciex-405ex"))
@@ -1751,9 +1856,9 @@ static void __init ppc4xx_configure_pciex_PIMs(struct ppc4xx_pciex_port *port,
 		 * if it works
 		 */
 		out_le32(mbase + PECFG_PIM0LAL, 0x00000000);
-		out_le32(mbase + PECFG_PIM0LAH, 0x00000000);
+		out_le32(mbase + PECFG_PIM0LAH, 0x00000008); /* Moving on HB */
 		out_le32(mbase + PECFG_PIM1LAL, 0x00000000);
-		out_le32(mbase + PECFG_PIM1LAH, 0x00000000);
+		out_le32(mbase + PECFG_PIM1LAH, 0x0000000c); /* Moving on HB */
 		out_le32(mbase + PECFG_PIM01SAH, 0xffff0000);
 		out_le32(mbase + PECFG_PIM01SAL, 0x00000000);
 
diff --git a/arch/powerpc/sysdev/ppc4xx_pci.h b/arch/powerpc/sysdev/ppc4xx_pci.h
index 32ce763..faf3017 100644
--- a/arch/powerpc/sysdev/ppc4xx_pci.h
+++ b/arch/powerpc/sysdev/ppc4xx_pci.h
@@ -441,6 +441,7 @@
 /*
  * Config space register offsets
  */
+#define PECFG_ECDEVCTL		0x060
 #define PECFG_ECRTCTL		0x074
 
 #define PECFG_BAR0LMPA		0x210
@@ -448,6 +449,7 @@
 #define PECFG_BAR1MPA		0x218
 #define PECFG_BAR2LMPA		0x220
 #define PECFG_BAR2HMPA		0x224
+#define PECFG_ECDEVCAPPA	0x25c
 
 #define PECFG_PIMEN		0x33c
 #define PECFG_PIM0LAL		0x340
@@ -494,5 +496,7 @@ enum
 	LNKW_X8			= 0x8
 };
 
+/* Timout for reset phy */
+#define PCIE_PHY_RESET_TIMEOUT 10
 
 #endif /* __PPC4XX_PCI_H__ */
-- 
1.7.2.5

^ permalink raw reply related

* Re: [PATCH 3/3] powerpc/44x: Add support PCI-E for APM821xx SoC and Bluestone board
From: Josh Boyer @ 2011-12-20 15:31 UTC (permalink / raw)
  To: Vinh Nguyen Huu Tuong
  Cc: Ayman El-Khashab, Dave Kleikamp, Lucas De Marchi, Rob Herring,
	Paul Gortmaker, Paul Mackerras, Anton Blanchard, Jiri Kosina,
	linuxppc-dev, linux-kernel
In-Reply-To: <1324385081-30824-1-git-send-email-vhtnguyen@apm.com>

On Tue, Dec 20, 2011 at 7:44 AM, Vinh Nguyen Huu Tuong
<vhtnguyen@apm.com> wrote:
> This patch extends PCI-E driver to support PCI-E for APM821xx SoC on Bluestone board.
>
> Signed-off-by: Vinh Nguyen Huu Tuong <vhtnguyen@apm.com>

> +static int apm821xx_pciex_init_port_hw(struct ppc4xx_pciex_port *port)
> +{
> +	u32 val;
> +	u32 utlset1;
> +	u32 timeout;
> +
> +	/*
> +	 * Do a software reset on PCIe ports.
> +	 * This code is to fix the issue that pci drivers doesn't re-assign
> +	 * bus number for PCIE devices after Uboot
> +	 * scanned and configured all the buses (eg. PCIE NIC IntelPro/1000
> +	 * PT quad port, SAS LSI 1064E)
> +	 */
> +
> +	mtdcri(SDR0, PESDR0_460EX_PHY_CTL_RST + (port->index * 0x55), 0x0);
> +	mdelay(10);
> +
> +	if (port->endpoint)
> +		val = PTYPE_LEGACY_ENDPOINT << 20;
> +	else
> +		val = PTYPE_ROOT_PORT << 20;
> +
> +	if (port->index == 0) {
> +		val |= LNKW_X1 << 12;
> +		utlset1 = 0x00000000;
> +	} else {
> +		val |= LNKW_X4 << 12;
> +		utlset1 = 0x20101101;
> +	}
> +
> +	mtdcri(SDR0, port->sdr_base + PESDRn_DLPSET, val);
> +	mtdcri(SDR0, port->sdr_base + PESDRn_UTLSET1, utlset1);
> +	mtdcri(SDR0, port->sdr_base + PESDRn_UTLSET2, 0x01010000);
> +
> +	switch (port->index) {
> +	case 0:
> +		mtdcri(SDR0, PESDR0_460EX_L0CDRCTL, 0x00003230);
> +		mtdcri(SDR0, PESDR0_460EX_L0DRV, 0x00000130);
> +		mtdcri(SDR0, PESDR0_460EX_L0CLK, 0x00000006);
> +
> +		mtdcri(SDR0, PESDR0_460EX_PHY_CTL_RST, 0x10000000);
> +		mdelay(50);
> +		mtdcri(SDR0, PESDR0_460EX_PHY_CTL_RST, 0x30000000);
> +		break;
> +
> +	case 1:
> +		mtdcri(SDR0, PESDR1_460EX_L0CDRCTL, 0x00003230);
> +		mtdcri(SDR0, PESDR1_460EX_L1CDRCTL, 0x00003230);
> +		mtdcri(SDR0, PESDR1_460EX_L2CDRCTL, 0x00003230);
> +		mtdcri(SDR0, PESDR1_460EX_L3CDRCTL, 0x00003230);
> +		mtdcri(SDR0, PESDR1_460EX_L0DRV, 0x00000130);
> +		mtdcri(SDR0, PESDR1_460EX_L1DRV, 0x00000130);
> +		mtdcri(SDR0, PESDR1_460EX_L2DRV, 0x00000130);
> +		mtdcri(SDR0, PESDR1_460EX_L3DRV, 0x00000130);
> +		mtdcri(SDR0, PESDR1_460EX_L0CLK, 0x00000006);
> +		mtdcri(SDR0, PESDR1_460EX_L1CLK, 0x00000006);
> +		mtdcri(SDR0, PESDR1_460EX_L2CLK, 0x00000006);
> +		mtdcri(SDR0, PESDR1_460EX_L3CLK, 0x00000006);
> +
> +		mtdcri(SDR0, PESDR1_460EX_PHY_CTL_RST, 0x10000000);
> +		break;
> +	}

Do we need a default case here to catch oddness and exit the function?
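
Editor's note: one way the default case asked about here could look is sketched below. This is a hypothetical userspace illustration, not code from the patch; the function name is invented, the lane setup is reduced to comments, and returning -EINVAL is an assumption (the patch itself returns -1 on failure).

```c
#include <errno.h>

/*
 * Hypothetical sketch of the per-port PHY setup with a default case.
 * In the driver the case bodies would hold the mtdcri() writes.
 */
static int apm821xx_phy_setup_sketch(unsigned int index)
{
	switch (index) {
	case 0:
		/* x1 port: one lane of CDRCTL/DRV/CLK setup would go here */
		return 0;
	case 1:
		/* x4 port: four lanes of CDRCTL/DRV/CLK setup would go here */
		return 0;
	default:
		/* Unknown port index: stop instead of continuing unconfigured */
		return -EINVAL;
	}
}
```

The caller would then return early on a non-zero result, before touching PESDRn_RCSSET.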

> +
> +	mtdcri(SDR0, port->sdr_base + PESDRn_RCSSET,
> +		mfdcri(SDR0, port->sdr_base + PESDRn_RCSSET) |
> +		(PESDRx_RCSSET_RSTGU | PESDRx_RCSSET_RSTPYN));
> +
> +	/* Poll for PHY reset */
> +	timeout = 0;
> +	while ((!(mfdcri(SDR0, PESDR0_460EX_RSTSTA +
> +			(port->index * 0x55)) & 0x1)) &&
> +		 (timeout < PCIE_PHY_RESET_TIMEOUT)) {
> +		udelay(10);
> +		timeout++;
> +	}
> +
> +	if (timeout < PCIE_PHY_RESET_TIMEOUT) {
> +		mtdcri(SDR0, port->sdr_base + PESDRn_RCSSET,
> +			(mfdcri(SDR0, port->sdr_base + PESDRn_RCSSET) &
> +			~(PESDRx_RCSSET_RSTGU | PESDRx_RCSSET_RSTDL)) |
> +			PESDRx_RCSSET_RSTPYN);
> +
> +		port->has_ibpre = 1;
> +
> +		return 0;
> +	} else {
> +		printk(KERN_INFO "PCIE: Can't reset PHY\n");
> +		return -1;
> +	}

If we can't reset the PHY, does this whole function essentially fail?
Do the devices not get renumbered, etc?  If so, you probably want to
make that KERN_ERR.
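
Editor's note: the polling loop under discussion makes at most PCIE_PHY_RESET_TIMEOUT (10) passes with udelay(10) each, so it waits roughly 100 microseconds before giving up. The sketch below shows the same bounded-poll pattern in userspace C. The callback type, the fake status source and the -ETIMEDOUT return are assumptions made for illustration; the patch reads the status with mfdcri() and returns -1.

```c
#include <errno.h>
#include <stdint.h>

#define PHY_RESET_POLLS 10	/* mirrors PCIE_PHY_RESET_TIMEOUT in the patch */

/* Hypothetical callback standing in for the mfdcri() status read. */
typedef uint32_t (*read_status_fn)(void *ctx);

/* Poll bit 0 of the status up to PHY_RESET_POLLS times. */
static int wait_phy_reset(read_status_fn read_status, void *ctx)
{
	unsigned int polls;

	for (polls = 0; polls < PHY_RESET_POLLS; polls++) {
		if (read_status(ctx) & 0x1)
			return 0;	/* PHY reports reset done */
		/* the driver would call udelay(10) here */
	}
	return -ETIMEDOUT;	/* caller should log this at error level */
}

/* Fake status source: reports ready once *remaining reaches zero. */
static uint32_t fake_status(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return 0x1;
	(*remaining)--;
	return 0x0;
}
```

Returning a distinct error from the wait lets the caller decide how loudly to report it, which is the point of the KERN_ERR suggestion above.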

> @@ -1751,9 +1856,9 @@ static void __init ppc4xx_configure_pciex_PIMs(struct ppc4xx_pciex_port *port,
>  		 * if it works
>  		 */
>  		out_le32(mbase + PECFG_PIM0LAL, 0x00000000);
> -		out_le32(mbase + PECFG_PIM0LAH, 0x00000000);
> +		out_le32(mbase + PECFG_PIM0LAH, 0x00000008); /* Moving on HB */
>  		out_le32(mbase + PECFG_PIM1LAL, 0x00000000);
> -		out_le32(mbase + PECFG_PIM1LAH, 0x00000000);
> +		out_le32(mbase + PECFG_PIM1LAH, 0x0000000c); /* Moving on HB */
>  		out_le32(mbase + PECFG_PIM01SAH, 0xffff0000);
>  		out_le32(mbase + PECFG_PIM01SAL, 0x00000000);

Why are these values changed, and are those changes only needed on APM821xx?

> diff --git a/arch/powerpc/sysdev/ppc4xx_pci.h b/arch/powerpc/sysdev/ppc4xx_pci.h
> index 32ce763..faf3017 100644
> --- a/arch/powerpc/sysdev/ppc4xx_pci.h
> +++ b/arch/powerpc/sysdev/ppc4xx_pci.h
> @@ -441,6 +441,7 @@
>  /*
>   * Config space register offsets
>   */
> +#define PECFG_ECDEVCTL		0x060
>  #define PECFG_ECRTCTL		0x074
>
>  #define PECFG_BAR0LMPA		0x210
> @@ -448,6 +449,7 @@
>  #define PECFG_BAR1MPA		0x218
>  #define PECFG_BAR2LMPA		0x220
>  #define PECFG_BAR2HMPA		0x224
> +#define PECFG_ECDEVCAPPA	0x25c
>
>  #define PECFG_PIMEN		0x33c
>  #define PECFG_PIM0LAL		0x340
> @@ -494,5 +496,7 @@ enum
>  	LNKW_X8			= 0x8
>  };
>
> +/* Timout for reset phy */
> +#define PCIE_PHY_RESET_TIMEOUT 10

Is this value applicable to all the 44x devices with PCI-e?

josh

^ permalink raw reply

* Please pull 'next' branch of 4xx tree
From: Josh Boyer @ 2011-12-20 16:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Artem Bityutskiy, linuxppc-dev

Hi Ben,

This is the relocatable series from Suzie that has been brewing for quite a
while.  I also included a small fix for currituck that I hit when I was building
various kernels.

NOTE: To build any config that includes the NDFC driver, you need the fix to
the ndfc driver that Tony posted and that is in linux-next.  I didn't include
it here since it is already queued up in the MTD tree.

josh

The following changes since commit 3f53638c805f75989f4b4be07efcfd173cdd5e2d:

  powerpc: Fix old bug in prom_init setting of the color (2011-12-19 14:41:25 +1100)

are available in the git repository at:
  git://git.infradead.org/users/jwboyer/powerpc-4xx.git next

Josh Boyer (1):
      powerpc/44x: Fix build error on currituck platform

Suzuki Poulose (7):
      powerpc: Rename mapping based RELOCATABLE to DYNAMIC_MEMSTART for BookE
      powerpc/44x: Enable DYNAMIC_MEMSTART for 440x
      powerpc: Process dynamic relocations for kernel
      powerpc: Define virtual-physical translations for RELOCATABLE
      powerpc/44x: Enable CONFIG_RELOCATABLE for PPC44x
      powerpc/44x: Enable CRASH_DUMP for 440x
      powerpc/boot: Change the load address for the wrapper to fit the kernel

 arch/powerpc/Kconfig                          |   45 +++++-
 arch/powerpc/Makefile                         |    6 +-
 arch/powerpc/boot/wrapper                     |   20 +++
 arch/powerpc/configs/44x/iss476-smp_defconfig |    3 +-
 arch/powerpc/include/asm/kdump.h              |    4 +-
 arch/powerpc/include/asm/page.h               |   89 ++++++++++-
 arch/powerpc/kernel/Makefile                  |    2 +
 arch/powerpc/kernel/crash_dump.c              |    4 +-
 arch/powerpc/kernel/head_44x.S                |  105 +++++++++++++
 arch/powerpc/kernel/head_fsl_booke.S          |    2 +-
 arch/powerpc/kernel/machine_kexec.c           |    2 +-
 arch/powerpc/kernel/prom_init.c               |    2 +-
 arch/powerpc/kernel/reloc_32.S                |  208 +++++++++++++++++++++++++
 arch/powerpc/kernel/vmlinux.lds.S             |    8 +-
 arch/powerpc/mm/44x_mmu.c                     |    2 +-
 arch/powerpc/mm/init_32.c                     |    7 +
 arch/powerpc/platforms/44x/currituck.c        |    2 +-
 arch/powerpc/relocs_check.pl                  |   14 ++-
 18 files changed, 496 insertions(+), 29 deletions(-)

^ permalink raw reply
