LinuxPPC-Dev Archive on lore.kernel.org

* Re: [RFC PATCH v3 11/13] memory-hotplug : free memmap of sparse-vmemmap
From: Yasuaki Ishimatsu @ 2012-07-11  5:52 UTC (permalink / raw)
  To: Wen Congyang
  Cc: len.brown, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <4FFD09D5.8010605@cn.fujitsu.com>

2012/07/11 14:06, Wen Congyang wrote:
Hi Wen,

> At 07/09/2012 06:33 PM, Yasuaki Ishimatsu Wrote:
>> I don't think that all pages of virtual mapping in removed memory can be
>> freed, since page which type is MIX_SECTION_INFO is difficult to free.
>> So, the patch only frees page which type is SECTION_INFO at first.
>>
>> CC: David Rientjes <rientjes@google.com>
>> CC: Jiang Liu <liuj97@gmail.com>
>> CC: Len Brown <len.brown@intel.com>
>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> CC: Paul Mackerras <paulus@samba.org>
>> CC: Christoph Lameter <cl@linux.com>
>> Cc: Minchan Kim <minchan.kim@gmail.com>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> CC: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>
>> ---
>>   arch/x86/mm/init_64.c |   91 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   include/linux/mm.h    |    2 +
>>   mm/memory_hotplug.c   |    5 ++
>>   mm/sparse.c           |    5 +-
>>   4 files changed, 101 insertions(+), 2 deletions(-)
>>
>> Index: linux-3.5-rc4/include/linux/mm.h
>> ===================================================================
>> --- linux-3.5-rc4.orig/include/linux/mm.h	2012-07-03 14:22:18.530011567 +0900
>> +++ linux-3.5-rc4/include/linux/mm.h	2012-07-03 14:22:20.999983872 +0900
>> @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
>>   void vmemmap_populate_print_last(void);
>>   void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
>>   				  unsigned long size);
>> +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
>> +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);
>>
>>   enum mf_flags {
>>   	MF_COUNT_INCREASED = 1 << 0,
>> Index: linux-3.5-rc4/mm/sparse.c
>> ===================================================================
>> --- linux-3.5-rc4.orig/mm/sparse.c	2012-07-03 14:21:45.071429805 +0900
>> +++ linux-3.5-rc4/mm/sparse.c	2012-07-03 14:22:21.000983767 +0900
>> @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
>>   	/* This will make the necessary allocations eventually. */
>>   	return sparse_mem_map_populate(pnum, nid);
>>   }
>> -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
>> +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
>>   {
>> -	return; /* XXX: Not implemented yet */
>> +	vmemmap_kfree(page, nr_pages);
> 
> Hmm, I think you try to free the memory allocated in kmalloc_section_memmap().

Yes.

> 
>>   }
>>   static void free_map_bootmem(struct page *page, unsigned long nr_pages)
>>   {
>> +	vmemmap_free_bootmem(page, nr_pages);
>>   }
> 
> Hmm, which function is the memory you try to free allocated in?

The function tries to free memory allocated from bootmem. The memory has
been registered by get_page_bootmem(), so we can free it with
put_page_bootmem().
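
The register/release pairing described above can be sketched as a toy
userspace model. This is only an illustration under stated assumptions:
struct ex_page, ex_get_page() and ex_put_page() are hypothetical names, and
the model does not reproduce the kernel's struct page or its exact counting
rules.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a bootmem-backed page; not the kernel's struct page. */
struct ex_page {
	int  refs;    /* registrations still outstanding */
	bool freed;   /* set once the page has been handed back */
};

/* Models registering one user of the page (cf. get_page_bootmem()). */
static void ex_get_page(struct ex_page *p)
{
	assert(!p->freed);
	p->refs++;
}

/*
 * Models dropping one registration (cf. put_page_bootmem()). In this toy
 * model the page is released only when the last registration goes away.
 */
static void ex_put_page(struct ex_page *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0)
		p->freed = true;
}
```

In this model a page registered twice survives the first ex_put_page() call
and is marked freed by the second.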

> 
>>   #else
>>   static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
>> Index: linux-3.5-rc4/arch/x86/mm/init_64.c
>> ===================================================================
>> --- linux-3.5-rc4.orig/arch/x86/mm/init_64.c	2012-07-03 14:22:18.538011465 +0900
>> +++ linux-3.5-rc4/arch/x86/mm/init_64.c	2012-07-03 14:22:21.007983103 +0900
>> @@ -978,6 +978,97 @@ vmemmap_populate(struct page *start_page
>>   	return 0;
>>   }
>>
>> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
>> +				      struct page **pp)
>> +{
>> +	pgd_t *pgd;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	pte_t *pte;
>> +	unsigned long next;
>> +
>> +	*pp = NULL;
>> +
>> +	pgd = pgd_offset_k(addr);
>> +	if (pgd_none(*pgd))
>> +		return (addr + PAGE_SIZE) & PAGE_MASK;
> 
> Hmm, why not goto next pgd?

Does it mean "return (addr + PGDIR_SIZE) & PGDIR_MASK"?
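
To make the two alternatives concrete, here is a minimal userspace sketch
comparing a page-sized step with a PGD-sized step. It assumes x86-64 4-level
paging constants (PAGE_SHIFT = 12, PGDIR_SHIFT = 39); the EX_* macros and
ex_* helpers are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 4-level paging constants; not taken from the patch. */
#define EX_PAGE_SHIFT   12
#define EX_PGDIR_SHIFT  39
#define EX_PAGE_SIZE    (UINT64_C(1) << EX_PAGE_SHIFT)
#define EX_PAGE_MASK    (~(EX_PAGE_SIZE - 1))
#define EX_PGDIR_SIZE   (UINT64_C(1) << EX_PGDIR_SHIFT)
#define EX_PGDIR_MASK   (~(EX_PGDIR_SIZE - 1))

/* Advance to the start of the next 4 KiB page, as the patch does. */
static uint64_t ex_next_page(uint64_t addr)
{
	return (addr + EX_PAGE_SIZE) & EX_PAGE_MASK;
}

/* Advance to the start of the next PGD entry (a 512 GiB region). */
static uint64_t ex_next_pgd(uint64_t addr)
{
	return (addr + EX_PGDIR_SIZE) & EX_PGDIR_MASK;
}

/* Count how many calls to next() are needed to walk from addr to end. */
static uint64_t ex_steps(uint64_t addr, uint64_t end,
			 uint64_t (*next)(uint64_t))
{
	uint64_t n = 0;

	assert(addr <= end);
	while (addr < end) {
		addr = next(addr);
		n++;
	}
	return n;
}
```

Under these assumptions, walking a 2 MiB range whose PGD entry is empty takes
512 iterations with the page-sized step and a single iteration with the
PGD-sized step.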

> 
>> +
>> +	pud = pud_offset(pgd, addr);
>> +	if (pud_none(*pud))
>> +		return (addr + PAGE_SIZE) & PAGE_MASK;
>> +
>> +	if (!cpu_has_pse) {
>> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
>> +		pmd = pmd_offset(pud, addr);
>> +		if (pmd_none(*pmd))
>> +			return next;
>> +
>> +		pte = pte_offset_kernel(pmd, addr);
>> +		if (pte_none(*pte))
>> +			return next;
>> +
>> +		*pp = pte_page(*pte);
>> +		pte_clear(&init_mm, addr, pte);
> 
> I think you should flush tlb here.

Thanks, I'll update it.

> 
>> +	} else {
>> +		next = pmd_addr_end(addr, end);
>> +
>> +		pmd = pmd_offset(pud, addr);
>> +		if (pmd_none(*pmd))
>> +			return next;
>> +
>> +		*pp = pmd_page(*pmd);
>> +		pmd_clear(pmd);
>> +	}
>> +
>> +	return next;
>> +}
>> +
>> +void __meminit
>> +vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
>> +{
>> +	unsigned long addr = (unsigned long)memmap;
>> +	unsigned long end = (unsigned long)(memmap + nr_pages);
>> +	unsigned long next;
>> +	unsigned int order;
>> +	struct page *page;
>> +
>> +	for (; addr < end; addr = next) {
>> +		page = NULL;
>> +		next = find_and_clear_pte_page(addr, end, &page);
>> +		if (!page)
>> +			continue;
>> +
>> +		if (is_vmalloc_addr(page_address(page)))
>> +			vfree(page_address(page));
> 
> Hmm, the memory is allocated in vmemmap_alloc_block(), and the address
> can not be vmalloc address.

Does that mean the if statement is unnecessary?

> 
>> +		else {
>> +			order = next - addr;
>> +			free_pages((unsigned long)page_address(page),
>> +				   get_order(order));
> 
> OOPS. I think we cannot free pages here.
> 
> sizeof(struct page) is less than PAGE_SIZE. We store more than one struct
> page in the same page. If you free it here while the other struct page
> is in use, it is very dangerous.

That memory holds the page structures for the hot-removed memory. Nobody is
using these pages, since the hot-removed memory has already been offlined.
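
The sizes behind this exchange can be worked through with a short sketch.
Every constant below is an assumption chosen for illustration (4 KiB pages,
a 64-byte struct page, 128 MiB sections); none of them is stated in the
thread, and the ex_* names are hypothetical.

```c
#include <assert.h>

/* All constants are assumptions for illustration, not values from the patch. */
#define EX_PAGE_SIZE         4096UL
#define EX_STRUCT_PAGE_SIZE  64UL           /* assumed sizeof(struct page) */
#define EX_SECTION_SIZE      (128UL << 20)  /* assumed 128 MiB per section */

/* How many struct page entries share one 4 KiB page of memmap. */
static unsigned long ex_descs_per_memmap_page(void)
{
	assert(EX_PAGE_SIZE % EX_STRUCT_PAGE_SIZE == 0);
	return EX_PAGE_SIZE / EX_STRUCT_PAGE_SIZE;
}

/* How much physical memory one 4 KiB page of memmap describes. */
static unsigned long ex_bytes_covered_by_memmap_page(void)
{
	return ex_descs_per_memmap_page() * EX_PAGE_SIZE;
}

/* Size in bytes of the memmap that describes one whole section. */
static unsigned long ex_memmap_bytes_per_section(void)
{
	return (EX_SECTION_SIZE / EX_PAGE_SIZE) * EX_STRUCT_PAGE_SIZE;
}
```

With these assumed values, one memmap page holds 64 descriptors covering
256 KiB of memory, and a full section needs 2 MiB of memmap. Freeing a memmap
page is therefore only safe when every descriptor stored in it belongs to
memory that has been removed, which is the condition the two positions above
disagree about.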

>> +		}
>> +	}
>> +}
>> +
>> +void __meminit
>> +vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
>> +{
>> +	unsigned long addr = (unsigned long)memmap;
>> +	unsigned long end = (unsigned long)(memmap + nr_pages);
>> +	unsigned long next;
>> +	struct page *page;
>> +	unsigned long magic;
>> +
>> +	for (; addr < end; addr = next) {
>> +		page = NULL;
>> +		next = find_and_clear_pte_page(addr, end, &page);
>> +		if (!page)
>> +			continue;
>> +
>> +		magic = (unsigned long) page->lru.next;
>> +		if (magic == SECTION_INFO)
>> +			put_page_bootmem(page);
>> +	}
>> +}
>> +
>>   void __meminit
>>   register_page_bootmem_memmap(unsigned long section_nr, struct page *start_page,
>>   			     unsigned long size)
>> Index: linux-3.5-rc4/mm/memory_hotplug.c
>> ===================================================================
>> --- linux-3.5-rc4.orig/mm/memory_hotplug.c	2012-07-03 14:22:18.522011667 +0900
>> +++ linux-3.5-rc4/mm/memory_hotplug.c	2012-07-03 14:22:21.012982694 +0900
>> @@ -303,6 +303,8 @@ static int __meminit __add_section(int n
>>   #ifdef CONFIG_SPARSEMEM_VMEMMAP
> 
> I think this line can be removed now.

I'll update it.

Thanks,
Yasuaki Ishimatsu

> 
> Thanks
> Wen Congyang
> 
>>   static int __remove_section(struct zone *zone, struct mem_section *ms)
>>   {
>> +	unsigned long flags;
>> +	struct pglist_data *pgdat = zone->zone_pgdat;
>>   	int ret;
>>
>>   	if (!valid_section(ms))
>> @@ -310,6 +312,9 @@ static int __remove_section(struct zone
>>
>>   	ret = unregister_memory_section(ms);
>>
>> +	pgdat_resize_lock(pgdat, &flags);
>> +	sparse_remove_one_section(zone, ms);
>> +	pgdat_resize_unlock(pgdat, &flags);
>>   	return ret;
>>   }
>>   #else
>>
>>
> 

^ permalink raw reply

* Re: [PATCH] [powerpc] Export memory limit via device tree
From: Benjamin Herrenschmidt @ 2012-07-11  5:36 UTC (permalink / raw)
  To: Suzuki K. Poulose; +Cc: mahesh, linuxppc-dev, linux-kernel
In-Reply-To: <20120702114855.22333.95335.stgit@suzukikp.in.ibm.com>

> diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c
> index c957b12..0c9695d 100644
> --- a/arch/powerpc/kernel/machine_kexec.c
> +++ b/arch/powerpc/kernel/machine_kexec.c
> @@ -207,6 +207,12 @@ static struct property crashk_size_prop = {
>  	.value = &crashk_size,
>  };
>  
> +static struct property memory_limit_prop = {
> +	.name = "linux,memory-limit",
> +	.length = sizeof(phys_addr_t),
> +	.value = &memory_limit,
> +};
> +

AFAIK, phys_addr_t can change size, so instead make it point to a known
fixed-size quantity (a u64).
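
A minimal userspace sketch of that suggestion follows. It assumes a 32-bit
stand-in for phys_addr_t to show the width mismatch; ex_phys_addr_t,
struct ex_property and ex_update_memory_limit_prop() are hypothetical names
for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a physical address type whose width depends on
 * the kernel configuration; 32 bits is assumed here for illustration.
 */
typedef uint32_t ex_phys_addr_t;

/* Toy version of a device-tree property descriptor. */
struct ex_property {
	const char *name;
	int         length;
	void       *value;
};

/* Fixed-width copy that is exported, independent of ex_phys_addr_t. */
static uint64_t ex_memory_limit_u64;

static_assert(sizeof(ex_memory_limit_u64) == 8, "exported value must be 8 bytes");

static struct ex_property ex_memory_limit_prop = {
	.name   = "linux,memory-limit",
	.length = sizeof(ex_memory_limit_u64),
	.value  = &ex_memory_limit_u64,
};

/* Refresh the exported copy before the property is published. */
static void ex_update_memory_limit_prop(ex_phys_addr_t limit)
{
	ex_memory_limit_u64 = (uint64_t)limit;
}
```

The property length stays at 8 bytes whatever width the address type has, so
a consumer can always read the value as a 64-bit quantity.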

> +
> +	/* memory-limit is needed for constructing the crash regions */
> +	prop = of_find_property(node, memory_limit_prop.name, NULL);
> +	if (prop)
> +		prom_remove_property(node, prop);
> +
> +	if (memory_limit)
> +		prom_add_property(node, &memory_limit_prop);
> +

There's a patch floating around making prom_update_property properly
handle both pre-existing and non-pre-existing props, you should probably
base yourself on top of it. I'm about to stick that patch in powerpc
-next
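
The "update handles both cases" behaviour can be sketched as a toy
replace-or-append helper. This is not the prom_update_property()
implementation or its signature; struct ex_node, struct ex_prop and
ex_update_property() are hypothetical, and removing the property when
memory_limit is zero would still need separate handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_MAX_PROPS 8

/* Toy property and node types; not the kernel's device-tree structures. */
struct ex_prop {
	const char *name;
	const void *value;
};

struct ex_node {
	struct ex_prop props[EX_MAX_PROPS];
	size_t         count;
};

/*
 * Replace the property if one with the same name already exists, otherwise
 * append it. Returns 0 on success and -1 if the node has no room left.
 */
static int ex_update_property(struct ex_node *node, struct ex_prop newprop)
{
	size_t i;

	assert(node != NULL && newprop.name != NULL);
	for (i = 0; i < node->count; i++) {
		if (strcmp(node->props[i].name, newprop.name) == 0) {
			node->props[i] = newprop;
			return 0;
		}
	}
	if (node->count == EX_MAX_PROPS)
		return -1;
	node->props[node->count++] = newprop;
	return 0;
}
```

With such a helper the caller no longer needs the find/remove/add sequence
from the quoted hunk: one call covers both the pre-existing and the
non-pre-existing property.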

Cheers,
Ben.

^ permalink raw reply

* Re: [RFC PATCH v3 11/13] memory-hotplug : free memmap of sparse-vmemmap
From: Wen Congyang @ 2012-07-11  5:06 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: len.brown, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <4FFAB37F.1060105@jp.fujitsu.com>

At 07/09/2012 06:33 PM, Yasuaki Ishimatsu Wrote:
> I don't think that all pages of virtual mapping in removed memory can be
> freed, since page which type is MIX_SECTION_INFO is difficult to free.
> So, the patch only frees page which type is SECTION_INFO at first.
> 
> CC: David Rientjes <rientjes@google.com>
> CC: Jiang Liu <liuj97@gmail.com>
> CC: Len Brown <len.brown@intel.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> ---
>  arch/x86/mm/init_64.c |   91 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/mm.h    |    2 +
>  mm/memory_hotplug.c   |    5 ++
>  mm/sparse.c           |    5 +-
>  4 files changed, 101 insertions(+), 2 deletions(-)
> 
> Index: linux-3.5-rc4/include/linux/mm.h
> ===================================================================
> --- linux-3.5-rc4.orig/include/linux/mm.h	2012-07-03 14:22:18.530011567 +0900
> +++ linux-3.5-rc4/include/linux/mm.h	2012-07-03 14:22:20.999983872 +0900
> @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
>  void vmemmap_populate_print_last(void);
>  void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
>  				  unsigned long size);
> +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
> +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);
> 
>  enum mf_flags {
>  	MF_COUNT_INCREASED = 1 << 0,
> Index: linux-3.5-rc4/mm/sparse.c
> ===================================================================
> --- linux-3.5-rc4.orig/mm/sparse.c	2012-07-03 14:21:45.071429805 +0900
> +++ linux-3.5-rc4/mm/sparse.c	2012-07-03 14:22:21.000983767 +0900
> @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
>  	/* This will make the necessary allocations eventually. */
>  	return sparse_mem_map_populate(pnum, nid);
>  }
> -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
> +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
>  {
> -	return; /* XXX: Not implemented yet */
> +	vmemmap_kfree(page, nr_pages);

Hmm, I think you try to free the memory allocated in kmalloc_section_memmap().

>  }
>  static void free_map_bootmem(struct page *page, unsigned long nr_pages)
>  {
> +	vmemmap_free_bootmem(page, nr_pages);
>  }

Hmm, which function is the memory you try to free allocated in?

>  #else
>  static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
> Index: linux-3.5-rc4/arch/x86/mm/init_64.c
> ===================================================================
> --- linux-3.5-rc4.orig/arch/x86/mm/init_64.c	2012-07-03 14:22:18.538011465 +0900
> +++ linux-3.5-rc4/arch/x86/mm/init_64.c	2012-07-03 14:22:21.007983103 +0900
> @@ -978,6 +978,97 @@ vmemmap_populate(struct page *start_page
>  	return 0;
>  }
> 
> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
> +				      struct page **pp)
> +{
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +	unsigned long next;
> +
> +	*pp = NULL;
> +
> +	pgd = pgd_offset_k(addr);
> +	if (pgd_none(*pgd))
> +		return (addr + PAGE_SIZE) & PAGE_MASK;

Hmm, why not goto next pgd?

> +
> +	pud = pud_offset(pgd, addr);
> +	if (pud_none(*pud))
> +		return (addr + PAGE_SIZE) & PAGE_MASK;
> +
> +	if (!cpu_has_pse) {
> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
> +		pmd = pmd_offset(pud, addr);
> +		if (pmd_none(*pmd))
> +			return next;
> +
> +		pte = pte_offset_kernel(pmd, addr);
> +		if (pte_none(*pte))
> +			return next;
> +
> +		*pp = pte_page(*pte);
> +		pte_clear(&init_mm, addr, pte);

I think you should flush tlb here.

> +	} else {
> +		next = pmd_addr_end(addr, end);
> +
> +		pmd = pmd_offset(pud, addr);
> +		if (pmd_none(*pmd))
> +			return next;
> +
> +		*pp = pmd_page(*pmd);
> +		pmd_clear(pmd);
> +	}
> +
> +	return next;
> +}
> +
> +void __meminit
> +vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +	unsigned long addr = (unsigned long)memmap;
> +	unsigned long end = (unsigned long)(memmap + nr_pages);
> +	unsigned long next;
> +	unsigned int order;
> +	struct page *page;
> +
> +	for (; addr < end; addr = next) {
> +		page = NULL;
> +		next = find_and_clear_pte_page(addr, end, &page);
> +		if (!page)
> +			continue;
> +
> +		if (is_vmalloc_addr(page_address(page)))
> +			vfree(page_address(page));

Hmm, the memory is allocated in vmemmap_alloc_block(), and the address
can not be vmalloc address.

> +		else {
> +			order = next - addr;
> +			free_pages((unsigned long)page_address(page),
> +				   get_order(order));

OOPS. I think we cannot free pages here.

sizeof(struct page) is less than PAGE_SIZE. We store more than one struct
page in the same page. If you free it here while the other struct page
is in use, it is very dangerous.

> +		}
> +	}
> +}
> +
> +void __meminit
> +vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +	unsigned long addr = (unsigned long)memmap;
> +	unsigned long end = (unsigned long)(memmap + nr_pages);
> +	unsigned long next;
> +	struct page *page;
> +	unsigned long magic;
> +
> +	for (; addr < end; addr = next) {
> +		page = NULL;
> +		next = find_and_clear_pte_page(addr, end, &page);
> +		if (!page)
> +			continue;
> +
> +		magic = (unsigned long) page->lru.next;
> +		if (magic == SECTION_INFO)
> +			put_page_bootmem(page);
> +	}
> +}
> +
>  void __meminit
>  register_page_bootmem_memmap(unsigned long section_nr, struct page *start_page,
>  			     unsigned long size)
> Index: linux-3.5-rc4/mm/memory_hotplug.c
> ===================================================================
> --- linux-3.5-rc4.orig/mm/memory_hotplug.c	2012-07-03 14:22:18.522011667 +0900
> +++ linux-3.5-rc4/mm/memory_hotplug.c	2012-07-03 14:22:21.012982694 +0900
> @@ -303,6 +303,8 @@ static int __meminit __add_section(int n
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP

I think this line can be removed now.

Thanks
Wen Congyang

>  static int __remove_section(struct zone *zone, struct mem_section *ms)
>  {
> +	unsigned long flags;
> +	struct pglist_data *pgdat = zone->zone_pgdat;
>  	int ret;
> 
>  	if (!valid_section(ms))
> @@ -310,6 +312,9 @@ static int __remove_section(struct zone
> 
>  	ret = unregister_memory_section(ms);
> 
> +	pgdat_resize_lock(pgdat, &flags);
> +	sparse_remove_one_section(zone, ms);
> +	pgdat_resize_unlock(pgdat, &flags);
>  	return ret;
>  }
>  #else
> 
> 

^ permalink raw reply

* Re: [PATCH] Using alloc_coherent for caam job rings
From: Herbert Xu @ 2012-07-11  3:24 UTC (permalink / raw)
  To: Kim Phillips; +Cc: Bharat Bhushan, linuxppc-dev, linux-crypto, Bharat Bhushan
In-Reply-To: <20120627143411.b33d3ddab40e67b511297dcd@freescale.com>

On Wed, Jun 27, 2012 at 07:34:11PM +0000, Kim Phillips wrote:
> On Wed, 27 Jun 2012 10:58:32 +0530
> Bharat Bhushan <r65777@freescale.com> wrote:
> 
> > This resolves the Linux boot crash issue when "swiotlb=force" is set
> > in bootargs on systems which have memory more than 4G.
> 
> Acked-by: Kim Phillips <kim.phillips@freescale.com>

Patch applied.  Thanks!
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* RE: [PATCH 1/4] Talitos: move the data structure into header file
From: Liu Qiang-B32616 @ 2012-07-11  3:13 UTC (permalink / raw)
  To: Phillips Kim-R1AAHA
  Cc: Li Yang-R58472, Geanta Neag Horia Ioan-B05471, Herbert Xu,
	linux-crypto@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	David S. Miller
In-Reply-To: <20120710191107.28347d94783d8c001c67ce57@freescale.com>

> -----Original Message-----
> From: Phillips Kim-R1AAHA
> Sent: Wednesday, July 11, 2012 8:11 AM
> To: Liu Qiang-B32616
> Cc: linux-crypto@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Li Yang-R58472;
> Herbert Xu; David S. Miller; Geanta Neag Horia Ioan-B05471
> Subject: Re: [PATCH 1/4] Talitos: move the data structure into header
> file
> 
> On Tue, 10 Jul 2012 13:56:46 +0800
> Qiang Liu <qiang.liu@freescale.com> wrote:
> 
> > Move the declaration of talitos data structure into talitos.h.
> >
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > Cc: David S. Miller <davem@davemloft.net>
> > Signed-off-by: Qiang Liu <qiang.liu@freescale.com>
> > ---
> 
> this patch has already been submitted [1].
> 
> Subsequent patches in this series also don't apply cleanly:  can
> you rebase onto [2], which is based on Herbert's cryptodev tree and
> contains Horia's four patches, and re-send?
Kim, thanks for your note. I will rebase onto the cryptodev tree and resend
the patch.

> 
> Also note that upstream talitos does not yet contain NAPI support
> [3].
Thanks.
> 
> Thanks,
> 
> Kim
> 
> [1] http://www.mail-archive.com/linux-crypto@vger.kernel.org/msg07299.html
> [2] git://git.freescale.com/crypto/cryptodev.git
> [3] http://www.mail-archive.com/linux-crypto@vger.kernel.org/msg07289.html

^ permalink raw reply

* Re: [RFC PATCH v3 0/13] memory-hotplug : hot-remove physical memory
From: Wen Congyang @ 2012-07-11  2:24 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: len.brown, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <4FFCDC6A.9070704@cn.fujitsu.com>

At 07/11/2012 09:52 AM, Wen Congyang Wrote:
> At 07/09/2012 06:21 PM, Yasuaki Ishimatsu Wrote:
>> This patch series aims to support physical memory hot-remove.
>>
>>   [RFC PATCH v3 1/13] memory-hotplug : rename remove_memory to offline_memory
>>   [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
>>   [RFC PATCH v3 3/13] memory-hotplug : unify argument of firmware_map_add_early/hotplug
>>   [RFC PATCH v3 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs
>>   [RFC PATCH v3 5/13] memory-hotplug : does not release memory region in PAGES_PER_SECTION chunks
>>   [RFC PATCH v3 6/13] memory-hotplug : add memory_block_release
>>   [RFC PATCH v3 7/13] memory-hotplug : remove_memory calls __remove_pages
>>   [RFC PATCH v3 8/13] memory-hotplug : check page type in get_page_bootmem
>>   [RFC PATCH v3 9/13] memory-hotplug : move register_page_bootmem_info_node and put_page_bootmem for
>> sparse-vmemmap
>>   [RFC PATCH v3 10/13] memory-hotplug : implement register_page_bootmem_info_section of sparse-vmemmap
>>   [RFC PATCH v3 11/13] memory-hotplug : free memmap of sparse-vmemmap
>>   [RFC PATCH v3 12/13] memory-hotplug : add node_device_release
>>   [RFC PATCH v3 13/13] memory-hotplug : remove sysfs file of node
>>
>> Even if you apply these patches, you cannot remove the physical memory
>> completely since these patches are still under development. I want you to
>> cooperate to improve the physical memory hot-remove. So please review these
>> patches and give your comment/idea.
>>
>> The patches can free/remove following things:
>>
>>   - acpi_memory_info                          : [RFC PATCH 2/13]
>>   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 4/13]
>>   - iomem_resource                            : [RFC PATCH 5/13]
>>   - mem_section and related sysfs files       : [RFC PATCH 6-11/13]
>>   - node and related sysfs files              : [RFC PATCH 12-13/13]
>>
>> The patches cannot do following things yet:
>>
>>   - page table of removed memory
>>
>> If you find lack of function for physical memory hot-remove, please let me
>> know.
>>
>> change log of v3:
>>  * rebase to 3.5.0-rc6
>>
>>  [RFC PATCH v2 2/13]
>>    * remove extra kobject_put()
>>
>>    * The patch was commented by Wen. Wen's comment is
>>      "acpi_memory_device_remove() should ignore a return value of
>>      remove_memory() since caller does not care the return value".
>>      But I did not change it since I think caller should care the
>>      return value. And I am trying to fix it as follow:
>>
>>      https://lkml.org/lkml/2012/7/5/624
> 
> acpi_memory_device_remove() will be called not only when we write
> 1 to /sys/bus/acpi/devices/PNP0C80:XX/eject. When we unbind it
> from the driver or remove the module acpi_memhotplug, this function
> will be called too.
> 
> I will check whether your patch can work for these two cases.

I have checked it, and I think your patch cannot work for these two cases.

When we unbind the device from the driver (by writing the device name to
/sys/bus/acpi/drivers/acpi_memhotplug/unbind), driver_unbind()
is called. This function does not care about the return value.

When we remove the module acpi_memhotplug, acpi_memory_device_exit()
is called. This function does not care about the return value either.

I don't know whether there are other cases in which acpi_memory_device_remove()
is called.

Thanks
Wen Congyang


> 
> Thanks
> Wen Congyang
> 
>>
>>  [RFC PATCH v2 4/13]
>>    * remove a firmware_memmap_entry allocated by kzmalloc()
>>
>> change log of v2:
>>  [RFC PATCH v2 2/13]
>>    * check whether memory block is offline or not before calling offline_memory()
>>    * check whether section is valid or not in is_memblk_offline()
>>    * call kobject_put() for each memory_block in is_memblk_offline()
>>
>>  [RFC PATCH v2 3/13]
>>    * unify the end argument of firmware_map_add_early/hotplug
>>
>>  [RFC PATCH v2 4/13]
>>    * add release_firmware_map_entry() for freeing firmware_map_entry
>>
>>  [RFC PATCH v2 6/13]
>>   * add release_memory_block() for freeing memory_block
>>
>>  [RFC PATCH v2 11/13]
>>   * fix wrong arguments of free_pages()
>>
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-memory.c |   16 +-
>>  arch/x86/mm/init_64.c                           |  144 ++++++++++++++++++++++++
>>  drivers/acpi/acpi_memhotplug.c                  |   28 ++++
>>  drivers/base/memory.c                           |   54 ++++++++-
>>  drivers/base/node.c                             |    7 +
>>  drivers/firmware/memmap.c                       |   78 ++++++++++++-
>>  include/linux/firmware-map.h                    |    6 +
>>  include/linux/memory.h                          |    5
>>  include/linux/memory_hotplug.h                  |   17 --
>>  include/linux/mm.h                              |    5
>>  mm/memory_hotplug.c                             |   98 ++++++++++++----
>>  mm/sparse.c                                     |    5
>>  12 files changed, 414 insertions(+), 49 deletions(-)
>>
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* RE: [linuxppc-release] [PATCH 4/4] Talitos: fix the issue of dma memory leak
From: Liu Qiang-B32616 @ 2012-07-11  2:30 UTC (permalink / raw)
  To: Tabi Timur-B04825
  Cc: Herbert Xu, David S. Miller, linuxppc-dev@lists.ozlabs.org,
	linux-crypto@vger.kernel.org, Li Yang-R58472
In-Reply-To: <4FFC9DE1.5060502@freescale.com>

> -----Original Message-----
> From: Tabi Timur-B04825
> Sent: Wednesday, July 11, 2012 5:26 AM
> To: Liu Qiang-B32616
> Cc: linux-crypto@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Herbert
> Xu; Li Yang-R58472; David S. Miller
> Subject: Re: [linuxppc-release] [PATCH 4/4] Talitos: fix the issue of dma
> memory leak
> 
> Qiang Liu wrote:
> > An error occurs when testing with mass data:
> 
> Please don't use the phrase "fix the issue" in patch summaries.  It's
> redundant.
> 
> This patch should be titled,
> 
> "drivers/crypto: fix memory leak in Talitos driver"
> 
> > diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
> > index 81f8497..a7da48c 100644
> > --- a/drivers/crypto/talitos.c
> > +++ b/drivers/crypto/talitos.c
> > @@ -264,7 +264,7 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch)
> >  			else
> >  				status = error;
> >
> > -		dma_unmap_single(dev, request->dma_desc,
> > +		 dma_unmap_single(priv->dev, request->dma_desc,
> 
> You have an indentation problem here.
My fault. I will correct it and resend. Thanks.

> 
> Timur Tabi
> Linux kernel developer at Freescale

^ permalink raw reply

* RE: [PATCH 3/4] fsl-dma: support attribute of DMA_MEMORY when async_tx enabled
From: Liu Qiang-B32616 @ 2012-07-11  2:27 UTC (permalink / raw)
  To: Dan Williams
  Cc: Phillips Kim-R1AAHA, Vinod Koul, linuxppc-dev@lists.ozlabs.org,
	linux-crypto@vger.kernel.org, Li Yang-R58472
In-Reply-To: <CABE8wws80A3ne7ay+i5+WeuAnZcwLQoyM4RC7B8UMoHJknGLSQ@mail.gmail.com>

> -----Original Message-----
> From: Dan Williams [mailto:dan.j.williams@intel.com]
> Sent: Wednesday, July 11, 2012 3:39 AM
> To: Liu Qiang-B32616
> Cc: linux-crypto@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Li Yang-R58472;
> Phillips Kim-R1AAHA; Vinod Koul
> Subject: Re: [PATCH 3/4] fsl-dma: support attribute of DMA_MEMORY when
> async_tx enabled
> 
> On Mon, Jul 9, 2012 at 10:59 PM, Qiang Liu <qiang.liu@freescale.com> wrote:
> > - delete attribute of DMA_INTERRUPT because fsl-dma doesn't support
> > this function, exception will be thrown if talitos is used to compute xor
> > at the same time;
> > - change the release process of dma descriptor for avoiding exception when
> > enable config NET_DMA, release dma descriptor from 1st to last second, the
> > last descriptor which is reserved in current descriptor register may not be
> > completed, race condition will be raised if free current descriptor;
> > - use spin_lock_bh to instead of spin_lock_irqsave for improving performance;
> >
> > A race condition which is raised when use both of talitos and dmaengine to
> > offload xor is because napi scheduler will sync all pending requests in dma
> > channels, it will affect the process of raid operations. The descriptor is
> > freed which is submitted just now, but async_tx must check whether this depend
> > tx descriptor is acked, there are poison contents in the invalid address,
> > then BUG_ON() is thrown, so this descriptor will be freed in the next time.
> >
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Vinod Koul <vinod.koul@intel.com>
> > Cc: Li Yang <leoli@freescale.com>
> > Signed-off-by: Qiang Liu <qiang.liu@freescale.com>
> > ---
> 
> From the description this sounds like 3 or 4 patches.  Can you split it
> up?
I will split this patch according to my description and resend again. Thanks.

^ permalink raw reply

* RE: [PATCH 1/2] powerpc/mpc8572ds: Fix eTSEC is not available on core1 of AMP boot issue
From: Jia Hongtao-B38951 @ 2012-07-11  2:07 UTC (permalink / raw)
  To: Wood Scott-B07421; +Cc: linuxppc-dev@lists.ozlabs.org
In-Reply-To: <4FFC7139.3080303@freescale.com>

> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Wednesday, July 11, 2012 2:15 AM
> To: Jia Hongtao-B38951
> Cc: linuxppc-dev@lists.ozlabs.org; galak@kernel.crashing.org
> Subject: Re: [PATCH 1/2] powerpc/mpc8572ds: Fix eTSEC is not available on
> core1 of AMP boot issue
> 
> On 07/10/2012 01:08 AM, Jia Hongtao wrote:
> > The issue log on core1 is:
> > root@mpc8572ds:~# ifconfig eth0 10.192.208.244
> > net eth0: could not attach to PHY
> > SIOCSIFFLAGS: No such device
> >
> > To attach PHY node mdio@24520 should not be disabled in dts of core1.
> > Because all PHYs are controlled through this node as follows:
> 
> So you grant it to both partitions?  How do you deal with synchronization?
> 
> -Scott

PHY nodes are only used by ethernet. Each ethernet is used by only one partition
(disabled in the other partition). So I think there is no synchronization issue.

Thanks.
-Jia Hongtao.

^ permalink raw reply

* Re: [RFC PATCH v3 0/13] memory-hotplug : hot-remove physical memory
From: Wen Congyang @ 2012-07-11  1:52 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: len.brown, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <4FFAB0A2.8070304@jp.fujitsu.com>

At 07/09/2012 06:21 PM, Yasuaki Ishimatsu Wrote:
> This patch series aims to support physical memory hot-remove.
> 
>   [RFC PATCH v3 1/13] memory-hotplug : rename remove_memory to offline_memory
>   [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
>   [RFC PATCH v3 3/13] memory-hotplug : unify argument of firmware_map_add_early/hotplug
>   [RFC PATCH v3 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs
>   [RFC PATCH v3 5/13] memory-hotplug : does not release memory region in PAGES_PER_SECTION chunks
>   [RFC PATCH v3 6/13] memory-hotplug : add memory_block_release
>   [RFC PATCH v3 7/13] memory-hotplug : remove_memory calls __remove_pages
>   [RFC PATCH v3 8/13] memory-hotplug : check page type in get_page_bootmem
>   [RFC PATCH v3 9/13] memory-hotplug : move register_page_bootmem_info_node and put_page_bootmem for
> sparse-vmemmap
>   [RFC PATCH v3 10/13] memory-hotplug : implement register_page_bootmem_info_section of sparse-vmemmap
>   [RFC PATCH v3 11/13] memory-hotplug : free memmap of sparse-vmemmap
>   [RFC PATCH v3 12/13] memory-hotplug : add node_device_release
>   [RFC PATCH v3 13/13] memory-hotplug : remove sysfs file of node
> 
> Even if you apply these patches, you cannot remove the physical memory
> completely since these patches are still under development. I want you to
> cooperate to improve the physical memory hot-remove. So please review these
> patches and give your comment/idea.
> 
> The patches can free/remove following things:
> 
>   - acpi_memory_info                          : [RFC PATCH 2/13]
>   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 4/13]
>   - iomem_resource                            : [RFC PATCH 5/13]
>   - mem_section and related sysfs files       : [RFC PATCH 6-11/13]
>   - node and related sysfs files              : [RFC PATCH 12-13/13]
> 
> The patches cannot yet free/remove the following:
> 
>   - page table of removed memory
> 
> If you find any functionality missing for physical memory hot-remove, please
> let me know.
> 
> change log of v3:
>  * rebase to 3.5.0-rc6
> 
>  [RFC PATCH v2 2/13]
>    * remove extra kobject_put()
> 
>    * The patch was commented on by Wen. Wen's comment is
>      "acpi_memory_device_remove() should ignore the return value of
>      remove_memory() since the caller does not care about the return value".
>      But I did not change it since I think the caller should care about the
>      return value. I am trying to fix it as follows:
> 
>      https://lkml.org/lkml/2012/7/5/624

acpi_memory_device_remove() is called not only when we write
1 to /sys/bus/acpi/devices/PNP0C80:XX/eject, but also when we unbind
the device from the driver or remove the acpi_memhotplug module.

I will check whether your patch can work for these two cases.

Thanks
Wen Congyang

> 
>  [RFC PATCH v2 4/13]
>    * remove a firmware_map_entry allocated by kzalloc()
> 
> change log of v2:
>  [RFC PATCH v2 2/13]
>    * check whether memory block is offline or not before calling offline_memory()
>    * check whether section is valid or not in is_memblk_offline()
>    * call kobject_put() for each memory_block in is_memblk_offline()
> 
>  [RFC PATCH v2 3/13]
>    * unify the end argument of firmware_map_add_early/hotplug
> 
>  [RFC PATCH v2 4/13]
>    * add release_firmware_map_entry() for freeing firmware_map_entry
> 
>  [RFC PATCH v2 6/13]
>   * add release_memory_block() for freeing memory_block
> 
>  [RFC PATCH v2 11/13]
>   * fix wrong arguments of free_pages()
> 
> ---
>  arch/powerpc/platforms/pseries/hotplug-memory.c |   16 +-
>  arch/x86/mm/init_64.c                           |  144 ++++++++++++++++++++++++
>  drivers/acpi/acpi_memhotplug.c                  |   28 ++++
>  drivers/base/memory.c                           |   54 ++++++++-
>  drivers/base/node.c                             |    7 +
>  drivers/firmware/memmap.c                       |   78 ++++++++++++-
>  include/linux/firmware-map.h                    |    6 +
>  include/linux/memory.h                          |    5
>  include/linux/memory_hotplug.h                  |   17 --
>  include/linux/mm.h                              |    5
>  mm/memory_hotplug.c                             |   98 ++++++++++++----
>  mm/sparse.c                                     |    5
>  12 files changed, 414 insertions(+), 49 deletions(-)
> 
> 

^ permalink raw reply

* Re: [RFC PATCH v3 0/13] memory-hotplug : hot-remove physical memory
From: Yasuaki Ishimatsu @ 2012-07-11  0:54 UTC (permalink / raw)
  To: Jiang Liu
  Cc: len.brown, wency, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, Christoph Lameter,
	linuxppc-dev, akpm
In-Reply-To: <4FFCC6F1.5060908@gmail.com>

Hi Jiang,

2012/07/11 9:21, Jiang Liu wrote:
> On 07/11/2012 08:09 AM, Yasuaki Ishimatsu wrote:
>> Hi Jiang,
>>
>> 2012/07/11 1:50, Jiang Liu wrote:
>>> On 07/10/2012 05:58 PM, Yasuaki Ishimatsu wrote:
>>>> Hi Christoph,
>>>>
>>>> 2012/07/10 0:18, Christoph Lameter wrote:
>>>>>
>>>>> On Mon, 9 Jul 2012, Yasuaki Ishimatsu wrote:
>>>>>
>>>>>> Even after applying these patches, you cannot yet remove physical memory
>>>>>> completely, since the patches are still under development. I would like your
>>>>>> help in improving physical memory hot-remove, so please review these
>>>>>> patches and give me your comments and ideas.
>>>>>
>>>>> Could you at least give a method on how you want to do physical memory
>>>>> removal?
>>>>
>>>> We plan to release a system that supports dynamic hardware partitioning.
>>>> It will be able to hot-remove/add a system board that includes memory and
>>>> CPUs. But as you know, Linux does not support memory hot-remove on x86.
>>>> So I am trying to develop it.
>>>>
>>>> The current plan for hot-removing a system board is to use the container
>>>> driver, so I define the system board in the ACPI DSDT as a container device.
>>>> Hot-adding a container device is already supported. If the container device
>>>> has an _EJ0 ACPI method, an "eject" file for removing the container device
>>>> is provided as follows:
>>>>
>>>> # ls -l /sys/bus/acpi/devices/ACPI0004\:01/eject
>>>> --w-------. 1 root root 4096 Jul 10 18:19 /sys/bus/acpi/devices/ACPI0004:01/eject
>>>>
>>>> When I hot-remove the container device, I echo 1 to the file as follows:
>>>>
>>>> #echo 1 > /sys/bus/acpi/devices/ACPI0004\:02/eject
>>>>
>>>> Then acpi_bus_trim() is called, and it calls acpi_memory_device_remove()
>>>> to remove the memory device. But the code does nothing.
>>>> So I implemented the rest of the function.
>>>>
>>>>> You would have to remove all objects from the range you want to
>>>>> physically remove. That is only possible under special circumstances and
>>>>> with a limited set of objects. Even if you exclusively use ZONE_MOVEABLE
>>>>> you still may get cases where pages are pinned for a long time.
>>>>
>>>> I know that. So my memory hot-remove plan is as follows:
>>>>
>>>> 1. hot-add a system board
>>>>      All memory on the system board is offline at this point.
>>>>
>>>> 2. online the memory as removable pages
>>>>      This is not supported yet. It is being developed by Lai, as follows:
>>>>      http://lkml.indiana.edu/hypermail/linux/kernel/1207.0/01478.html
>>>>      Once it is supported, I will be able to create movable memory.
>>>>
>>>> 3. hot-remove the memory via the container device's eject file
>>> We have implemented a prototype to do physical node (mem + CPU + IOH) hotplug
>>> for Itanium and are now porting it to x86. But with the current solution, memory
>>> hotplug functionality may cause a 10-20% performance decrease because we concentrate
>>> all DMA/Normal memory on the first NUMA node, and all other NUMA nodes only
>>> host ZONE_MOVABLE. We are working on a solution to minimize the performance
>>> drop now.
>>
>> Thank you for your interesting response.
>>
>> I have a question. How do you move all other NUMA nodes to ZONE_MOVABLE?
>> To use ZONE_MOVABLE, we need to use boot options like kernelcore or movablecore.
>> But that is not enough, since the requested amount is spread evenly across
>> all nodes in the system. So I think we have no way to make all other NUMA
>> nodes ZONE_MOVABLE.
> We have modified the ZONE_MOVABLE spreading and bootmem allocation. If the kernelcore
> or movablecore kernel parameters are present, we follow the current behavior. If those
> parameters are absent and the platform supports physical hotplug, we concentrate
> DMA/NORMAL memory on specific nodes.

That's interesting. I would like to know more details, if you do not mind.
The current kernel does not behave this way, does it? So I assume you have
patches that change this behavior. Do you plan to merge them into the
community kernel?

Thanks,
Yasuaki Ishimatsu

>
>>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>>>
>>>>
>>>> Thanks,
>>>> Yasuaki Ishimatsu
>>>>
>>>>>
>>>>> I am not sure that these patches are useful unless we know where you are
>>>>> going with this. If we end up with a situation where we still cannot
>>>>> remove physical memory then this patchset is not helpful.
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
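
Editor's note: the eject step in the quoted plan ("echo 1" into the container
device's eject file) can also be driven from a program. The following is a
minimal user-space sketch in C, not part of the patch series. The helper name
write_eject() is hypothetical; the sysfs path in the comment is the one quoted
in this thread.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "1" to an ACPI eject attribute such as
 * /sys/bus/acpi/devices/ACPI0004:01/eject (path quoted in this thread).
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int write_eject(const char *path)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	/* Same payload as "echo 1": the digit followed by a newline. */
	ok = (fputs("1\n", f) >= 0);

	/* fclose() flushes the buffer; a failed flush counts as a failed write. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Whether the write succeeds on a real system depends on the device having an
_EJ0 method and on the caller's privileges; the sketch only reports success
or failure of the write itself.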

^ permalink raw reply

* Re: [PATCH] powerpc/booke: Eliminate rfi from exception entry path.
From: Benjamin Herrenschmidt @ 2012-07-11  0:54 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Alexander Graf, Stuart Yoder
In-Reply-To: <4FFCCD2D.20407@freescale.com>

On Tue, 2012-07-10 at 19:47 -0500, Scott Wood wrote:
> On 07/10/2012 07:44 PM, Alexander Graf wrote:
> > 
> > On 11.07.2012, at 02:34, Scott Wood wrote:
> >> +#ifdef CONFIG_BOOKE
> >> +	/*
> >> +	 * We're not changing address space on Book E, and the extra rfi
> >> +	 * can hurt when virtualized without hardware support -- whereas
> >> +	 * mtmsr can be paravirtualized.
> > 
> > We can always paravirtualize RFI as well if it makes sense.
> 
> I'm not sure that's possible.  We thought about it a while back, but
> IIRC the difficulty was not leaving a register clobbered.

Besides, mtmsr is slow on real HW as well. Also paravirt as done today
for complex instructions like mtmsr is racy :-) (I already had a chat
about that with Alex a while back, we might want to re-consider what
kind of fix can be done at some point).

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] powerpc/booke: Eliminate rfi from exception entry path.
From: Benjamin Herrenschmidt @ 2012-07-11  0:53 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Alexander Graf, Stuart Yoder
In-Reply-To: <4FFCCBAD.40504@freescale.com>

On Tue, 2012-07-10 at 19:41 -0500, Scott Wood wrote:
> On 07/10/2012 07:36 PM, Benjamin Herrenschmidt wrote:
> > On Tue, 2012-07-10 at 19:34 -0500, Scott Wood wrote:
> >> Unlike classic, we don't really need the MSR change to be atomic with the
> >> branch.  This eliminates a trap as a KVM guest (in the absence of
> >> hardware hypervisor extensions), where mtmsr is paravirtualized but rfi
> >> is not.  For a virtualized guest without any paravirtualization, this
> >> eliminates an additional two traps (SRR0/1).
> > 
> > In fact, I wonder, what do we write into the MSR at this point that
> > wasn't already in it in BookE ? RI ? I wonder if we could get away
> > without the mtmsr altogether...
> 
> Doesn't EE get set there for some exceptions?

It does, though arguably it shouldn't in most cases :-) I'm happy to turn a
bunch of these into explicit local_irq_enable() calls in the C code, which
will turn into a wrteei, which is more efficient on BookE.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] powerpc/booke: Eliminate rfi from exception entry path.
From: Scott Wood @ 2012-07-11  0:47 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, Stuart Yoder
In-Reply-To: <E139BF8E-A72C-4977-8FB9-6EED5803AB9A@suse.de>

On 07/10/2012 07:44 PM, Alexander Graf wrote:
> 
> On 11.07.2012, at 02:34, Scott Wood wrote:
>> +#ifdef CONFIG_BOOKE
>> +	/*
>> +	 * We're not changing address space on Book E, and the extra rfi
>> +	 * can hurt when virtualized without hardware support -- whereas
>> +	 * mtmsr can be paravirtualized.
> 
> We can always paravirtualize RFI as well if it makes sense.

I'm not sure that's possible.  We thought about it a while back, but
IIRC the difficulty was not leaving a register clobbered.

-Scott

^ permalink raw reply

* Re: [PATCH] powerpc/booke: Eliminate rfi from exception entry path.
From: Alexander Graf @ 2012-07-11  0:44 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Stuart Yoder
In-Reply-To: <20120711003454.GA22757@tyr.buserror.net>


On 11.07.2012, at 02:34, Scott Wood wrote:

> Unlike classic, we don't really need the MSR change to be atomic with the
> branch.  This eliminates a trap as a KVM guest (in the absence of
> hardware hypervisor extensions), where mtmsr is paravirtualized but rfi
> is not.  For a virtualized guest without any paravirtualization, this
> eliminates an additional two traps (SRR0/1).
> 
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---
> arch/powerpc/kernel/entry_32.S |   16 ++++++++++++++++
> 1 files changed, 16 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index ba3aeb4..6bb637c 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -193,6 +193,9 @@ transfer_to_handler_cont:
> 	lwz	r11,0(r9)		/* virtual address of handler */
> 	lwz	r9,4(r9)		/* where to go when done */
> #ifdef CONFIG_TRACE_IRQFLAGS
> +#ifdef CONFIG_BOOKE
> +	mtmsr	r10
> +#else
> 	lis	r12,reenable_mmu@h
> 	ori	r12,r12,reenable_mmu@l
> 	mtspr	SPRN_SRR0,r12
> @@ -201,6 +204,7 @@ transfer_to_handler_cont:
> 	RFI
> reenable_mmu:				/* re-enable mmu so we can */
> 	mfmsr	r10
> +#endif /* !CONFIG_BOOKE */
> 	lwz	r12,_MSR(r1)
> 	xor	r10,r10,r12
> 	andi.	r10,r10,MSR_EE		/* Did EE change? */
> @@ -247,11 +251,23 @@ reenable_mmu:				/* re-enable mmu so we can */
> 	mtlr	r9
> 	bctr				/* jump to handler */
> #else /* CONFIG_TRACE_IRQFLAGS */
> +#ifdef CONFIG_BOOKE
> +	/*
> +	 * We're not changing address space on Book E, and the extra rfi
> +	 * can hurt when virtualized without hardware support -- whereas
> +	 * mtmsr can be paravirtualized.

We can always paravirtualize RFI as well if it makes sense.


Alex

^ permalink raw reply

* Re: [PATCH] powerpc/booke: Eliminate rfi from exception entry path.
From: Scott Wood @ 2012-07-11  0:41 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Alexander Graf, Stuart Yoder
In-Reply-To: <1341967010.18850.19.camel@pasglop>

On 07/10/2012 07:36 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2012-07-10 at 19:34 -0500, Scott Wood wrote:
>> Unlike classic, we don't really need the MSR change to be atomic with the
>> branch.  This eliminates a trap as a KVM guest (in the absence of
>> hardware hypervisor extensions), where mtmsr is paravirtualized but rfi
>> is not.  For a virtualized guest without any paravirtualization, this
>> eliminates an additional two traps (SRR0/1).
> 
> In fact, I wonder, what do we write into the MSR at this point that
> wasn't already in it in BookE ? RI ? I wonder if we could get away
> without the mtmsr altogether...

Doesn't EE get set there for some exceptions?

-Scott

^ permalink raw reply

* Re: [PATCH] powerpc/booke: Eliminate rfi from exception entry path.
From: Benjamin Herrenschmidt @ 2012-07-11  0:36 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Alexander Graf, Stuart Yoder
In-Reply-To: <20120711003454.GA22757@tyr.buserror.net>

On Tue, 2012-07-10 at 19:34 -0500, Scott Wood wrote:
> Unlike classic, we don't really need the MSR change to be atomic with the
> branch.  This eliminates a trap as a KVM guest (in the absence of
> hardware hypervisor extensions), where mtmsr is paravirtualized but rfi
> is not.  For a virtualized guest without any paravirtualization, this
> eliminates an additional two traps (SRR0/1).

In fact, I wonder, what do we write into the MSR at this point that
wasn't already in it in BookE ? RI ? I wonder if we could get away
without the mtmsr altogether...

Cheers,
Ben.

> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---
>  arch/powerpc/kernel/entry_32.S |   16 ++++++++++++++++
>  1 files changed, 16 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index ba3aeb4..6bb637c 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -193,6 +193,9 @@ transfer_to_handler_cont:
>  	lwz	r11,0(r9)		/* virtual address of handler */
>  	lwz	r9,4(r9)		/* where to go when done */
>  #ifdef CONFIG_TRACE_IRQFLAGS
> +#ifdef CONFIG_BOOKE
> +	mtmsr	r10
> +#else
>  	lis	r12,reenable_mmu@h
>  	ori	r12,r12,reenable_mmu@l
>  	mtspr	SPRN_SRR0,r12
> @@ -201,6 +204,7 @@ transfer_to_handler_cont:
>  	RFI
>  reenable_mmu:				/* re-enable mmu so we can */
>  	mfmsr	r10
> +#endif /* !CONFIG_BOOKE */
>  	lwz	r12,_MSR(r1)
>  	xor	r10,r10,r12
>  	andi.	r10,r10,MSR_EE		/* Did EE change? */
> @@ -247,11 +251,23 @@ reenable_mmu:				/* re-enable mmu so we can */
>  	mtlr	r9
>  	bctr				/* jump to handler */
>  #else /* CONFIG_TRACE_IRQFLAGS */
> +#ifdef CONFIG_BOOKE
> +	/*
> +	 * We're not changing address space on Book E, and the extra rfi
> +	 * can hurt when virtualized without hardware support -- whereas
> +	 * mtmsr can be paravirtualized.
> +	 */
> +	mtmsr	r10
> +	mtctr	r11
> +	mtlr	r9
> +	bctr
> +#else
>  	mtspr	SPRN_SRR0,r11
>  	mtspr	SPRN_SRR1,r10
>  	mtlr	r9
>  	SYNC
>  	RFI				/* jump to handler, enable MMU */
> +#endif /* !CONFIG_BOOKE */
>  #endif /* CONFIG_TRACE_IRQFLAGS */
>  
>  #if defined (CONFIG_6xx) || defined(CONFIG_E500)

^ permalink raw reply

* [PATCH] powerpc/booke: Eliminate rfi from exception entry path.
From: Scott Wood @ 2012-07-11  0:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Alexander Graf, Stuart Yoder

Unlike classic, we don't really need the MSR change to be atomic with the
branch.  This eliminates a trap as a KVM guest (in the absence of
hardware hypervisor extensions), where mtmsr is paravirtualized but rfi
is not.  For a virtualized guest without any paravirtualization, this
eliminates an additional two traps (SRR0/1).

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 arch/powerpc/kernel/entry_32.S |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index ba3aeb4..6bb637c 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -193,6 +193,9 @@ transfer_to_handler_cont:
 	lwz	r11,0(r9)		/* virtual address of handler */
 	lwz	r9,4(r9)		/* where to go when done */
 #ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_BOOKE
+	mtmsr	r10
+#else
 	lis	r12,reenable_mmu@h
 	ori	r12,r12,reenable_mmu@l
 	mtspr	SPRN_SRR0,r12
@@ -201,6 +204,7 @@ transfer_to_handler_cont:
 	RFI
 reenable_mmu:				/* re-enable mmu so we can */
 	mfmsr	r10
+#endif /* !CONFIG_BOOKE */
 	lwz	r12,_MSR(r1)
 	xor	r10,r10,r12
 	andi.	r10,r10,MSR_EE		/* Did EE change? */
@@ -247,11 +251,23 @@ reenable_mmu:				/* re-enable mmu so we can */
 	mtlr	r9
 	bctr				/* jump to handler */
 #else /* CONFIG_TRACE_IRQFLAGS */
+#ifdef CONFIG_BOOKE
+	/*
+	 * We're not changing address space on Book E, and the extra rfi
+	 * can hurt when virtualized without hardware support -- whereas
+	 * mtmsr can be paravirtualized.
+	 */
+	mtmsr	r10
+	mtctr	r11
+	mtlr	r9
+	bctr
+#else
 	mtspr	SPRN_SRR0,r11
 	mtspr	SPRN_SRR1,r10
 	mtlr	r9
 	SYNC
 	RFI				/* jump to handler, enable MMU */
+#endif /* !CONFIG_BOOKE */
 #endif /* CONFIG_TRACE_IRQFLAGS */
 
 #if defined (CONFIG_6xx) || defined(CONFIG_E500)
-- 
1.7.5.4
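
Editor's note: the context lines of the first two hunks compare the current
MSR with the saved one to see whether EE changed (xor, then andi. with MSR_EE).
The same test is sketched in C below as an illustration only, not kernel code.
The function name msr_ee_changed() is hypothetical; MSR_EE is assumed to be
bit 15 (0x8000), as in the kernel's MSR definitions.

```c
#include <assert.h>
#include <stdint.h>

/* External-interrupt enable bit; assumed to be 1 << 15 as in the kernel. */
#define MSR_EE 0x8000u

/* The mask must select exactly one bit for the xor/and test to be meaningful. */
static_assert((MSR_EE & (MSR_EE - 1)) == 0, "MSR_EE must be a single bit");

/*
 * Returns 1 if the EE bit differs between the two MSR images, else 0.
 * Mirrors "xor r10,r10,r12; andi. r10,r10,MSR_EE" from the quoted context.
 */
static int msr_ee_changed(uint32_t cur_msr, uint32_t saved_msr)
{
	return ((cur_msr ^ saved_msr) & MSR_EE) != 0;
}
```

Bits other than EE are masked out, so differences in the remaining MSR bits
do not affect the result.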

^ permalink raw reply related

* [PATCH v2 3/3] powerpc/mpc85xx_ds: convert to unified PCI init
From: Scott Wood @ 2012-07-11  0:26 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Alexander Graf
In-Reply-To: <1341966409-22660-1-git-send-email-scottwood@freescale.com>

Similar to how the primary PCI bridge is identified by looking
for an isa subnode, we determine whether to apply uli exclusions
by looking for a uli subnode.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
v2: Rebased on Kumar's next

 arch/powerpc/platforms/85xx/mpc85xx_ds.c |   97 +++++++++---------------------
 1 files changed, 29 insertions(+), 68 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
index d30f6c4..6d3265f 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
@@ -114,71 +114,53 @@ void __init mpc85xx_ds_pic_init(void)
 }
 
 #ifdef CONFIG_PCI
-static int primary_phb_addr;
 extern int uli_exclude_device(struct pci_controller *hose,
 				u_char bus, u_char devfn);
 
+static struct device_node *pci_with_uli;
+
 static int mpc85xx_exclude_device(struct pci_controller *hose,
 				   u_char bus, u_char devfn)
 {
-	struct device_node* node;
-	struct resource rsrc;
-
-	node = hose->dn;
-	of_address_to_resource(node, 0, &rsrc);
-
-	if ((rsrc.start & 0xfffff) == primary_phb_addr) {
+	if (hose->dn == pci_with_uli)
 		return uli_exclude_device(hose, bus, devfn);
-	}
 
 	return PCIBIOS_SUCCESSFUL;
 }
 #endif	/* CONFIG_PCI */
 
-/*
- * Setup the architecture
- */
-static void __init mpc85xx_ds_setup_arch(void)
+static void __init mpc85xx_ds_pci_init(void)
 {
 #ifdef CONFIG_PCI
-	struct device_node *np;
-	struct pci_controller *hose;
-#endif
-	dma_addr_t max = 0xffffffff;
+	struct device_node *node;
 
-	if (ppc_md.progress)
-		ppc_md.progress("mpc85xx_ds_setup_arch()", 0);
+	fsl_pci_init();
 
-#ifdef CONFIG_PCI
-	for_each_node_by_type(np, "pci") {
-		if (of_device_is_compatible(np, "fsl,mpc8540-pci") ||
-		    of_device_is_compatible(np, "fsl,mpc8548-pcie") ||
-		    of_device_is_compatible(np, "fsl,p2020-pcie")) {
-			struct resource rsrc;
-			of_address_to_resource(np, 0, &rsrc);
-			if ((rsrc.start & 0xfffff) == primary_phb_addr)
-				fsl_add_bridge(np, 1);
-			else
-				fsl_add_bridge(np, 0);
-
-			hose = pci_find_hose_for_OF_device(np);
-			max = min(max, hose->dma_window_base_cur +
-					hose->dma_window_size);
+	/* See if we have a ULI under the primary */
+
+	node = of_find_node_by_name(NULL, "uli1575");
+	while ((pci_with_uli = of_get_parent(node))) {
+		of_node_put(node);
+		node = pci_with_uli;
+
+		if (pci_with_uli == fsl_pci_primary) {
+			ppc_md.pci_exclude_device = mpc85xx_exclude_device;
+			break;
 		}
 	}
-
-	ppc_md.pci_exclude_device = mpc85xx_exclude_device;
 #endif
+}
 
-	mpc85xx_smp_init();
+/*
+ * Setup the architecture
+ */
+static void __init mpc85xx_ds_setup_arch(void)
+{
+	if (ppc_md.progress)
+		ppc_md.progress("mpc85xx_ds_setup_arch()", 0);
 
-#ifdef CONFIG_SWIOTLB
-	if ((memblock_end_of_DRAM() - 1) > max) {
-		ppc_swiotlb_enable = 1;
-		set_pci_dma_ops(&swiotlb_dma_ops);
-		ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_swiotlb;
-	}
-#endif
+	mpc85xx_ds_pci_init();
+	mpc85xx_smp_init();
 
 	printk("MPC85xx DS board from Freescale Semiconductor\n");
 }
@@ -190,14 +172,7 @@ static int __init mpc8544_ds_probe(void)
 {
 	unsigned long root = of_get_flat_dt_root();
 
-	if (of_flat_dt_is_compatible(root, "MPC8544DS")) {
-#ifdef CONFIG_PCI
-		primary_phb_addr = 0xb000;
-#endif
-		return 1;
-	}
-
-	return 0;
+	return !!of_flat_dt_is_compatible(root, "MPC8544DS");
 }
 
 machine_device_initcall(mpc8544_ds, mpc85xx_common_publish_devices);
@@ -215,14 +190,7 @@ static int __init mpc8572_ds_probe(void)
 {
 	unsigned long root = of_get_flat_dt_root();
 
-	if (of_flat_dt_is_compatible(root, "fsl,MPC8572DS")) {
-#ifdef CONFIG_PCI
-		primary_phb_addr = 0x8000;
-#endif
-		return 1;
-	}
-
-	return 0;
+	return !!of_flat_dt_is_compatible(root, "fsl,MPC8572DS");
 }
 
 /*
@@ -232,14 +200,7 @@ static int __init p2020_ds_probe(void)
 {
 	unsigned long root = of_get_flat_dt_root();
 
-	if (of_flat_dt_is_compatible(root, "fsl,P2020DS")) {
-#ifdef CONFIG_PCI
-		primary_phb_addr = 0x9000;
-#endif
-		return 1;
-	}
-
-	return 0;
+	return !!of_flat_dt_is_compatible(root, "fsl,P2020DS");
 }
 
 define_machine(mpc8544_ds) {
-- 
1.7.5.4
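
Editor's note: the ULI lookup above walks from the uli1575 node towards the
root until it reaches the primary bridge. The walk is sketched below as a
stand-alone C toy. struct toy_node and find_ancestor() are hypothetical
stand-ins, not the kernel's struct device_node or OF helpers, and the
reference counting done by of_node_put() in the patch is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a device-tree node; not the kernel type. */
struct toy_node {
	const char *name;
	struct toy_node *parent;
};

/*
 * Walk from the parent of 'start' towards the root and return the first
 * ancestor that is 'wanted', or NULL if 'wanted' is not an ancestor.
 * A NULL 'start' (node not found) yields NULL as well.
 */
static struct toy_node *find_ancestor(struct toy_node *start,
				      const struct toy_node *wanted)
{
	struct toy_node *n;

	assert(wanted != NULL);

	for (n = start ? start->parent : NULL; n; n = n->parent)
		if (n == wanted)
			return n;

	return NULL;
}
```

In the patch the exclusion hook is installed only when the walk reaches
fsl_pci_primary; in the toy that corresponds to a non-NULL return value.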

^ permalink raw reply related

* [PATCH v2 1/3] powerpc/fsl-pci: get PCI init out of board files
From: Scott Wood @ 2012-07-11  0:26 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Alexander Graf
In-Reply-To: <1341966409-22660-1-git-send-email-scottwood@freescale.com>

As an alternative incremental starting point to Jia Hongtao's patchset,
get the FSL PCI init out of the board files, but do not yet convert to a
platform driver.

Rather than having each board supply a magic register offset for
determining the "primary" bus, we look for which PCI host bridge
contains an ISA node within its subtree.  If there is no ISA node,
normally that would mean there is no primary bus, but until certain
bugs are fixed we arbitrarily designate a primary in this case.

Conversion to a platform driver and related improvements can happen
after this, as the ordering issues are sorted out.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
v2: rebased on Kumar's next

 arch/powerpc/sysdev/fsl_pci.c |   71 ++++++++++++++++++++++++++++++++++++++++-
 arch/powerpc/sysdev/fsl_pci.h |    8 +++++
 2 files changed, 78 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index edbf794..a7b2a60 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -1,7 +1,7 @@
 /*
  * MPC83xx/85xx/86xx PCI/PCIE support routing.
  *
- * Copyright 2007-2011 Freescale Semiconductor, Inc.
+ * Copyright 2007-2012 Freescale Semiconductor, Inc.
  * Copyright 2008-2009 MontaVista Software, Inc.
  *
  * Initial author: Xianghua Xiao <x.xiao@freescale.com>
@@ -807,3 +807,72 @@ u64 fsl_pci_immrbar_base(struct pci_controller *hose)
 
 	return 0;
 }
+
+#if defined(CONFIG_FSL_SOC_BOOKE) || defined(CONFIG_PPC_86xx)
+static const struct of_device_id pci_ids[] = {
+	{ .compatible = "fsl,mpc8540-pci", },
+	{ .compatible = "fsl,mpc8548-pcie", },
+	{ .compatible = "fsl,mpc8610-pci", },
+	{ .compatible = "fsl,mpc8641-pcie", },
+	{ .compatible = "fsl,p1022-pcie", },
+	{ .compatible = "fsl,p1010-pcie", },
+	{ .compatible = "fsl,p1023-pcie", },
+	{ .compatible = "fsl,p4080-pcie", },
+	{ .compatible = "fsl,qoriq-pcie-v2.3", },
+	{ .compatible = "fsl,qoriq-pcie-v2.2", },
+	{},
+};
+
+struct device_node *fsl_pci_primary;
+
+void __devinit fsl_pci_init(void)
+{
+	struct device_node *node;
+	struct pci_controller *hose;
+	dma_addr_t max = 0xffffffff;
+
+	/* Callers can specify the primary bus using other means. */
+	if (!fsl_pci_primary) {
+		/* If a PCI host bridge contains an ISA node, it's primary. */
+		node = of_find_node_by_type(NULL, "isa");
+		while ((fsl_pci_primary = of_get_parent(node))) {
+			of_node_put(node);
+			node = fsl_pci_primary;
+
+			if (of_match_node(pci_ids, node))
+				break;
+		}
+	}
+
+	node = NULL;
+	for_each_node_by_type(node, "pci") {
+		if (of_match_node(pci_ids, node)) {
+			/*
+			 * If there's no PCI host bridge with ISA, arbitrarily
+			 * designate one as primary.  This can go away once
+			 * various bugs with primary-less systems are fixed.
+			 */
+			if (!fsl_pci_primary)
+				fsl_pci_primary = node;
+
+			fsl_add_bridge(node, fsl_pci_primary == node);
+			hose = pci_find_hose_for_OF_device(node);
+			max = min(max, hose->dma_window_base_cur +
+					hose->dma_window_size);
+		}
+	}
+
+#ifdef CONFIG_SWIOTLB
+	/*
+	 * if we couldn't map all of DRAM via the dma windows
+	 * we need SWIOTLB to handle buffers located outside of
+	 * dma capable memory region
+	 */
+	if (memblock_end_of_DRAM() - 1 > max) {
+		ppc_swiotlb_enable = 1;
+		set_pci_dma_ops(&swiotlb_dma_ops);
+		ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_swiotlb;
+	}
+#endif
+}
+#endif
diff --git a/arch/powerpc/sysdev/fsl_pci.h b/arch/powerpc/sysdev/fsl_pci.h
index a39ed5c..baa0fd1 100644
--- a/arch/powerpc/sysdev/fsl_pci.h
+++ b/arch/powerpc/sysdev/fsl_pci.h
@@ -93,5 +93,13 @@ extern void fsl_pcibios_fixup_bus(struct pci_bus *bus);
 extern int mpc83xx_add_bridge(struct device_node *dev);
 u64 fsl_pci_immrbar_base(struct pci_controller *hose);
 
+extern struct device_node *fsl_pci_primary;
+
+#ifdef CONFIG_FSL_PCI
+void fsl_pci_init(void);
+#else
+static inline void fsl_pci_init(void) {}
+#endif
+
 #endif /* __POWERPC_FSL_PCI_H */
 #endif /* __KERNEL__ */
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH v2 2/3] powerpc/e500: add paravirt QEMU platform
From: Scott Wood @ 2012-07-11  0:26 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Alexander Graf
In-Reply-To: <1341966409-22660-1-git-send-email-scottwood@freescale.com>

This gives the kernel a paravirtualized machine to target, without
requiring both sides to pretend to be targeting a specific board
that likely has little to do with the host in KVM scenarios.  This
avoids the need to add new boards to QEMU just to be able to
run KVM on new CPUs.

As this is the first platform that can run with either e500v2 or
e500mc, CONFIG_PPC_E500MC is now a legitimately user configurable
option, so add a help text.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
v2: Added a comment about the flexible nature of this platform, and
rebased on Kumar's next

 arch/powerpc/platforms/85xx/Kconfig     |   16 +++++++
 arch/powerpc/platforms/85xx/Makefile    |    1 +
 arch/powerpc/platforms/85xx/qemu_e500.c |   72 +++++++++++++++++++++++++++++++
 arch/powerpc/platforms/Kconfig.cputype  |    4 ++
 4 files changed, 93 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/platforms/85xx/qemu_e500.c

diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index dddb3e5..159c01e 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -254,6 +254,22 @@ config P5020_DS
 	help
 	  This option enables support for the P5020 DS board
 
+config PPC_QEMU_E500
+	bool "QEMU generic e500 platform"
+	depends on EXPERIMENTAL
+	select DEFAULT_UIMAGE
+	help
+	  This option enables support for running as a QEMU guest using
+	  QEMU's generic e500 machine.  This is not required if you're
+	  using a QEMU machine that targets a specific board, such as
+	  mpc8544ds.
+
+	  Unlike most e500 boards that target a specific CPU, this
+	  platform works with any e500-family CPU that QEMU supports.
+	  Thus, you'll need to make sure CONFIG_PPC_E500MC is set or
+	  unset based on the emulated CPU (or actual host CPU in the case
+	  of KVM).
+
 endif # FSL_SOC_BOOKE
 
 config TQM85xx
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 30652e0..3dfe811 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_SOCRATES)    += socrates.o socrates_fpga_pic.o
 obj-$(CONFIG_KSI8560)	  += ksi8560.o
 obj-$(CONFIG_XES_MPC85xx) += xes_mpc85xx.o
 obj-$(CONFIG_GE_IMP3A)	  += ge_imp3a.o
+obj-$(CONFIG_PPC_QEMU_E500) += qemu_e500.o
diff --git a/arch/powerpc/platforms/85xx/qemu_e500.c b/arch/powerpc/platforms/85xx/qemu_e500.c
new file mode 100644
index 0000000..95a2e53
--- /dev/null
+++ b/arch/powerpc/platforms/85xx/qemu_e500.c
@@ -0,0 +1,72 @@
+/*
+ * Paravirt target for a generic QEMU e500 machine
+ *
+ * This is intended to be a flexible device-tree-driven platform, not fixed
+ * to a particular piece of hardware or a particular spec of virtual hardware,
+ * beyond the assumption of an e500-family CPU.  Some things are still hardcoded
+ * here, such as MPIC, but this is a limitation of the current code rather than
+ * an interface contract with QEMU.
+ *
+ * Copyright 2012 Freescale Semiconductor Inc.
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/of_fdt.h>
+#include <asm/machdep.h>
+#include <asm/time.h>
+#include <asm/udbg.h>
+#include <asm/mpic.h>
+#include <sysdev/fsl_soc.h>
+#include <sysdev/fsl_pci.h>
+#include "smp.h"
+#include "mpc85xx.h"
+
+void __init qemu_e500_pic_init(void)
+{
+	struct mpic *mpic;
+
+	mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN | MPIC_SINGLE_DEST_CPU,
+			0, 256, " OpenPIC  ");
+
+	BUG_ON(mpic == NULL);
+	mpic_init(mpic);
+}
+
+static void __init qemu_e500_setup_arch(void)
+{
+	ppc_md.progress("qemu_e500_setup_arch()", 0);
+
+	fsl_pci_init();
+	mpc85xx_smp_init();
+}
+
+/*
+ * Called very early, device-tree isn't unflattened
+ */
+static int __init qemu_e500_probe(void)
+{
+	unsigned long root = of_get_flat_dt_root();
+
+	return !!of_flat_dt_is_compatible(root, "fsl,qemu-e500");
+}
+
+machine_device_initcall(qemu_e500, mpc85xx_common_publish_devices);
+
+define_machine(qemu_e500) {
+	.name			= "QEMU e500",
+	.probe			= qemu_e500_probe,
+	.setup_arch		= qemu_e500_setup_arch,
+	.init_IRQ		= qemu_e500_pic_init,
+#ifdef CONFIG_PCI
+	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+#endif
+	.get_irq		= mpic_get_irq,
+	.restart		= fsl_rstcr_restart,
+	.calibrate_decr		= generic_calibrate_decr,
+	.progress		= udbg_progress,
+};
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 61c9550..30fd01d 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -159,6 +159,10 @@ config PPC_E500MC
 	bool "e500mc Support"
 	select PPC_FPU
 	depends on E500
+	help
+	  This must be enabled for running on e500mc (and derivatives
+	  such as e5500/e6500), and must be disabled for running on
+	  e500v1 or e500v2.
 
 config PPC_FPU
 	bool
-- 
1.7.5.4
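
Editor's note: qemu_e500_probe() returns !!of_flat_dt_is_compatible(...), which
collapses any nonzero match result to exactly 1. The idiom is sketched below in
stand-alone C. toy_match_score() and toy_probe() are hypothetical; the sketch
only assumes that a match is reported as some nonzero value, not necessarily 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy matcher: 1-based index of 'wanted' in 'list', or 0 if absent. */
static int toy_match_score(const char *const *list, size_t n,
			   const char *wanted)
{
	size_t i;

	assert(wanted != NULL);

	for (i = 0; i < n; i++)
		if (strcmp(list[i], wanted) == 0)
			return (int)(i + 1);

	return 0;
}

/* Probe-style wrapper: any nonzero score becomes 1, zero stays 0. */
static int toy_probe(const char *const *list, size_t n, const char *wanted)
{
	return !!toy_match_score(list, n, wanted);
}
```

The double negation matters only when callers compare the result against 1 or
treat it as a boolean flag; a plain truth test would accept the raw score too.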

^ permalink raw reply related

* [PATCH v2 0/3] powerpc/fsl: PCI refactoring and QEMU paravirt platform
From: Scott Wood @ 2012-07-11  0:26 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Alexander Graf

The QEMU stuff is related to the PCI refactoring because currently
we have a hard time selecting a primary bus under QEMU, and also because
the generic qemu e500 platform wants a full list of FSL PCI compatibles
to check.

Patchset rebased on Kumar's next branch.

Scott Wood (3):
  powerpc/fsl-pci: get PCI init out of board files
  powerpc/e500: add paravirt QEMU platform
  powerpc/mpc85xx_ds: convert to unified PCI init

 arch/powerpc/platforms/85xx/Kconfig      |   16 +++++
 arch/powerpc/platforms/85xx/Makefile     |    1 +
 arch/powerpc/platforms/85xx/mpc85xx_ds.c |   97 +++++++++---------------------
 arch/powerpc/platforms/85xx/qemu_e500.c  |   72 ++++++++++++++++++++++
 arch/powerpc/platforms/Kconfig.cputype   |    4 +
 arch/powerpc/sysdev/fsl_pci.c            |   71 +++++++++++++++++++++-
 arch/powerpc/sysdev/fsl_pci.h            |    8 +++
 7 files changed, 200 insertions(+), 69 deletions(-)
 create mode 100644 arch/powerpc/platforms/85xx/qemu_e500.c

-- 
1.7.5.4

* Re: [RFC PATCH v3 0/13] memory-hotplug : hot-remove physical memory
From: Jiang Liu @ 2012-07-11  0:21 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: len.brown, wency, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, Christoph Lameter,
	linuxppc-dev, akpm
In-Reply-To: <4FFCC438.4080004@jp.fujitsu.com>

On 07/11/2012 08:09 AM, Yasuaki Ishimatsu wrote:
> Hi Jiang,
> 
> 2012/07/11 1:50, Jiang Liu wrote:
>> On 07/10/2012 05:58 PM, Yasuaki Ishimatsu wrote:
>>> Hi Christoph,
>>>
>>> 2012/07/10 0:18, Christoph Lameter wrote:
>>>>
>>>> On Mon, 9 Jul 2012, Yasuaki Ishimatsu wrote:
>>>>
>>>>> Even if you apply these patches, you cannot remove the physical memory
>>>>> completely since these patches are still under development. I want you to
>>>>> cooperate to improve the physical memory hot-remove. So please review these
>>>>> patches and give your comment/idea.
>>>>
>>>> Could you at least give a method on how you want to do physical memory
>>>> removal?
>>>
>>> We plan to release a dynamically partitionable hardware system. It will be
>>> able to hot-remove/add a system board which includes memory and CPUs.
>>> But as you know, Linux does not support memory hot-remove on x86 boxes,
>>> so I am trying to develop it.
>>>
>>> The current plan for hot-removing a system board is to use the container driver.
>>> Thus I define the system board in the ACPI DSDT table as a container device.
>>> Hot-adding a container device is already supported. And if the container device
>>> has an _EJ0 ACPI method, an "eject" file for removing the container device is
>>> provided as follows:
>>>
>>> # ls -l /sys/bus/acpi/devices/ACPI0004\:01/eject
>>> --w-------. 1 root root 4096 Jul 10 18:19 /sys/bus/acpi/devices/ACPI0004:01/eject
>>>
>>> When I hot-remove the container device, I echo 1 to the file as follows:
>>>
>>> # echo 1 > /sys/bus/acpi/devices/ACPI0004\:02/eject
>>>
>>> Then acpi_bus_trim() is called, and it calls acpi_memory_device_remove()
>>> to remove the memory device. But that code does not do anything yet,
>>> so I implemented the rest of the function.
>>>
>>>> You would have to remove all objects from the range you want to
>>>> physically remove. That is only possible under special circumstances and
>>>> with a limited set of objects. Even if you exclusively use ZONE_MOVABLE
>>>> you still may get cases where pages are pinned for a long time.
>>>
>>> I know. So my memory hot-remove plan is as follows:
>>>
>>> 1. Hot-add a system board
>>>     All memory on the system board is initially offline.
>>>
>>> 2. Online the memory as removable pages
>>>     This is not supported yet. It is being developed by Lai; see:
>>>     http://lkml.indiana.edu/hypermail/linux/kernel/1207.0/01478.html
>>>     Once it is supported, I will be able to create movable memory.
>>>
>>> 3. Hot-remove the memory via the container device's eject file
>> We have implemented a prototype to do physical node (mem + CPU + IOH) hotplug
>> for Itanium and are now porting it to x86. But with the current solution, memory
>> hotplug functionality may cause a 10-20% performance decrease because we concentrate
>> all DMA/Normal memory on the first NUMA node, and all other NUMA nodes only
>> host ZONE_MOVABLE. We are now working on a solution to minimize the performance
>> drop.
> 
> Thank you for your interesting response.
> 
> I have a question. How do you make all other NUMA nodes ZONE_MOVABLE only?
> To use ZONE_MOVABLE, we need to use boot options like kernelcore or movablecore.
> But that is not enough, since the requested amount is spread evenly across
> all nodes in the system. So I think we have no way to make all other NUMA
> nodes ZONE_MOVABLE only.
We have modified the ZONE_MOVABLE spreading and the bootmem allocation. If the kernelcore
or movablecore kernel parameter is present, we follow the current behavior. If those
parameters are absent and the platform supports physical hotplug, we concentrate
DMA/NORMAL memory on specific nodes.
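
A rough, standalone sketch of the decision described above. It is not taken
from the prototype: `struct boot_params`, `enum zone_layout`, and
`choose_zone_layout()` are hypothetical names made up for illustration, and
the fallback case (no hotplug support, no parameters) is an assumption.

```c
#include <stdbool.h>

/* Hypothetical model of the boot-time inputs; these are not kernel structures. */
struct boot_params {
	bool kernelcore_set;        /* "kernelcore=" was given on the command line */
	bool movablecore_set;       /* "movablecore=" was given on the command line */
	bool platform_has_hotplug;  /* platform supports physical node hotplug */
};

enum zone_layout {
	LAYOUT_SPREAD_EVENLY,  /* current behavior: spread across all nodes */
	LAYOUT_CONCENTRATE,    /* DMA/NORMAL on specific nodes, others movable */
	LAYOUT_DEFAULT,        /* assumed fallback: no ZONE_MOVABLE is set up */
};

/* Pick the zone layout following the rules stated in the mail above. */
static enum zone_layout choose_zone_layout(const struct boot_params *bp)
{
	/* An explicit kernelcore/movablecore request keeps today's behavior. */
	if (bp->kernelcore_set || bp->movablecore_set)
		return LAYOUT_SPREAD_EVENLY;

	/* No explicit request, but hotplug-capable: concentrate kernel memory. */
	if (bp->platform_has_hotplug)
		return LAYOUT_CONCENTRATE;

	return LAYOUT_DEFAULT;
}
```

The point of checking the parameters first is that an administrator's explicit
request always wins over the hotplug-aware default.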

> 
> Thanks,
> Yasuaki Ishimatsu
> 
>>
>>>
>>> Thanks,
>>> Yasuaki Ishimatsu
>>>
>>>>
>>>> I am not sure that these patches are useful unless we know where you are
>>>> going with this. If we end up with a situation where we still cannot
>>>> remove physical memory then this patchset is not helpful.
>>>
>>>
>>>
>>
>>
> 
> 
> 

* Re: [PATCH 1/4] Talitos: move the data structure into header file
From: Kim Phillips @ 2012-07-11  0:11 UTC (permalink / raw)
  To: Qiang Liu
  Cc: Herbert Xu, linux-crypto, linuxppc-dev, David S. Miller,
	horia.geanta
In-Reply-To: <1341899809-20630-2-git-send-email-qiang.liu@freescale.com>

On Tue, 10 Jul 2012 13:56:46 +0800
Qiang Liu <qiang.liu@freescale.com> wrote:

> Move the declaration of talitos data structure into talitos.h.
> 
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Qiang Liu <qiang.liu@freescale.com>
> ---

This patch has already been submitted [1].

Subsequent patches in this series also don't apply cleanly: can
you rebase onto [2], which is based on Herbert's cryptodev tree and
contains Horia's four patches, and re-send?

Also note that upstream talitos does not yet contain NAPI support
[3].

Thanks,

Kim

[1] http://www.mail-archive.com/linux-crypto@vger.kernel.org/msg07299.html
[2] git://git.freescale.com/crypto/cryptodev.git
[3] http://www.mail-archive.com/linux-crypto@vger.kernel.org/msg07289.html

* Re: [RFC PATCH v3 0/13] memory-hotplug : hot-remove physical memory
From: Yasuaki Ishimatsu @ 2012-07-11  0:09 UTC (permalink / raw)
  To: Jiang Liu
  Cc: len.brown, wency, linux-acpi, linux-kernel, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, Christoph Lameter,
	linuxppc-dev, akpm
In-Reply-To: <4FFC5D43.7040206@gmail.com>

Hi Jiang,

2012/07/11 1:50, Jiang Liu wrote:
> On 07/10/2012 05:58 PM, Yasuaki Ishimatsu wrote:
>> Hi Christoph,
>>
>> 2012/07/10 0:18, Christoph Lameter wrote:
>>>
>>> On Mon, 9 Jul 2012, Yasuaki Ishimatsu wrote:
>>>
>>>> Even if you apply these patches, you cannot remove the physical memory
>>>> completely since these patches are still under development. I want you to
>>>> cooperate to improve the physical memory hot-remove. So please review these
>>>> patches and give your comment/idea.
>>>
>>> Could you at least give a method on how you want to do physical memory
>>> removal?
>>
>> We plan to release a dynamically partitionable hardware system. It will be
>> able to hot-remove/add a system board which includes memory and CPUs.
>> But as you know, Linux does not support memory hot-remove on x86 boxes,
>> so I am trying to develop it.
>>
>> The current plan for hot-removing a system board is to use the container driver.
>> Thus I define the system board in the ACPI DSDT table as a container device.
>> Hot-adding a container device is already supported. And if the container device
>> has an _EJ0 ACPI method, an "eject" file for removing the container device is
>> provided as follows:
>>
>> # ls -l /sys/bus/acpi/devices/ACPI0004\:01/eject
>> --w-------. 1 root root 4096 Jul 10 18:19 /sys/bus/acpi/devices/ACPI0004:01/eject
>>
>> When I hot-remove the container device, I echo 1 to the file as follows:
>>
>> # echo 1 > /sys/bus/acpi/devices/ACPI0004\:02/eject
>>
>> Then acpi_bus_trim() is called, and it calls acpi_memory_device_remove()
>> to remove the memory device. But that code does not do anything yet,
>> so I implemented the rest of the function.
>>
>>> You would have to remove all objects from the range you want to
>>> physically remove. That is only possible under special circumstances and
>>> with a limited set of objects. Even if you exclusively use ZONE_MOVABLE
>>> you still may get cases where pages are pinned for a long time.
>>
>> I know. So my memory hot-remove plan is as follows:
>>
>> 1. Hot-add a system board
>>     All memory on the system board is initially offline.
>>
>> 2. Online the memory as removable pages
>>     This is not supported yet. It is being developed by Lai; see:
>>     http://lkml.indiana.edu/hypermail/linux/kernel/1207.0/01478.html
>>     Once it is supported, I will be able to create movable memory.
>>
>> 3. Hot-remove the memory via the container device's eject file
> We have implemented a prototype to do physical node (mem + CPU + IOH) hotplug
> for Itanium and are now porting it to x86. But with the current solution, memory
> hotplug functionality may cause a 10-20% performance decrease because we concentrate
> all DMA/Normal memory on the first NUMA node, and all other NUMA nodes only
> host ZONE_MOVABLE. We are now working on a solution to minimize the performance
> drop.

Thank you for your interesting response.

I have a question. How do you make all other NUMA nodes ZONE_MOVABLE only?
To use ZONE_MOVABLE, we need to use boot options like kernelcore or movablecore.
But that is not enough, since the requested amount is spread evenly across
all nodes in the system. So I think we have no way to make all other NUMA
nodes ZONE_MOVABLE only.
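
To illustrate the even spreading, here is a simplified, standalone model.
It is only a sketch under stated assumptions: `spread_kernelcore()` is a
hypothetical helper, sizes are counted in pages, and the real kernel logic
is more involved (for example, it redistributes the remainder when a node
is smaller than its share).

```c
#include <stddef.h>

/*
 * Simplified model of "kernelcore=" spreading: every node receives the same
 * share of the requested kernelcore amount, and whatever is left on a node
 * becomes movable memory. Not kernel code; for illustration only.
 */
static void spread_kernelcore(unsigned long kernelcore_pages,
			      const unsigned long *node_pages,
			      unsigned long *movable_pages,
			      size_t nr_nodes)
{
	unsigned long per_node;
	size_t i;

	if (nr_nodes == 0)
		return;

	/* Equal share of kernel (non-movable) memory for every node. */
	per_node = kernelcore_pages / nr_nodes;

	for (i = 0; i < nr_nodes; i++) {
		/* A node cannot hold more kernel memory than it has. */
		unsigned long kernel = node_pages[i] < per_node ?
				       node_pages[i] : per_node;

		movable_pages[i] = node_pages[i] - kernel;
	}
}
```

With four nodes of 1000 pages each and a kernelcore request of 2000 pages,
every node keeps 500 pages of kernel memory in this model, so no node ends
up consisting of ZONE_MOVABLE only.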

Thanks,
Yasuaki Ishimatsu

>
>>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>>>
>>> I am not sure that these patches are useful unless we know where you are
>>> going with this. If we end up with a situation where we still cannot
>>> remove physical memory then this patchset is not helpful.
>>
>>
>>
>
>
