* [PATCH v5 11/14] memory-hotplug: Integrated __remove_section() of CONFIG_SPARSEMEM_VMEMMAP.
From: Tang Chen @ 2012-12-24 12:09 UTC (permalink / raw)
To: akpm, rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
In-Reply-To: <1356350964-13437-1-git-send-email-tangchen@cn.fujitsu.com>
Currently __remove_section for SPARSEMEM_VMEMMAP does nothing. But even if
we use SPARSEMEM_VMEMMAP, we can unregister the memory_section.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
mm/memory_hotplug.c | 11 -----------
1 files changed, 0 insertions(+), 11 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c12bd55..71cb656 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -430,16 +430,6 @@ static int __meminit __add_section(int nid, struct zone *zone,
return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
}
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static int __remove_section(struct zone *zone, struct mem_section *ms)
-{
- /*
- * XXX: Freeing memmap with vmemmap is not implement yet.
- * This should be removed later.
- */
- return -EBUSY;
-}
-#else
static int __remove_section(struct zone *zone, struct mem_section *ms)
{
int ret = -EINVAL;
@@ -454,7 +444,6 @@ static int __remove_section(struct zone *zone, struct mem_section *ms)
sparse_remove_one_section(zone, ms);
return 0;
}
-#endif
/*
* Reasonably generic function for adding memory. It is
--
1.7.1
^ permalink raw reply related
* [PATCH v5 14/14] memory-hotplug: free node_data when a node is offlined
From: Tang Chen @ 2012-12-24 12:09 UTC (permalink / raw)
To: akpm, rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
In-Reply-To: <1356350964-13437-1-git-send-email-tangchen@cn.fujitsu.com>
From: Wen Congyang <wency@cn.fujitsu.com>
We call hotadd_new_pgdat() to allocate memory to store node_data. So we
should free it when removing a node.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
mm/memory_hotplug.c | 20 +++++++++++++++++++-
1 files changed, 19 insertions(+), 1 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f8a1d2f..447fa24 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1680,9 +1680,12 @@ static int check_cpu_on_node(void *data)
/* offline the node if all memory sections of this node are removed */
static void try_offline_node(int nid)
{
+ pg_data_t *pgdat = NODE_DATA(nid);
unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
- unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
+ unsigned long end_pfn = start_pfn + pgdat->node_spanned_pages;
unsigned long pfn;
+ struct page *pgdat_page = virt_to_page(pgdat);
+ int i;
for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
unsigned long section_nr = pfn_to_section_nr(pfn);
@@ -1709,6 +1712,21 @@ static void try_offline_node(int nid)
*/
node_set_offline(nid);
unregister_one_node(nid);
+
+ if (!PageSlab(pgdat_page) && !PageCompound(pgdat_page))
+ /* node data is allocated from boot memory */
+ return;
+
+ /* free waittable in each zone */
+ for (i = 0; i < MAX_NR_ZONES; i++) {
+ struct zone *zone = pgdat->node_zones + i;
+
+ if (zone->wait_table)
+ vfree(zone->wait_table);
+ }
+
+ arch_refresh_nodedata(nid, NULL);
+ arch_free_nodedata(pgdat);
}
int __ref remove_memory(int nid, u64 start, u64 size)
--
1.7.1
^ permalink raw reply related
* [PATCH v5 08/14] memory-hotplug: Common APIs to support page tables hot-remove
From: Tang Chen @ 2012-12-24 12:09 UTC (permalink / raw)
To: akpm, rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
In-Reply-To: <1356350964-13437-1-git-send-email-tangchen@cn.fujitsu.com>
From: Wen Congyang <wency@cn.fujitsu.com>
When memory is removed, the corresponding pagetables should alse be removed.
This patch introduces some common APIs to support vmemmap pagetable and x86_64
architecture pagetable removing.
All pages of virtual mapping in removed memory cannot be freedi if some pages
used as PGD/PUD includes not only removed memory but also other memory. So the
patch uses the following way to check whether page can be freed or not.
1. When removing memory, the page structs of the revmoved memory are filled
with 0FD.
2. All page structs are filled with 0xFD on PT/PMD, PT/PMD can be cleared.
In this case, the page used as PT/PMD can be freed.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/x86/include/asm/pgtable_types.h | 1 +
arch/x86/mm/init_64.c | 297 ++++++++++++++++++++++++++++++++++
arch/x86/mm/pageattr.c | 47 +++---
include/linux/bootmem.h | 1 +
4 files changed, 324 insertions(+), 22 deletions(-)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 3c32db8..4b6fd2a 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -352,6 +352,7 @@ static inline void update_page_count(int level, unsigned long pages) { }
* as a pte too.
*/
extern pte_t *lookup_address(unsigned long address, unsigned int *level);
+extern int __split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase);
#endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index aeaa27e..b30df3c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -682,6 +682,303 @@ int arch_add_memory(int nid, u64 start, u64 size)
}
EXPORT_SYMBOL_GPL(arch_add_memory);
+#define PAGE_INUSE 0xFD
+
+static void __meminit free_pagetable(struct page *page, int order)
+{
+ struct zone *zone;
+ bool bootmem = false;
+ unsigned long magic;
+
+ /* bootmem page has reserved flag */
+ if (PageReserved(page)) {
+ __ClearPageReserved(page);
+ bootmem = true;
+
+ magic = (unsigned long)page->lru.next;
+ if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
+ put_page_bootmem(page);
+ else
+ __free_pages_bootmem(page, order);
+ } else
+ free_pages((unsigned long)page_address(page), order);
+
+ /*
+ * SECTION_INFO pages and MIX_SECTION_INFO pages
+ * are all allocated by bootmem.
+ */
+ if (bootmem) {
+ zone = page_zone(page);
+ zone_span_writelock(zone);
+ zone->present_pages++;
+ zone_span_writeunlock(zone);
+ totalram_pages++;
+ }
+}
+
+static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
+{
+ pte_t *pte;
+ int i;
+
+ for (i = 0; i < PTRS_PER_PTE; i++) {
+ pte = pte_start + i;
+ if (pte_val(*pte))
+ return;
+ }
+
+ /* free a pte talbe */
+ free_pagetable(pmd_page(*pmd), 0);
+ spin_lock(&init_mm.page_table_lock);
+ pmd_clear(pmd);
+ spin_unlock(&init_mm.page_table_lock);
+}
+
+static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+{
+ pmd_t *pmd;
+ int i;
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd = pmd_start + i;
+ if (pmd_val(*pmd))
+ return;
+ }
+
+ /* free a pmd talbe */
+ free_pagetable(pud_page(*pud), 0);
+ spin_lock(&init_mm.page_table_lock);
+ pud_clear(pud);
+ spin_unlock(&init_mm.page_table_lock);
+}
+
+/* Return true if pgd is changed, otherwise return false. */
+static bool __meminit free_pud_table(pud_t *pud_start, pgd_t *pgd)
+{
+ pud_t *pud;
+ int i;
+
+ for (i = 0; i < PTRS_PER_PUD; i++) {
+ pud = pud_start + i;
+ if (pud_val(*pud))
+ return false;
+ }
+
+ /* free a pud table */
+ free_pagetable(pgd_page(*pgd), 0);
+ spin_lock(&init_mm.page_table_lock);
+ pgd_clear(pgd);
+ spin_unlock(&init_mm.page_table_lock);
+
+ return true;
+}
+
+static void __meminit
+remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
+ bool direct)
+{
+ unsigned long next, pages = 0;
+ pte_t *pte;
+ void *page_addr;
+ phys_addr_t phys_addr;
+
+ pte = pte_start + pte_index(addr);
+ for (; addr < end; addr = next, pte++) {
+ next = (addr + PAGE_SIZE) & PAGE_MASK;
+ if (next > end)
+ next = end;
+
+ if (!pte_present(*pte))
+ continue;
+
+ /*
+ * We mapped [0,1G) memory as identity mapping when
+ * initializing, in arch/x86/kernel/head_64.S. These
+ * pagetables cannot be removed.
+ */
+ phys_addr = pte_val(*pte) + (addr & PAGE_MASK);
+ if (phys_addr < (phys_addr_t)0x40000000)
+ return;
+
+ if (IS_ALIGNED(addr, PAGE_SIZE) &&
+ IS_ALIGNED(next, PAGE_SIZE)) {
+ if (!direct) {
+ free_pagetable(pte_page(*pte), 0);
+ pages++;
+ }
+
+ spin_lock(&init_mm.page_table_lock);
+ pte_clear(&init_mm, addr, pte);
+ spin_unlock(&init_mm.page_table_lock);
+ } else {
+ /*
+ * If we are not removing the whole page, it means
+ * other ptes in this page are being used and we canot
+ * remove them. So fill the unused ptes with 0xFD, and
+ * remove the page when it is wholly filled with 0xFD.
+ */
+ memset((void *)addr, PAGE_INUSE, next - addr);
+ page_addr = page_address(pte_page(*pte));
+
+ if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
+ free_pagetable(pte_page(*pte), 0);
+ pages++;
+
+ spin_lock(&init_mm.page_table_lock);
+ pte_clear(&init_mm, addr, pte);
+ spin_unlock(&init_mm.page_table_lock);
+ }
+ }
+ }
+
+ /* Call free_pte_table() in remove_pmd_table(). */
+ flush_tlb_all();
+ if (direct)
+ update_page_count(PG_LEVEL_4K, -pages);
+}
+
+static void __meminit
+remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
+ bool direct)
+{
+ unsigned long pte_phys, next, pages = 0;
+ pte_t *pte_base;
+ pmd_t *pmd;
+
+ pmd = pmd_start + pmd_index(addr);
+ for (; addr < end; addr = next, pmd++) {
+ next = pmd_addr_end(addr, end);
+
+ if (!pmd_present(*pmd))
+ continue;
+
+ if (pmd_large(*pmd)) {
+ if (IS_ALIGNED(addr, PMD_SIZE) &&
+ IS_ALIGNED(next, PMD_SIZE)) {
+ if (!direct) {
+ free_pagetable(pmd_page(*pmd),
+ get_order(PMD_SIZE));
+ pages++;
+ }
+
+ spin_lock(&init_mm.page_table_lock);
+ pmd_clear(pmd);
+ spin_unlock(&init_mm.page_table_lock);
+ continue;
+ }
+
+ /*
+ * We use 2M page, but we need to remove part of them,
+ * so split 2M page to 4K page.
+ */
+ pte_base = (pte_t *)alloc_low_page(&pte_phys);
+ BUG_ON(!pte_base);
+ __split_large_page((pte_t *)pmd, addr,
+ (pte_t *)pte_base);
+
+ spin_lock(&init_mm.page_table_lock);
+ pmd_populate_kernel(&init_mm, pmd, __va(pte_phys));
+ spin_unlock(&init_mm.page_table_lock);
+
+ flush_tlb_all();
+ }
+
+ pte_base = (pte_t *)map_low_page((pte_t *)pmd_page_vaddr(*pmd));
+ remove_pte_table(pte_base, addr, next, direct);
+ free_pte_table(pte_base, pmd);
+ unmap_low_page(pte_base);
+ }
+
+ /* Call free_pmd_table() in remove_pud_table(). */
+ if (direct)
+ update_page_count(PG_LEVEL_2M, -pages);
+}
+
+static void __meminit
+remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
+ bool direct)
+{
+ unsigned long pmd_phys, next, pages = 0;
+ pmd_t *pmd_base;
+ pud_t *pud;
+
+ pud = pud_start + pud_index(addr);
+ for (; addr < end; addr = next, pud++) {
+ next = pud_addr_end(addr, end);
+
+ if (!pud_present(*pud))
+ continue;
+
+ if (pud_large(*pud)) {
+ if (IS_ALIGNED(addr, PUD_SIZE) &&
+ IS_ALIGNED(next, PUD_SIZE)) {
+ if (!direct) {
+ free_pagetable(pud_page(*pud),
+ get_order(PUD_SIZE));
+ pages++;
+ }
+
+ spin_lock(&init_mm.page_table_lock);
+ pud_clear(pud);
+ spin_unlock(&init_mm.page_table_lock);
+ continue;
+ }
+
+ /*
+ * We use 1G page, but we need to remove part of them,
+ * so split 1G page to 2M page.
+ */
+ pmd_base = (pmd_t *)alloc_low_page(&pmd_phys);
+ BUG_ON(!pmd_base);
+ __split_large_page((pte_t *)pud, addr,
+ (pte_t *)pmd_base);
+
+ spin_lock(&init_mm.page_table_lock);
+ pud_populate(&init_mm, pud, __va(pmd_phys));
+ spin_unlock(&init_mm.page_table_lock);
+
+ flush_tlb_all();
+ }
+
+ pmd_base = (pmd_t *)map_low_page((pmd_t *)pud_page_vaddr(*pud));
+ remove_pmd_table(pmd_base, addr, next, direct);
+ free_pmd_table(pmd_base, pud);
+ unmap_low_page(pmd_base);
+ }
+
+ if (direct)
+ update_page_count(PG_LEVEL_1G, -pages);
+}
+
+/* start and end are both virtual address. */
+static void __meminit
+remove_pagetable(unsigned long start, unsigned long end, bool direct)
+{
+ unsigned long next;
+ pgd_t *pgd;
+ pud_t *pud;
+ bool pgd_changed = false;
+
+ for (; start < end; start = next) {
+ pgd = pgd_offset_k(start);
+ if (!pgd_present(*pgd))
+ continue;
+
+ next = pgd_addr_end(start, end);
+
+ pud = (pud_t *)map_low_page((pud_t *)pgd_page_vaddr(*pgd));
+ remove_pud_table(pud, start, next, direct);
+ if (free_pud_table(pud, pgd))
+ pgd_changed = true;
+ unmap_low_page(pud);
+ }
+
+ if (pgd_changed)
+ sync_global_pgds(start, end - 1);
+
+ flush_tlb_all();
+}
+
#ifdef CONFIG_MEMORY_HOTREMOVE
int __ref arch_remove_memory(u64 start, u64 size)
{
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index a718e0d..7dcb6f9 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -501,21 +501,13 @@ out_unlock:
return do_split;
}
-static int split_large_page(pte_t *kpte, unsigned long address)
+int __split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase)
{
unsigned long pfn, pfninc = 1;
unsigned int i, level;
- pte_t *pbase, *tmp;
+ pte_t *tmp;
pgprot_t ref_prot;
- struct page *base;
-
- if (!debug_pagealloc)
- spin_unlock(&cpa_lock);
- base = alloc_pages(GFP_KERNEL | __GFP_NOTRACK, 0);
- if (!debug_pagealloc)
- spin_lock(&cpa_lock);
- if (!base)
- return -ENOMEM;
+ struct page *base = virt_to_page(pbase);
spin_lock(&pgd_lock);
/*
@@ -523,10 +515,11 @@ static int split_large_page(pte_t *kpte, unsigned long address)
* up for us already:
*/
tmp = lookup_address(address, &level);
- if (tmp != kpte)
- goto out_unlock;
+ if (tmp != kpte) {
+ spin_unlock(&pgd_lock);
+ return 1;
+ }
- pbase = (pte_t *)page_address(base);
paravirt_alloc_pte(&init_mm, page_to_pfn(base));
ref_prot = pte_pgprot(pte_clrhuge(*kpte));
/*
@@ -579,17 +572,27 @@ static int split_large_page(pte_t *kpte, unsigned long address)
* going on.
*/
__flush_tlb_all();
+ spin_unlock(&pgd_lock);
- base = NULL;
+ return 0;
+}
-out_unlock:
- /*
- * If we dropped out via the lookup_address check under
- * pgd_lock then stick the page back into the pool:
- */
- if (base)
+static int split_large_page(pte_t *kpte, unsigned long address)
+{
+ pte_t *pbase;
+ struct page *base;
+
+ if (!debug_pagealloc)
+ spin_unlock(&cpa_lock);
+ base = alloc_pages(GFP_KERNEL | __GFP_NOTRACK, 0);
+ if (!debug_pagealloc)
+ spin_lock(&cpa_lock);
+ if (!base)
+ return -ENOMEM;
+
+ pbase = (pte_t *)page_address(base);
+ if (__split_large_page(kpte, address, pbase))
__free_page(base);
- spin_unlock(&pgd_lock);
return 0;
}
diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index 3f778c2..190ff06 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -53,6 +53,7 @@ extern void free_bootmem_node(pg_data_t *pgdat,
unsigned long size);
extern void free_bootmem(unsigned long physaddr, unsigned long size);
extern void free_bootmem_late(unsigned long physaddr, unsigned long size);
+extern void __free_pages_bootmem(struct page *page, unsigned int order);
/*
* Flags for reserve_bootmem (also if CONFIG_HAVE_ARCH_BOOTMEM_NODE,
--
1.7.1
^ permalink raw reply related
* [PATCH v5 10/14] memory-hotplug: remove memmap of sparse-vmemmap
From: Tang Chen @ 2012-12-24 12:09 UTC (permalink / raw)
To: akpm, rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
In-Reply-To: <1356350964-13437-1-git-send-email-tangchen@cn.fujitsu.com>
This patch introduces a new API vmemmap_free() to free and remove
vmemmap pagetables. Since pagetable implements are different, each
architecture has to provide its own version of vmemmap_free(), just
like vmemmap_populate().
Note: vmemmap_free() are not implemented for ia64, ppc, s390, and sparc.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/arm64/mm/mmu.c | 3 +++
arch/ia64/mm/discontig.c | 4 ++++
arch/powerpc/mm/init_64.c | 4 ++++
arch/s390/mm/vmem.c | 4 ++++
arch/sparc/mm/init_64.c | 4 ++++
arch/x86/mm/init_64.c | 8 ++++++++
include/linux/mm.h | 1 +
mm/sparse.c | 3 ++-
8 files changed, 30 insertions(+), 1 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a6885d8..9834886 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -392,4 +392,7 @@ int __meminit vmemmap_populate(struct page *start_page,
return 0;
}
#endif /* CONFIG_ARM64_64K_PAGES */
+void vmemmap_free(struct page *memmap, unsigned long nr_pages)
+{
+}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 33943db..882a0fd 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -823,6 +823,10 @@ int __meminit vmemmap_populate(struct page *start_page,
return vmemmap_populate_basepages(start_page, size, node);
}
+void vmemmap_free(struct page *memmap, unsigned long nr_pages)
+{
+}
+
void register_page_bootmem_memmap(unsigned long section_nr,
struct page *start_page, unsigned long size)
{
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 6466440..2969591 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -298,6 +298,10 @@ int __meminit vmemmap_populate(struct page *start_page,
return 0;
}
+void vmemmap_free(struct page *memmap, unsigned long nr_pages)
+{
+}
+
void register_page_bootmem_memmap(unsigned long section_nr,
struct page *start_page, unsigned long size)
{
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 2c14bc2..81e6ba3 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -272,6 +272,10 @@ out:
return ret;
}
+void vmemmap_free(struct page *memmap, unsigned long nr_pages)
+{
+}
+
void register_page_bootmem_memmap(unsigned long section_nr,
struct page *start_page, unsigned long size)
{
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 7e28c9e..9cd1ec0 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2232,6 +2232,10 @@ void __meminit vmemmap_populate_print_last(void)
}
}
+void vmemmap_free(struct page *memmap, unsigned long nr_pages)
+{
+}
+
void register_page_bootmem_memmap(unsigned long section_nr,
struct page *start_page, unsigned long size)
{
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 4b160d8..029c0b9 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1307,6 +1307,14 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
return 0;
}
+void __ref vmemmap_free(struct page *memmap, unsigned long nr_pages)
+{
+ unsigned long start = (unsigned long)memmap;
+ unsigned long end = (unsigned long)(memmap + nr_pages);
+
+ remove_pagetable(start, end, false);
+}
+
void register_page_bootmem_memmap(unsigned long section_nr,
struct page *start_page, unsigned long size)
{
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1eca498..31d5e5d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1709,6 +1709,7 @@ int vmemmap_populate_basepages(struct page *start_page,
unsigned long pages, int node);
int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
void vmemmap_populate_print_last(void);
+void vmemmap_free(struct page *memmap, unsigned long nr_pages);
void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
unsigned long size);
diff --git a/mm/sparse.c b/mm/sparse.c
index 05ca73a..cff9796 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -615,10 +615,11 @@ static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
}
static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
{
- return; /* XXX: Not implemented yet */
+ vmemmap_free(memmap, nr_pages);
}
static void free_map_bootmem(struct page *memmap, unsigned long nr_pages)
{
+ vmemmap_free(memmap, nr_pages);
}
#else
static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
--
1.7.1
^ permalink raw reply related
* [PATCH v5 07/14] memory-hotplug: move pgdat_resize_lock into sparse_remove_one_section()
From: Tang Chen @ 2012-12-24 12:09 UTC (permalink / raw)
To: akpm, rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
In-Reply-To: <1356350964-13437-1-git-send-email-tangchen@cn.fujitsu.com>
In __remove_section(), we locked pgdat_resize_lock when calling
sparse_remove_one_section(). This lock will disable irq. But we don't need
to lock the whole function. If we do some work to free pagetables in
free_section_usemap(), we need to call flush_tlb_all(), which need
irq enabled. Otherwise the WARN_ON_ONCE() in smp_call_function_many()
will be triggered.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
mm/memory_hotplug.c | 4 ----
mm/sparse.c | 5 ++++-
2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 34c656b..c12bd55 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -442,8 +442,6 @@ static int __remove_section(struct zone *zone, struct mem_section *ms)
#else
static int __remove_section(struct zone *zone, struct mem_section *ms)
{
- unsigned long flags;
- struct pglist_data *pgdat = zone->zone_pgdat;
int ret = -EINVAL;
if (!valid_section(ms))
@@ -453,9 +451,7 @@ static int __remove_section(struct zone *zone, struct mem_section *ms)
if (ret)
return ret;
- pgdat_resize_lock(pgdat, &flags);
sparse_remove_one_section(zone, ms);
- pgdat_resize_unlock(pgdat, &flags);
return 0;
}
#endif
diff --git a/mm/sparse.c b/mm/sparse.c
index aadbb2a..05ca73a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -796,8 +796,10 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
void sparse_remove_one_section(struct zone *zone, struct mem_section *ms)
{
struct page *memmap = NULL;
- unsigned long *usemap = NULL;
+ unsigned long *usemap = NULL, flags;
+ struct pglist_data *pgdat = zone->zone_pgdat;
+ pgdat_resize_lock(pgdat, &flags);
if (ms->section_mem_map) {
usemap = ms->pageblock_flags;
memmap = sparse_decode_mem_map(ms->section_mem_map,
@@ -805,6 +807,7 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms)
ms->section_mem_map = 0;
ms->pageblock_flags = NULL;
}
+ pgdat_resize_unlock(pgdat, &flags);
clear_hwpoisoned_pages(memmap, PAGES_PER_SECTION);
free_section_usemap(memmap, usemap);
--
1.7.1
^ permalink raw reply related
* [PATCH v5 06/14] memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap
From: Tang Chen @ 2012-12-24 12:09 UTC (permalink / raw)
To: akpm, rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
In-Reply-To: <1356350964-13437-1-git-send-email-tangchen@cn.fujitsu.com>
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
For removing memmap region of sparse-vmemmap which is allocated bootmem,
memmap region of sparse-vmemmap needs to be registered by get_page_bootmem().
So the patch searches pages of virtual mapping and registers the pages by
get_page_bootmem().
Note: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390,
and sparc.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
arch/ia64/mm/discontig.c | 6 ++++
arch/powerpc/mm/init_64.c | 6 ++++
arch/s390/mm/vmem.c | 6 ++++
arch/sparc/mm/init_64.c | 6 ++++
arch/x86/mm/init_64.c | 52 ++++++++++++++++++++++++++++++++++++++++
include/linux/memory_hotplug.h | 11 +-------
include/linux/mm.h | 3 +-
mm/memory_hotplug.c | 33 ++++++++++++++++++++++---
8 files changed, 109 insertions(+), 14 deletions(-)
diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index c641333..33943db 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -822,4 +822,10 @@ int __meminit vmemmap_populate(struct page *start_page,
{
return vmemmap_populate_basepages(start_page, size, node);
}
+
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+ /* TODO */
+}
#endif
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 95a4529..6466440 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -297,5 +297,11 @@ int __meminit vmemmap_populate(struct page *start_page,
return 0;
}
+
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+ /* TODO */
+}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 6ed1426..2c14bc2 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -272,6 +272,12 @@ out:
return ret;
}
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+ /* TODO */
+}
+
/*
* Add memory segment to the segment list if it doesn't overlap with
* an already present segment.
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 85be1ca..7e28c9e 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2231,6 +2231,12 @@ void __meminit vmemmap_populate_print_last(void)
node_start = 0;
}
}
+
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+ /* TODO */
+}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
static void prot_init_common(unsigned long page_none,
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index f78509c..aeaa27e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1000,6 +1000,58 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
return 0;
}
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+ unsigned long addr = (unsigned long)start_page;
+ unsigned long end = (unsigned long)(start_page + size);
+ unsigned long next;
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ for (; addr < end; addr = next) {
+ pte_t *pte = NULL;
+
+ pgd = pgd_offset_k(addr);
+ if (pgd_none(*pgd)) {
+ next = (addr + PAGE_SIZE) & PAGE_MASK;
+ continue;
+ }
+ get_page_bootmem(section_nr, pgd_page(*pgd), MIX_SECTION_INFO);
+
+ pud = pud_offset(pgd, addr);
+ if (pud_none(*pud)) {
+ next = (addr + PAGE_SIZE) & PAGE_MASK;
+ continue;
+ }
+ get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
+
+ if (!cpu_has_pse) {
+ next = (addr + PAGE_SIZE) & PAGE_MASK;
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd))
+ continue;
+ get_page_bootmem(section_nr, pmd_page(*pmd),
+ MIX_SECTION_INFO);
+
+ pte = pte_offset_kernel(pmd, addr);
+ if (pte_none(*pte))
+ continue;
+ get_page_bootmem(section_nr, pte_page(*pte),
+ SECTION_INFO);
+ } else {
+ next = pmd_addr_end(addr, end);
+
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd))
+ continue;
+ get_page_bootmem(section_nr, pmd_page(*pmd),
+ SECTION_INFO);
+ }
+ }
+}
+
void __meminit vmemmap_populate_print_last(void)
{
if (p_start) {
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 31a563b..2441f36 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -174,17 +174,10 @@ static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
#endif /* CONFIG_NUMA */
#endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
-{
-}
-static inline void put_page_bootmem(struct page *page)
-{
-}
-#else
extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
extern void put_page_bootmem(struct page *page);
-#endif
+extern void get_page_bootmem(unsigned long ingo, struct page *page,
+ unsigned long type);
/*
* Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6320407..1eca498 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1709,7 +1709,8 @@ int vmemmap_populate_basepages(struct page *start_page,
unsigned long pages, int node);
int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
void vmemmap_populate_print_last(void);
-
+void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
+ unsigned long size);
enum mf_flags {
MF_COUNT_INCREASED = 1 << 0,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2c5d734..34c656b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -91,9 +91,8 @@ static void release_memory_resource(struct resource *res)
}
#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
-static void get_page_bootmem(unsigned long info, struct page *page,
- unsigned long type)
+void get_page_bootmem(unsigned long info, struct page *page,
+ unsigned long type)
{
page->lru.next = (struct list_head *) type;
SetPagePrivate(page);
@@ -128,6 +127,7 @@ void __ref put_page_bootmem(struct page *page)
}
+#ifndef CONFIG_SPARSEMEM_VMEMMAP
static void register_page_bootmem_info_section(unsigned long start_pfn)
{
unsigned long *usemap, mapsize, section_nr, i;
@@ -161,6 +161,32 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)
get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
}
+#else
+static void register_page_bootmem_info_section(unsigned long start_pfn)
+{
+ unsigned long *usemap, mapsize, section_nr, i;
+ struct mem_section *ms;
+ struct page *page, *memmap;
+
+ if (!pfn_valid(start_pfn))
+ return;
+
+ section_nr = pfn_to_section_nr(start_pfn);
+ ms = __nr_to_section(section_nr);
+
+ memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+
+ register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION);
+
+ usemap = __nr_to_section(section_nr)->pageblock_flags;
+ page = virt_to_page(usemap);
+
+ mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
+
+ for (i = 0; i < mapsize; i++, page++)
+ get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
+}
+#endif
void register_page_bootmem_info_node(struct pglist_data *pgdat)
{
@@ -203,7 +229,6 @@ void register_page_bootmem_info_node(struct pglist_data *pgdat)
register_page_bootmem_info_section(pfn);
}
}
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
unsigned long end_pfn)
--
1.7.1
^ permalink raw reply related
* [PATCH v5 05/14] memory-hotplug: introduce new function arch_remove_memory() for removing page table depends on architecture
From: Tang Chen @ 2012-12-24 12:09 UTC (permalink / raw)
To: akpm, rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
In-Reply-To: <1356350964-13437-1-git-send-email-tangchen@cn.fujitsu.com>
From: Wen Congyang <wency@cn.fujitsu.com>
For removing memory, we need to remove page table. But it depends
on architecture. So the patch introduce arch_remove_memory() for
removing page table. Now it only calls __remove_pages().
Note: __remove_pages() for some archtecuture is not implemented
(I don't know how to implement it for s390).
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
arch/ia64/mm/init.c | 18 ++++++++++++++++++
arch/powerpc/mm/mem.c | 12 ++++++++++++
arch/s390/mm/init.c | 12 ++++++++++++
arch/sh/mm/init.c | 17 +++++++++++++++++
arch/tile/mm/init.c | 8 ++++++++
arch/x86/mm/init_32.c | 12 ++++++++++++
arch/x86/mm/init_64.c | 15 +++++++++++++++
include/linux/memory_hotplug.h | 1 +
mm/memory_hotplug.c | 2 ++
9 files changed, 97 insertions(+), 0 deletions(-)
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 082e383..e333822 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -689,6 +689,24 @@ int arch_add_memory(int nid, u64 start, u64 size)
return ret;
}
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+ unsigned long start_pfn = start >> PAGE_SHIFT;
+ unsigned long nr_pages = size >> PAGE_SHIFT;
+ struct zone *zone;
+ int ret;
+
+ zone = page_zone(pfn_to_page(start_pfn));
+ ret = __remove_pages(zone, start_pfn, nr_pages);
+ if (ret)
+ pr_warn("%s: Problem encountered in __remove_pages() as"
+ " ret=%d\n", __func__, ret);
+
+ return ret;
+}
+#endif
#endif
/*
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 0dba506..09c6451 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -133,6 +133,18 @@ int arch_add_memory(int nid, u64 start, u64 size)
return __add_pages(nid, zone, start_pfn, nr_pages);
}
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+ unsigned long start_pfn = start >> PAGE_SHIFT;
+ unsigned long nr_pages = size >> PAGE_SHIFT;
+ struct zone *zone;
+
+ zone = page_zone(pfn_to_page(start_pfn));
+ return __remove_pages(zone, start_pfn, nr_pages);
+}
+#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
/*
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index ae672f4..49ce6bb 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -228,4 +228,16 @@ int arch_add_memory(int nid, u64 start, u64 size)
vmem_remove_mapping(start, size);
return rc;
}
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+ /*
+ * There is no hardware or firmware interface which could trigger a
+ * hot memory remove on s390. So there is nothing that needs to be
+ * implemented.
+ */
+ return -EBUSY;
+}
+#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 82cc576..1057940 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -558,4 +558,21 @@ int memory_add_physaddr_to_nid(u64 addr)
EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
#endif
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+ unsigned long start_pfn = start >> PAGE_SHIFT;
+ unsigned long nr_pages = size >> PAGE_SHIFT;
+ struct zone *zone;
+ int ret;
+
+ zone = page_zone(pfn_to_page(start_pfn));
+ ret = __remove_pages(zone, start_pfn, nr_pages);
+ if (unlikely(ret))
+ pr_warn("%s: Failed, __remove_pages() == %d\n", __func__,
+ ret);
+
+ return ret;
+}
+#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index ef29d6c..2749515 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -935,6 +935,14 @@ int remove_memory(u64 start, u64 size)
{
return -EINVAL;
}
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+ /* TODO */
+ return -EBUSY;
+}
+#endif
#endif
struct kmem_cache *pgd_cache;
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 745d66b..3166e78 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -836,6 +836,18 @@ int arch_add_memory(int nid, u64 start, u64 size)
return __add_pages(nid, zone, start_pfn, nr_pages);
}
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+ unsigned long start_pfn = start >> PAGE_SHIFT;
+ unsigned long nr_pages = size >> PAGE_SHIFT;
+ struct zone *zone;
+
+ zone = page_zone(pfn_to_page(start_pfn));
+ return __remove_pages(zone, start_pfn, nr_pages);
+}
+#endif
#endif
/*
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e779e0b..f78509c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -682,6 +682,21 @@ int arch_add_memory(int nid, u64 start, u64 size)
}
EXPORT_SYMBOL_GPL(arch_add_memory);
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int __ref arch_remove_memory(u64 start, u64 size)
+{
+ unsigned long start_pfn = start >> PAGE_SHIFT;
+ unsigned long nr_pages = size >> PAGE_SHIFT;
+ struct zone *zone;
+ int ret;
+
+ zone = page_zone(pfn_to_page(start_pfn));
+ ret = __remove_pages(zone, start_pfn, nr_pages);
+ WARN_ON_ONCE(ret);
+
+ return ret;
+}
+#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
static struct kcore_list kcore_vsyscall;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 8dd0950..31a563b 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -96,6 +96,7 @@ extern void __online_page_free(struct page *page);
#ifdef CONFIG_MEMORY_HOTREMOVE
extern bool is_pageblock_removable_nolock(struct page *page);
+extern int arch_remove_memory(u64 start, u64 size);
#endif /* CONFIG_MEMORY_HOTREMOVE */
/* reasonably generic interface to expand the physical pages in a zone */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1f5b5bb..2c5d734 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1487,6 +1487,8 @@ repeat:
/* remove memmap entry */
firmware_map_remove(start, start + size, "System RAM");
+ arch_remove_memory(start, size);
+
unlock_memory_hotplug();
return 0;
--
1.7.1
^ permalink raw reply related
* [PATCH v5 09/14] memory-hotplug: remove page table of x86_64 architecture
From: Tang Chen @ 2012-12-24 12:09 UTC (permalink / raw)
To: akpm, rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
In-Reply-To: <1356350964-13437-1-git-send-email-tangchen@cn.fujitsu.com>
This patch searches a page table about the removed memory, and clear
page table for x86_64 architecture.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/x86/mm/init_64.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index b30df3c..4b160d8 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -979,6 +979,15 @@ remove_pagetable(unsigned long start, unsigned long end, bool direct)
flush_tlb_all();
}
+void __meminit
+kernel_physical_mapping_remove(unsigned long start, unsigned long end)
+{
+ start = (unsigned long)__va(start);
+ end = (unsigned long)__va(end);
+
+ remove_pagetable(start, end, true);
+}
+
#ifdef CONFIG_MEMORY_HOTREMOVE
int __ref arch_remove_memory(u64 start, u64 size)
{
@@ -988,6 +997,7 @@ int __ref arch_remove_memory(u64 start, u64 size)
int ret;
zone = page_zone(pfn_to_page(start_pfn));
+ kernel_physical_mapping_remove(start, start + size);
ret = __remove_pages(zone, start_pfn, nr_pages);
WARN_ON_ONCE(ret);
--
1.7.1
^ permalink raw reply related
* [PATCH v5 03/14] memory-hotplug: remove redundant codes
From: Tang Chen @ 2012-12-24 12:09 UTC (permalink / raw)
To: akpm, rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
In-Reply-To: <1356350964-13437-1-git-send-email-tangchen@cn.fujitsu.com>
From: Wen Congyang <wency@cn.fujitsu.com>
offlining memory blocks and checking whether memory blocks are offlined
are very similar. This patch introduces a new function to remove
redundant codes.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
mm/memory_hotplug.c | 101 ++++++++++++++++++++++++++++-----------------------
1 files changed, 55 insertions(+), 46 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d43d97b..dbb04d8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1381,20 +1381,14 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
}
-int remove_memory(u64 start, u64 size)
+static int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
+ void *arg, int (*func)(struct memory_block *, void *))
{
struct memory_block *mem = NULL;
struct mem_section *section;
- unsigned long start_pfn, end_pfn;
unsigned long pfn, section_nr;
int ret;
- int return_on_error = 0;
- int retry = 0;
-
- start_pfn = PFN_DOWN(start);
- end_pfn = start_pfn + PFN_DOWN(size);
-repeat:
for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
section_nr = pfn_to_section_nr(pfn);
if (!present_section_nr(section_nr))
@@ -1411,22 +1405,61 @@ repeat:
if (!mem)
continue;
- ret = offline_memory_block(mem);
+ ret = func(mem, arg);
if (ret) {
- if (return_on_error) {
- kobject_put(&mem->dev.kobj);
- return ret;
- } else {
- retry = 1;
- }
+ kobject_put(&mem->dev.kobj);
+ return ret;
}
}
if (mem)
kobject_put(&mem->dev.kobj);
- if (retry) {
- return_on_error = 1;
+ return 0;
+}
+
+static int offline_memory_block_cb(struct memory_block *mem, void *arg)
+{
+ int *ret = arg;
+ int error = offline_memory_block(mem);
+
+ if (error != 0 && *ret == 0)
+ *ret = error;
+
+ return 0;
+}
+
+static int is_memblock_offlined_cb(struct memory_block *mem, void *arg)
+{
+ int ret = !is_memblock_offlined(mem);
+
+ if (unlikely(ret))
+ pr_warn("removing memory fails, because memory "
+ "[%#010llx-%#010llx] is onlined\n",
+ PFN_PHYS(section_nr_to_pfn(mem->start_section_nr)),
+ PFN_PHYS(section_nr_to_pfn(mem->end_section_nr + 1))-1);
+
+ return ret;
+}
+
+int remove_memory(u64 start, u64 size)
+{
+ unsigned long start_pfn, end_pfn;
+ int ret = 0;
+ int retry = 1;
+
+ start_pfn = PFN_DOWN(start);
+ end_pfn = start_pfn + PFN_DOWN(size);
+
+repeat:
+ walk_memory_range(start_pfn, end_pfn, &ret,
+ offline_memory_block_cb);
+ if (ret) {
+ if (!retry)
+ return ret;
+
+ retry = 0;
+ ret = 0;
goto repeat;
}
@@ -1444,37 +1477,13 @@ repeat:
* memory blocks are offlined.
*/
- for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
- section_nr = pfn_to_section_nr(pfn);
- if (!present_section_nr(section_nr))
- continue;
-
- section = __nr_to_section(section_nr);
- /* same memblock? */
- if (mem)
- if ((section_nr >= mem->start_section_nr) &&
- (section_nr <= mem->end_section_nr))
- continue;
-
- mem = find_memory_block_hinted(section, mem);
- if (!mem)
- continue;
-
- ret = is_memblock_offlined(mem);
- if (!ret) {
- pr_warn("removing memory fails, because memory "
- "[%#010llx-%#010llx] is onlined\n",
- PFN_PHYS(section_nr_to_pfn(mem->start_section_nr)),
- PFN_PHYS(section_nr_to_pfn(mem->end_section_nr + 1)) - 1);
-
- kobject_put(&mem->dev.kobj);
- unlock_memory_hotplug();
- return ret;
- }
+ ret = walk_memory_range(start_pfn, end_pfn, NULL,
+ is_memblock_offlined_cb);
+ if (ret) {
+ unlock_memory_hotplug();
+ return ret;
}
- if (mem)
- kobject_put(&mem->dev.kobj);
unlock_memory_hotplug();
return 0;
--
1.7.1
^ permalink raw reply related
* [PATCH v5 02/14] memory-hotplug: check whether all memory blocks are offlined or not when removing memory
From: Tang Chen @ 2012-12-24 12:09 UTC (permalink / raw)
To: akpm, rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
In-Reply-To: <1356350964-13437-1-git-send-email-tangchen@cn.fujitsu.com>
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
We remove the memory like this:
1. lock memory hotplug
2. offline a memory block
3. unlock memory hotplug
4. repeat 1-3 to offline all memory blocks
5. lock memory hotplug
6. remove memory(TODO)
7. unlock memory hotplug
All memory blocks must be offlined before removing memory. But we don't hold
the lock in the whole operation. So we should check whether all memory blocks
are offlined before step6. Otherwise, kernel maybe panicked.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
drivers/base/memory.c | 6 +++++
include/linux/memory_hotplug.h | 1 +
mm/memory_hotplug.c | 47 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 54 insertions(+), 0 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 987604d..8300a18 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -693,6 +693,12 @@ int offline_memory_block(struct memory_block *mem)
return ret;
}
+/* return true if the memory block is offlined, otherwise, return false */
+bool is_memblock_offlined(struct memory_block *mem)
+{
+ return mem->state == MEM_OFFLINE;
+}
+
/*
* Initialize the sysfs support for memory devices...
*/
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 4a45c4e..8dd0950 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -247,6 +247,7 @@ extern int add_memory(int nid, u64 start, u64 size);
extern int arch_add_memory(int nid, u64 start, u64 size);
extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
extern int offline_memory_block(struct memory_block *mem);
+extern bool is_memblock_offlined(struct memory_block *mem);
extern int remove_memory(u64 start, u64 size);
extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 62e04c9..d43d97b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1430,6 +1430,53 @@ repeat:
goto repeat;
}
+ lock_memory_hotplug();
+
+ /*
+ * we have offlined all memory blocks like this:
+ * 1. lock memory hotplug
+ * 2. offline a memory block
+ * 3. unlock memory hotplug
+ *
+ * repeat step1-3 to offline the memory block. All memory blocks
+ * must be offlined before removing memory. But we don't hold the
+ * lock in the whole operation. So we should check whether all
+ * memory blocks are offlined.
+ */
+
+ for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+ section_nr = pfn_to_section_nr(pfn);
+ if (!present_section_nr(section_nr))
+ continue;
+
+ section = __nr_to_section(section_nr);
+ /* same memblock? */
+ if (mem)
+ if ((section_nr >= mem->start_section_nr) &&
+ (section_nr <= mem->end_section_nr))
+ continue;
+
+ mem = find_memory_block_hinted(section, mem);
+ if (!mem)
+ continue;
+
+ ret = is_memblock_offlined(mem);
+ if (!ret) {
+ pr_warn("removing memory fails, because memory "
+ "[%#010llx-%#010llx] is onlined\n",
+ PFN_PHYS(section_nr_to_pfn(mem->start_section_nr)),
+ PFN_PHYS(section_nr_to_pfn(mem->end_section_nr + 1)) - 1);
+
+ kobject_put(&mem->dev.kobj);
+ unlock_memory_hotplug();
+ return ret;
+ }
+ }
+
+ if (mem)
+ kobject_put(&mem->dev.kobj);
+ unlock_memory_hotplug();
+
return 0;
}
#else
--
1.7.1
^ permalink raw reply related
* [PATCH v5 01/14] memory-hotplug: try to offline the memory twice to avoid dependence
From: Tang Chen @ 2012-12-24 12:09 UTC (permalink / raw)
To: akpm, rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
In-Reply-To: <1356350964-13437-1-git-send-email-tangchen@cn.fujitsu.com>
From: Wen Congyang <wency@cn.fujitsu.com>
memory can't be offlined when CONFIG_MEMCG is selected.
For example: there is a memory device on node 1. The address range
is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
and memory11 under the directory /sys/devices/system/memory/.
If CONFIG_MEMCG is selected, we will allocate memory to store page cgroup
when we online pages. When we online memory8, the memory stored page cgroup
is not provided by this memory device. But when we online memory9, the memory
stored page cgroup may be provided by memory8. So we can't offline memory8
now. We should offline the memory in the reversed order.
When the memory device is hotremoved, we will auto offline memory provided
by this memory device. But we don't know which memory is onlined first, so
offlining memory may fail. In such case, iterate twice to offline the memory.
1st iterate: offline every non primary memory block.
2nd iterate: offline primary (i.e. first added) memory block.
This idea is suggested by KOSAKI Motohiro.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
mm/memory_hotplug.c | 16 ++++++++++++++--
1 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d04ed87..62e04c9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1388,10 +1388,13 @@ int remove_memory(u64 start, u64 size)
unsigned long start_pfn, end_pfn;
unsigned long pfn, section_nr;
int ret;
+ int return_on_error = 0;
+ int retry = 0;
start_pfn = PFN_DOWN(start);
end_pfn = start_pfn + PFN_DOWN(size);
+repeat:
for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
section_nr = pfn_to_section_nr(pfn);
if (!present_section_nr(section_nr))
@@ -1410,14 +1413,23 @@ int remove_memory(u64 start, u64 size)
ret = offline_memory_block(mem);
if (ret) {
- kobject_put(&mem->dev.kobj);
- return ret;
+ if (return_on_error) {
+ kobject_put(&mem->dev.kobj);
+ return ret;
+ } else {
+ retry = 1;
+ }
}
}
if (mem)
kobject_put(&mem->dev.kobj);
+ if (retry) {
+ return_on_error = 1;
+ goto repeat;
+ }
+
return 0;
}
#else
--
1.7.1
^ permalink raw reply related
* [PATCH v5 04/14] memory-hotplug: remove /sys/firmware/memmap/X sysfs
From: Tang Chen @ 2012-12-24 12:09 UTC (permalink / raw)
To: akpm, rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
In-Reply-To: <1356350964-13437-1-git-send-email-tangchen@cn.fujitsu.com>
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type}
sysfs files are created. But there is no code to remove these files. The patch
implements the function to remove them.
Note: The code does not free firmware_map_entry which is allocated by bootmem.
So the patch makes memory leak. But I think the memory leak size is
very samll. And it does not affect the system.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
drivers/firmware/memmap.c | 98 +++++++++++++++++++++++++++++++++++++++++-
include/linux/firmware-map.h | 6 +++
mm/memory_hotplug.c | 5 ++-
3 files changed, 106 insertions(+), 3 deletions(-)
diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
index 90723e6..49be12a 100644
--- a/drivers/firmware/memmap.c
+++ b/drivers/firmware/memmap.c
@@ -21,6 +21,7 @@
#include <linux/types.h>
#include <linux/bootmem.h>
#include <linux/slab.h>
+#include <linux/mm.h>
/*
* Data types ------------------------------------------------------------------
@@ -41,6 +42,7 @@ struct firmware_map_entry {
const char *type; /* type of the memory range */
struct list_head list; /* entry for the linked list */
struct kobject kobj; /* kobject for each entry */
+ unsigned int bootmem:1; /* allocated from bootmem */
};
/*
@@ -79,7 +81,26 @@ static const struct sysfs_ops memmap_attr_ops = {
.show = memmap_attr_show,
};
+
+static inline struct firmware_map_entry *
+to_memmap_entry(struct kobject *kobj)
+{
+ return container_of(kobj, struct firmware_map_entry, kobj);
+}
+
+static void release_firmware_map_entry(struct kobject *kobj)
+{
+ struct firmware_map_entry *entry = to_memmap_entry(kobj);
+
+ if (entry->bootmem)
+ /* There is no way to free memory allocated from bootmem */
+ return;
+
+ kfree(entry);
+}
+
static struct kobj_type memmap_ktype = {
+ .release = release_firmware_map_entry,
.sysfs_ops = &memmap_attr_ops,
.default_attrs = def_attrs,
};
@@ -94,6 +115,7 @@ static struct kobj_type memmap_ktype = {
* in firmware initialisation code in one single thread of execution.
*/
static LIST_HEAD(map_entries);
+static DEFINE_SPINLOCK(map_entries_lock);
/**
* firmware_map_add_entry() - Does the real work to add a firmware memmap entry.
@@ -118,11 +140,25 @@ static int firmware_map_add_entry(u64 start, u64 end,
INIT_LIST_HEAD(&entry->list);
kobject_init(&entry->kobj, &memmap_ktype);
+ spin_lock(&map_entries_lock);
list_add_tail(&entry->list, &map_entries);
+ spin_unlock(&map_entries_lock);
return 0;
}
+/**
+ * firmware_map_remove_entry() - Does the real work to remove a firmware
+ * memmap entry.
+ * @entry: removed entry.
+ **/
+static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
+{
+ spin_lock(&map_entries_lock);
+ list_del(&entry->list);
+ spin_unlock(&map_entries_lock);
+}
+
/*
* Add memmap entry on sysfs
*/
@@ -144,6 +180,35 @@ static int add_sysfs_fw_map_entry(struct firmware_map_entry *entry)
return 0;
}
+/*
+ * Remove memmap entry on sysfs
+ */
+static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
+{
+ kobject_put(&entry->kobj);
+}
+
+/*
+ * Search memmap entry
+ */
+
+static struct firmware_map_entry * __meminit
+firmware_map_find_entry(u64 start, u64 end, const char *type)
+{
+ struct firmware_map_entry *entry;
+
+ spin_lock(&map_entries_lock);
+ list_for_each_entry(entry, &map_entries, list)
+ if ((entry->start == start) && (entry->end == end) &&
+ (!strcmp(entry->type, type))) {
+ spin_unlock(&map_entries_lock);
+ return entry;
+ }
+
+ spin_unlock(&map_entries_lock);
+ return NULL;
+}
+
/**
* firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
* memory hotplug.
@@ -193,9 +258,36 @@ int __init firmware_map_add_early(u64 start, u64 end, const char *type)
if (WARN_ON(!entry))
return -ENOMEM;
+ entry->bootmem = 1;
return firmware_map_add_entry(start, end, type, entry);
}
+/**
+ * firmware_map_remove() - remove a firmware mapping entry
+ * @start: Start of the memory range.
+ * @end: End of the memory range.
+ * @type: Type of the memory range.
+ *
+ * removes a firmware mapping entry.
+ *
+ * Returns 0 on success, or -EINVAL if no entry.
+ **/
+int __meminit firmware_map_remove(u64 start, u64 end, const char *type)
+{
+ struct firmware_map_entry *entry;
+
+ entry = firmware_map_find_entry(start, end - 1, type);
+ if (!entry)
+ return -EINVAL;
+
+ firmware_map_remove_entry(entry);
+
+ /* remove the memmap entry */
+ remove_sysfs_fw_map_entry(entry);
+
+ return 0;
+}
+
/*
* Sysfs functions -------------------------------------------------------------
*/
@@ -217,8 +309,10 @@ static ssize_t type_show(struct firmware_map_entry *entry, char *buf)
return snprintf(buf, PAGE_SIZE, "%s\n", entry->type);
}
-#define to_memmap_attr(_attr) container_of(_attr, struct memmap_attribute, attr)
-#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
+static inline struct memmap_attribute *to_memmap_attr(struct attribute *attr)
+{
+ return container_of(attr, struct memmap_attribute, attr);
+}
static ssize_t memmap_attr_show(struct kobject *kobj,
struct attribute *attr, char *buf)
diff --git a/include/linux/firmware-map.h b/include/linux/firmware-map.h
index 43fe52f..71d4fa7 100644
--- a/include/linux/firmware-map.h
+++ b/include/linux/firmware-map.h
@@ -25,6 +25,7 @@
int firmware_map_add_early(u64 start, u64 end, const char *type);
int firmware_map_add_hotplug(u64 start, u64 end, const char *type);
+int firmware_map_remove(u64 start, u64 end, const char *type);
#else /* CONFIG_FIRMWARE_MEMMAP */
@@ -38,6 +39,11 @@ static inline int firmware_map_add_hotplug(u64 start, u64 end, const char *type)
return 0;
}
+static inline int firmware_map_remove(u64 start, u64 end, const char *type)
+{
+ return 0;
+}
+
#endif /* CONFIG_FIRMWARE_MEMMAP */
#endif /* _LINUX_FIRMWARE_MAP_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index dbb04d8..1f5b5bb 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1442,7 +1442,7 @@ static int is_memblock_offlined_cb(struct memory_block *mem, void *arg)
return ret;
}
-int remove_memory(u64 start, u64 size)
+int __ref remove_memory(u64 start, u64 size)
{
unsigned long start_pfn, end_pfn;
int ret = 0;
@@ -1484,6 +1484,9 @@ repeat:
return ret;
}
+ /* remove memmap entry */
+ firmware_map_remove(start, start + size, "System RAM");
+
unlock_memory_hotplug();
return 0;
--
1.7.1
^ permalink raw reply related
* [PATCH v5 00/14] memory-hotplug: hot-remove physical memory
From: Tang Chen @ 2012-12-24 12:09 UTC (permalink / raw)
To: akpm, rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
Hi Andrew,
Here is the physical memory hot-remove patch-set based on 3.8rc-1.
This patch-set aims to implement physical memory hot-removing.
The patches can free/remove the following things:
- /sys/firmware/memmap/X/{end, start, type} : [PATCH 4/14]
- memmap of sparse-vmemmap : [PATCH 6,7,8,10/14]
- page table of removed memory : [RFC PATCH 7,8,10/14]
- node and related sysfs files : [RFC PATCH 13-14/14]
How to test this patchset?
1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
ACPI_HOTPLUG_MEMORY must be selected.
2. load the module acpi_memhotplug
3. hotplug the memory device(it depends on your hardware)
You will see the memory device under the directory /sys/bus/acpi/devices/.
Its name is PNP0C80:XX.
4. online/offline pages provided by this memory device
You can write online/offline to /sys/devices/system/memory/memoryX/state to
online/offline pages provided by this memory device
5. hotremove the memory device
You can hotremove the memory device by the hardware, or writing 1 to
/sys/bus/acpi/devices/PNP0C80:XX/eject.
Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. It is not a bug.
Changelogs from v4 to v5:
Patch7: new patch, move pgdat_resize_lock into sparse_remove_one_section() to
avoid disabling irq because we need flush tlb when free pagetables.
Patch8: new patch, pick up some common APIs that are used to free direct mapping
and vmemmap pagetables.
Patch9: free direct mapping pagetables on x86_64 arch.
Patch10: free vmemmap pagetables.
Patch11: since freeing memmap with vmemmap has been implemented, the config
macro CONFIG_SPARSEMEM_VMEMMAP when defining __remove_section() is
no longer needed.
Patch13: no need to modify acpi_memory_disable_device() since it was removed,
and add nid parameter when calling remove_memory().
Changelogs from v3 to v4:
Patch7: remove unused codes.
Patch8: fix nr_pages that is passed to free_map_bootmem()
Changelogs from v2 to v3:
Patch9: call sync_global_pgds() if pgd is changed
Patch10: fix a problem int the patch
Changelogs from v1 to v2:
Patch1: new patch, offline memory twice. 1st iterate: offline every non primary
memory block. 2nd iterate: offline primary (i.e. first added) memory
block.
Patch3: new patch, no logical change, just remove reduntant codes.
Patch9: merge the patch from wujianguo into this patch. flush tlb on all cpu
after the pagetable is changed.
Patch12: new patch, free node_data when a node is offlined.
Tang Chen (5):
memory-hotplug: move pgdat_resize_lock into
sparse_remove_one_section()
memory-hotplug: remove page table of x86_64 architecture
memory-hotplug: remove memmap of sparse-vmemmap
memory-hotplug: Integrated __remove_section() of
CONFIG_SPARSEMEM_VMEMMAP.
memory-hotplug: remove sysfs file of node
Wen Congyang (5):
memory-hotplug: try to offline the memory twice to avoid dependence
memory-hotplug: remove redundant codes
memory-hotplug: introduce new function arch_remove_memory() for
removing page table depends on architecture
memory-hotplug: Common APIs to support page tables hot-remove
memory-hotplug: free node_data when a node is offlined
Yasuaki Ishimatsu (4):
memory-hotplug: check whether all memory blocks are offlined or not
when removing memory
memory-hotplug: remove /sys/firmware/memmap/X sysfs
memory-hotplug: implement register_page_bootmem_info_section of
sparse-vmemmap
memory-hotplug: memory_hotplug: clear zone when removing the memory
arch/arm64/mm/mmu.c | 3 +
arch/ia64/mm/discontig.c | 10 +
arch/ia64/mm/init.c | 18 ++
arch/powerpc/mm/init_64.c | 10 +
arch/powerpc/mm/mem.c | 12 +
arch/s390/mm/init.c | 12 +
arch/s390/mm/vmem.c | 10 +
arch/sh/mm/init.c | 17 ++
arch/sparc/mm/init_64.c | 10 +
arch/tile/mm/init.c | 8 +
arch/x86/include/asm/pgtable_types.h | 1 +
arch/x86/mm/init_32.c | 12 +
arch/x86/mm/init_64.c | 382 ++++++++++++++++++++++++++++++++
arch/x86/mm/pageattr.c | 47 ++--
drivers/acpi/acpi_memhotplug.c | 8 +-
drivers/base/memory.c | 6 +
drivers/firmware/memmap.c | 98 ++++++++-
include/linux/bootmem.h | 1 +
include/linux/firmware-map.h | 6 +
include/linux/memory_hotplug.h | 15 +-
include/linux/mm.h | 4 +-
mm/memory_hotplug.c | 406 ++++++++++++++++++++++++++++++++--
mm/sparse.c | 8 +-
23 files changed, 1043 insertions(+), 61 deletions(-)
^ permalink raw reply
* Re: [REGRESSION][3.8.-rc1][ INFO: possible circular locking dependency detected ]
From: Christian Kujau @ 2012-12-23 21:34 UTC (permalink / raw)
To: Maciej Rutecki; +Cc: Cong Wang, linuxppc-dev, LKML, zhong
In-Reply-To: <201212221628.26640.maciej.rutecki@gmail.com>
On Sat, 22 Dec 2012 at 16:28, Maciej Rutecki wrote:
> Got during suspend to disk:
I got a similar message on a powerpc G4 system, right after bootup (no
suspend involved):
http://nerdbynature.de/bits/3.8.0-rc1/
[ 97.803049] ======================================================
[ 97.803051] [ INFO: possible circular locking dependency detected ]
[ 97.803059] 3.8.0-rc1-dirty #2 Not tainted
[ 97.803060] -------------------------------------------------------
[ 97.803066] kworker/0:1/235 is trying to acquire lock:
[ 97.803097] ((fb_notifier_list).rwsem){.+.+.+}, at: [<c00606a0>] __blocking_notifier_call_chain+0x44/0x88
[ 97.803099]
[ 97.803099] but task is already holding lock:
[ 97.803110] (console_lock){+.+.+.}, at: [<c03b9fd0>] console_callback+0x20/0x194
[ 97.803112]
[ 97.803112] which lock already depends on the new lock.
...and on it goes. Please see the URL above for the whole dmesg and
.config.
@Li Zhong: I have applied your fix for the "MAX_STACK_TRACE_ENTRIES too
low" warning[0] to 3.8-rc1 (hence the -dirty flag), but in the
backtrace "ret_from_kernel_thread" shows up again. FWIW, your
patch helped to make the "MAX_STACK_TRACE_ENTRIES too low"
warning go away in 3.7.0-rc7 and it did not re-appear ever
since.
Thanks,
Christian.
[0] http://lkml.indiana.edu/hypermail/linux/kernel/1211.3/01917.html
> [ 269.784867] [ INFO: possible circular locking dependency detected ]
> [ 269.784869] 3.8.0-rc1 #1 Not tainted
> [ 269.784870] -------------------------------------------------------
> [ 269.784871] kworker/u:3/56 is trying to acquire lock:
> [ 269.784878] ((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffffff81062a1d>]
> __blocking_notifier_call_chain+0x49/0x80
> [ 269.784879]
> [ 269.784879] but task is already holding lock:
> [ 269.784884] (console_lock){+.+.+.}, at: [<ffffffff812ee4ce>]
> i915_drm_freeze+0x9e/0xbb
> [ 269.784884]
> [ 269.784884] which lock already depends on the new lock.
> [ 269.784884]
> [ 269.784885]
> [ 269.784885] the existing dependency chain (in reverse order) is:
> [ 269.784887]
> [ 269.784887] -> #1 (console_lock){+.+.+.}:
> [ 269.784890] [<ffffffff810890e4>] lock_acquire+0x95/0x105
> [ 269.784893] [<ffffffff810405a1>] console_lock+0x59/0x5b
> [ 269.784897] [<ffffffff812ba125>] register_con_driver+0x36/0x128
> [ 269.784899] [<ffffffff812bb27e>] take_over_console+0x1e/0x45
> [ 269.784903] [<ffffffff81257a04>] fbcon_takeover+0x56/0x98
> [ 269.784906] [<ffffffff8125b857>] fbcon_event_notify+0x2c1/0x5ea
> [ 269.784909] [<ffffffff8149a211>] notifier_call_chain+0x67/0x92
> [ 269.784911] [<ffffffff81062a33>] __blocking_notifier_call_chain+0x5f/0x80
> [ 269.784912] [<ffffffff81062a63>] blocking_notifier_call_chain+0xf/0x11
> [ 269.784915] [<ffffffff8124e85e>] fb_notifier_call_chain+0x16/0x18
> [ 269.784917] [<ffffffff812505d7>] register_framebuffer+0x20a/0x26e
> [ 269.784920] [<ffffffff812d3ca0>]
> drm_fb_helper_single_fb_probe+0x1ce/0x297
> [ 269.784922] [<ffffffff812d3f40>] drm_fb_helper_initial_config+0x1d7/0x1ef
> [ 269.784924] [<ffffffff8132cee2>] intel_fbdev_init+0x6f/0x82
> [ 269.784927] [<ffffffff812f22f6>] i915_driver_load+0xa9e/0xc78
> [ 269.784929] [<ffffffff812e020c>] drm_get_pci_dev+0x165/0x26d
> [ 269.784931] [<ffffffff812ee8da>] i915_pci_probe+0x60/0x69
> [ 269.784933] [<ffffffff8123fe8e>] local_pci_probe+0x39/0x61
> [ 269.784935] [<ffffffff812400f5>] pci_device_probe+0xba/0xe0
> [ 269.784938] [<ffffffff8133d3b6>] driver_probe_device+0x99/0x1c4
> [ 269.784940] [<ffffffff8133d52f>] __driver_attach+0x4e/0x6f
> [ 269.784942] [<ffffffff8133bae1>] bus_for_each_dev+0x52/0x84
> [ 269.784944] [<ffffffff8133cec6>] driver_attach+0x19/0x1b
> [ 269.784946] [<ffffffff8133cb65>] bus_add_driver+0xdf/0x203
> [ 269.784948] [<ffffffff8133dad3>] driver_register+0x8e/0x114
> [ 269.784952] [<ffffffff8123f581>] __pci_register_driver+0x5d/0x62
> [ 269.784953] [<ffffffff812e0395>] drm_pci_init+0x81/0xe6
> [ 269.784957] [<ffffffff81af7612>] i915_init+0x66/0x68
> [ 269.784959] [<ffffffff810020b4>] do_one_initcall+0x7a/0x136
> [ 269.784962] [<ffffffff8147ceaa>] kernel_init+0x141/0x296
> [ 269.784964] [<ffffffff8149c7bc>] ret_from_fork+0x7c/0xb0
> [ 269.784966]
> [ 269.784966] -> #0 ((fb_notifier_list).rwsem){.+.+.+}:
> [ 269.784967] [<ffffffff81088955>] __lock_acquire+0xa7e/0xddd
> [ 269.784969] [<ffffffff810890e4>] lock_acquire+0x95/0x105
> [ 269.784971] [<ffffffff81495092>] down_read+0x34/0x43
> [ 269.784973] [<ffffffff81062a1d>] __blocking_notifier_call_chain+0x49/0x80
> [ 269.784975] [<ffffffff81062a63>] blocking_notifier_call_chain+0xf/0x11
> [ 269.784977] [<ffffffff8124e85e>] fb_notifier_call_chain+0x16/0x18
> [ 269.784979] [<ffffffff8124ec47>] fb_set_suspend+0x22/0x4d
> [ 269.784981] [<ffffffff8132cfe3>] intel_fbdev_set_suspend+0x20/0x22
> [ 269.784983] [<ffffffff812ee4db>] i915_drm_freeze+0xab/0xbb
> [ 269.784985] [<ffffffff812eea82>] i915_pm_freeze+0x3d/0x41
> [ 269.784987] [<ffffffff8123f759>] pci_pm_freeze+0x65/0x8d
> [ 269.784990] [<ffffffff81342f20>] dpm_run_callback.isra.3+0x27/0x56
> [ 269.784993] [<ffffffff81343085>] __device_suspend+0x136/0x1b1
> [ 269.784995] [<ffffffff8134311a>] async_suspend+0x1a/0x58
> [ 269.784997] [<ffffffff81063a6b>] async_run_entry_fn+0xa4/0x17c
> [ 269.785000] [<ffffffff81058df2>] process_one_work+0x1cf/0x38e
> [ 269.785002] [<ffffffff81059290>] worker_thread+0x12e/0x1cc
> [ 269.785004] [<ffffffff8105d416>] kthread+0xac/0xb4
> [ 269.785006] [<ffffffff8149c7bc>] ret_from_fork+0x7c/0xb0
> [ 269.785006]
> [ 269.785006] other info that might help us debug this:
> [ 269.785006]
> [ 269.785007] Possible unsafe locking scenario:
> [ 269.785007]
> [ 269.785008] CPU0 CPU1
> [ 269.785008] ---- ----
> [ 269.785009] lock(console_lock);
> [ 269.785010] lock((fb_notifier_list).rwsem);
> [ 269.785012] lock(console_lock);
> [ 269.785013] lock((fb_notifier_list).rwsem);
> [ 269.785013]
> [ 269.785013] *** DEADLOCK ***
> [ 269.785013]
> [ 269.785014] 4 locks held by kworker/u:3/56:
> [ 269.785018] #0: (events_unbound){.+.+.+}, at: [<ffffffff81058d77>]
> process_one_work+0x154/0x38e
> [ 269.785021] #1: ((&entry->work)){+.+.+.}, at: [<ffffffff81058d77>]
> process_one_work+0x154/0x38e
> [ 269.785024] #2: (&__lockdep_no_validate__){......}, at: [<ffffffff81342d85>]
> device_lock+0xf/0x11
> [ 269.785027] #3: (console_lock){+.+.+.}, at: [<ffffffff812ee4ce>]
> i915_drm_freeze+0x9e/0xbb
> [ 269.785028]
> [ 269.785028] stack backtrace:
> [ 269.785029] Pid: 56, comm: kworker/u:3 Not tainted 3.8.0-rc1 #1
> [ 269.785030] Call Trace:
> [ 269.785035] [<ffffffff8148fcb5>] print_circular_bug+0x1f8/0x209
> [ 269.785036] [<ffffffff81088955>] __lock_acquire+0xa7e/0xddd
> [ 269.785038] [<ffffffff810890e4>] lock_acquire+0x95/0x105
> [ 269.785040] [<ffffffff81062a1d>] ? __blocking_notifier_call_chain+0x49/0x80
> [ 269.785042] [<ffffffff81495092>] down_read+0x34/0x43
> [ 269.785044] [<ffffffff81062a1d>] ? __blocking_notifier_call_chain+0x49/0x80
> [ 269.785046] [<ffffffff81062a1d>] __blocking_notifier_call_chain+0x49/0x80
> [ 269.785047] [<ffffffff81062a63>] blocking_notifier_call_chain+0xf/0x11
> [ 269.785050] [<ffffffff8124e85e>] fb_notifier_call_chain+0x16/0x18
> [ 269.785052] [<ffffffff8124ec47>] fb_set_suspend+0x22/0x4d
> [ 269.785054] [<ffffffff8132cfe3>] intel_fbdev_set_suspend+0x20/0x22
> [ 269.785055] [<ffffffff812ee4db>] i915_drm_freeze+0xab/0xbb
> [ 269.785057] [<ffffffff812eea82>] i915_pm_freeze+0x3d/0x41
> [ 269.785060] [<ffffffff8123f759>] pci_pm_freeze+0x65/0x8d
> [ 269.785062] [<ffffffff8123f6f4>] ? pci_pm_poweroff+0x9c/0x9c
> [ 269.785064] [<ffffffff81342f20>] dpm_run_callback.isra.3+0x27/0x56
> [ 269.785066] [<ffffffff81343085>] __device_suspend+0x136/0x1b1
> [ 269.785068] [<ffffffff81089563>] ? trace_hardirqs_on_caller+0x117/0x173
> [ 269.785070] [<ffffffff8134311a>] async_suspend+0x1a/0x58
> [ 269.785072] [<ffffffff81063a6b>] async_run_entry_fn+0xa4/0x17c
> [ 269.785074] [<ffffffff81058df2>] process_one_work+0x1cf/0x38e
> [ 269.785076] [<ffffffff81058d77>] ? process_one_work+0x154/0x38e
> [ 269.785078] [<ffffffff810639c7>] ? async_schedule+0x12/0x12
> [ 269.785080] [<ffffffff8105679f>] ? spin_lock_irq+0x9/0xb
> [ 269.785082] [<ffffffff81059290>] worker_thread+0x12e/0x1cc
> [ 269.785084] [<ffffffff81059162>] ? rescuer_thread+0x187/0x187
> [ 269.785085] [<ffffffff8105d416>] kthread+0xac/0xb4
> [ 269.785088] [<ffffffff8105d36a>] ? __kthread_parkme+0x60/0x60
> [ 269.785090] [<ffffffff8149c7bc>] ret_from_fork+0x7c/0xb0
> [ 269.785091] [<ffffffff8105d36a>] ? __kthread_parkme+0x60/0x60
>
>
> Config:
> http://mrutecki.pl/download/kernel/3.8.0-rc1/s2disk/config-3.8.0-rc1
>
> dmesg:
> http://mrutecki.pl/download/kernel/3.8.0-rc1/s2disk/dmesg-3.8.0-rc1.txt
>
>
> Found similar report:
> http://marc.info/?l=linux-kernel&m=135546308908700&w=2
>
> Regards
>
> --
> Maciej Rutecki
> http://www.mrutecki.pl
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
BOFH excuse #435:
Internet shut down due to maintenance
^ permalink raw reply
* Re: [PATCH 1/2] powerpc/perf: Fix finding overflowed PMC in interrupt
From: Sukadev Bhattiprolu @ 2012-12-22 1:07 UTC (permalink / raw)
To: Michael Neuling
Cc: Linux PPC dev, Paul Mackerras, Anton Blanchard,
Benjamin Herrenschmidt
In-Reply-To: <1352164118-19450-1-git-send-email-mikey@neuling.org>
Ben
Will these two patches be pushed upstream or are they waiting for
review/test ?
They fix a hang that we get with some perf events.
Thanks,
Sukadev
Michael Neuling [mikey@neuling.org] wrote:
| If a PMC is about to overflow on a counter that's on an active perf event
| (ie. less than 256 from the end) and a _different_ PMC overflows just at this
| time (a PMC that's not on an active perf event), we currently mark the event as
| found, but in reality it's not as it's likely the other PMC that caused the
| IRQ. Since we mark it as found the second catch all for overflows doesn't run,
| and we don't reset the overflowing PMC ever. Hence we keep hitting that same
| PMC IRQ over and over and don't reset the actual overflowing counter.
|
| This is a rewrite of the perf interrupt handler for book3s to get around this.
| We now check to see if any of the PMCs have actually overflowed (ie >=
| 0x80000000). If yes, record it for active counters and just reset it for
| inactive counters. If it's not overflowed, then we check to see if it's one of
| the buggy power7 counters and if it is, record it and continue. If none of the
| PMCs match this, then we make note that we couldn't find the PMC that caused
| the IRQ.
|
| Signed-off-by: Michael Neuling <mikey@neuling.org>
| Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
| cc: Paul Mackerras <paulus@samba.org>
| cc: Anton Blanchard <anton@samba.org>
| cc: Linux PPC dev <linuxppc-dev@ozlabs.org>
| ---
| arch/powerpc/perf/core-book3s.c | 83 +++++++++++++++++++++++++--------------
| 1 file changed, 54 insertions(+), 29 deletions(-)
|
| diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
| index aa2465e..3575def 100644
| --- a/arch/powerpc/perf/core-book3s.c
| +++ b/arch/powerpc/perf/core-book3s.c
| @@ -1412,11 +1412,8 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
| return regs->nip;
| }
|
| -static bool pmc_overflow(unsigned long val)
| +static bool pmc_overflow_power7(unsigned long val)
| {
| - if ((int)val < 0)
| - return true;
| -
| /*
| * Events on POWER7 can roll back if a speculative event doesn't
| * eventually complete. Unfortunately in some rare cases they will
| @@ -1428,7 +1425,15 @@ static bool pmc_overflow(unsigned long val)
| * PMCs because a user might set a period of less than 256 and we
| * don't want to mistakenly reset them.
| */
| - if (pvr_version_is(PVR_POWER7) && ((0x80000000 - val) <= 256))
| + if ((0x80000000 - val) <= 256)
| + return true;
| +
| + return false;
| +}
| +
| +static bool pmc_overflow(unsigned long val)
| +{
| + if ((int)val < 0)
| return true;
|
| return false;
| @@ -1439,11 +1444,11 @@ static bool pmc_overflow(unsigned long val)
| */
| static void perf_event_interrupt(struct pt_regs *regs)
| {
| - int i;
| + int i, j;
| struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
| struct perf_event *event;
| - unsigned long val;
| - int found = 0;
| + unsigned long val[8];
| + int found, active;
| int nmi;
|
| if (cpuhw->n_limited)
| @@ -1458,33 +1463,53 @@ static void perf_event_interrupt(struct pt_regs *regs)
| else
| irq_enter();
|
| - for (i = 0; i < cpuhw->n_events; ++i) {
| - event = cpuhw->event[i];
| - if (!event->hw.idx || is_limited_pmc(event->hw.idx))
| + /* Read all the PMCs since we'll need them a bunch of times */
| + for (i = 0; i < ppmu->n_counter; ++i)
| + val[i] = read_pmc(i + 1);
| +
| + /* Try to find what caused the IRQ */
| + found = 0;
| + for (i = 0; i < ppmu->n_counter; ++i) {
| + if (!pmc_overflow(val[i]))
| continue;
| - val = read_pmc(event->hw.idx);
| - if ((int)val < 0) {
| - /* event has overflowed */
| - found = 1;
| - record_and_restart(event, val, regs);
| + if (is_limited_pmc(i + 1))
| + continue; /* these won't generate IRQs */
| + /*
| + * We've found one that's overflowed. For active
| + * counters we need to log this. For inactive
| + * counters, we need to reset it anyway
| + */
| + found = 1;
| + active = 0;
| + for (j = 0; j < cpuhw->n_events; ++j) {
| + event = cpuhw->event[j];
| + if (event->hw.idx == (i + 1)) {
| + active = 1;
| + record_and_restart(event, val[i], regs);
| + break;
| + }
| }
| + if (!active)
| + /* reset non active counters that have overflowed */
| + write_pmc(i + 1, 0);
| }
| -
| - /*
| - * In case we didn't find and reset the event that caused
| - * the interrupt, scan all events and reset any that are
| - * negative, to avoid getting continual interrupts.
| - * Any that we processed in the previous loop will not be negative.
| - */
| - if (!found) {
| - for (i = 0; i < ppmu->n_counter; ++i) {
| - if (is_limited_pmc(i + 1))
| + if (!found && pvr_version_is(PVR_POWER7)) {
| + /* check active counters for special buggy p7 overflow */
| + for (i = 0; i < cpuhw->n_events; ++i) {
| + event = cpuhw->event[i];
| + if (!event->hw.idx || is_limited_pmc(event->hw.idx))
| continue;
| - val = read_pmc(i + 1);
| - if (pmc_overflow(val))
| - write_pmc(i + 1, 0);
| + if (pmc_overflow_power7(val[event->hw.idx - 1])) {
| + /* event has overflowed in a buggy way*/
| + found = 1;
| + record_and_restart(event,
| + val[event->hw.idx - 1],
| + regs);
| + }
| }
| }
| + if (!found)
| + printk(KERN_WARNING "Can't find PMC that caused IRQ\n");
|
| /*
| * Reset MMCR0 to its normal value. This will set PMXE and
| --
| 1.7.9.5
^ permalink raw reply
* Re: [v1][PATCH 3/6] book3e/kgdb: update thread's dbcr0
From: tiejun.chen @ 2012-12-21 2:02 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, linux-kernel, jason.wessel
In-Reply-To: <A47C9236-0744-45BC-B25A-3DDEB05CD8A1@kernel.crashing.org>
On 12/21/2012 02:41 AM, Kumar Gala wrote:
>
> On Dec 20, 2012, at 3:08 AM, Tiejun Chen wrote:
>
>> gdb always need to generate a single step properly to invoke
>> a kgdb state. But with lazy interrupt, book3e can't always
>> trigger a debug exception with a single step since the current
>> is blocked for handling those pending exception, then we miss
>> that expected dbcr configuration at last to generate a debug
>> exception.
>>
>> So here we also update thread's dbcr0 to make sure the current
>> can go back with that missed dbcr0 configuration.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
>> ---
>> arch/powerpc/kernel/kgdb.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
>> index c470a40..516b44b 100644
>> --- a/arch/powerpc/kernel/kgdb.c
>> +++ b/arch/powerpc/kernel/kgdb.c
>> @@ -426,8 +426,18 @@ int kgdb_arch_handle_exception(int vector, int signo, int err_code,
>> /* set the trace bit if we're stepping */
>> if (remcom_in_buffer[0] == 's') {
>> #ifdef CONFIG_PPC_ADV_DEBUG_REGS
>> +#ifdef CONFIG_PPC_BOOK3E
>
> Should this really be CONFIG_PPC64 or CONFIG_PPC_BOOK3E_64?
Yes, I think CONFIG_PPC_BOOK3E_64 is better currently. Because I didn't validate
this on power arch since I have no a real machine :)
>
>> + /* With lazy interrut we have to update thread dbcr0 here
>> + * to make sure we can set debug properly at last to invoke
>> + * kgdb again to work well.
>> + */
>> + current->thread.dbcr0 =
>> + mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM;
>> + mtspr(SPRN_DBCR0, current->thread.dbcr0);
>> +#else
>> mtspr(SPRN_DBCR0,
>> mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM);
>> +#endif
>
> Can we code this so we don't need the #else?
I guess you're saying something like this:
---
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index c470a40..b2df42a 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -409,7 +409,7 @@ int kgdb_arch_handle_exception(int vector, int signo, int
err_code,
struct pt_regs *linux_regs)
{
char *ptr = &remcom_in_buffer[1];
- unsigned long addr;
+ unsigned long addr, dbcr0;
switch (remcom_in_buffer[0]) {
/*
@@ -426,8 +426,15 @@ int kgdb_arch_handle_exception(int vector, int signo, int
err_code,
/* set the trace bit if we're stepping */
if (remcom_in_buffer[0] == 's') {
#ifdef CONFIG_PPC_ADV_DEBUG_REGS
- mtspr(SPRN_DBCR0,
- mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM);
+ dbcr0 = mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM;
+ mtspr(SPRN_DBCR0, dbcr0);
+#ifdef CONFIG_PPC_BOOK3E_64
+ /* With lazy interrut we have to update thread dbcr0 here
+ * to make sure we can set debug properly at last to invoke
+ * kgdb again to work well.
+ */
+ current->thread.dbcr0 = dbcr0;
+#endif
linux_regs->msr |= MSR_DE;
#else
linux_regs->msr |= MSR_SE;
--
1.7.9.5
Tiejun
^ permalink raw reply related
* RE: [PATCH 0/4] iommu/fsl: Freescale PAMU driver and IOMMU API implementation.
From: Sethi Varun-B16395 @ 2012-12-21 1:46 UTC (permalink / raw)
To: Joerg Roedel
Cc: Wood Scott-B07421, joerg.roedel@amd.com, Tabi Timur-B04825,
linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
Sethi Varun-B16395, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1355493114-21776-1-git-send-email-Varun.Sethi@freescale.com>
ping!!
> -----Original Message-----
> From: Sethi Varun-B16395
> Sent: Friday, December 14, 2012 7:22 PM
> To: joerg.roedel@amd.com; iommu@lists.linux-foundation.org; linuxppc-
> dev@lists.ozlabs.org; linux-kernel@vger.kernel.org; Tabi Timur-B04825;
> Wood Scott-B07421
> Cc: Sethi Varun-B16395
> Subject: [PATCH 0/4] iommu/fsl: Freescale PAMU driver and IOMMU API
> implementation.
>=20
> This patchset provides the Freescale PAMU (Peripheral Access Management
> Unit) driver and the corresponding IOMMU API implementation. PAMU is the
> IOMMU present on Freescale QorIQ platforms. PAMU can authorize memory
> access, remap the memory address, and remap the I/O transaction type.
>=20
> This set consists of the following patches:
> 1. Addition of new field in the device (powerpc) archdata structure for
> storing iommu domain information
> pointer. This pointer is stored when the device is attached to a
> particular iommu domain.
> 2. Add PAMU bypass enable register to the ccsr_guts structure.
> 3. Addition of domain attributes required by the PAMU driver IOMMU API.
> 4. PAMU driver and IOMMU API implementation.
>=20
> This patch set is based on the next branch of the iommu git tree
> maintained by Joerg.
>=20
> Varun Sethi (4):
> store iommu domain info in device arch data.
> add pamu bypass enable register to guts.
> Add iommu attributes for PAMU
> FSL PAMU driver.
>=20
> arch/powerpc/include/asm/device.h | 4 +
> arch/powerpc/include/asm/fsl_guts.h | 4 +-
> drivers/iommu/Kconfig | 8 +
> drivers/iommu/Makefile | 1 +
> drivers/iommu/fsl_pamu.c | 1152
> +++++++++++++++++++++++++++++++++++
> drivers/iommu/fsl_pamu.h | 398 ++++++++++++
> drivers/iommu/fsl_pamu_domain.c | 1033
> +++++++++++++++++++++++++++++++
> drivers/iommu/fsl_pamu_domain.h | 96 +++
> include/linux/iommu.h | 49 ++
> 9 files changed, 2744 insertions(+), 1 deletions(-) create mode 100644
> drivers/iommu/fsl_pamu.c create mode 100644 drivers/iommu/fsl_pamu.h
> create mode 100644 drivers/iommu/fsl_pamu_domain.c create mode 100644
> drivers/iommu/fsl_pamu_domain.h
>=20
> --
> 1.7.4.1
^ permalink raw reply
* [PATCH 7/7] powerpc: Add the DAWR support to the set_break()
From: Michael Neuling @ 2012-12-21 0:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Ian Munsie
In-Reply-To: <1356048405-20560-1-git-send-email-mikey@neuling.org>
This adds DAWR supoprt to the set_break().
It does both bare metal and PAPR versions of setting the DAWR.
There is still some work we can do to make full use of the watchpoint but that
will come later.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/include/asm/machdep.h | 4 ++++
arch/powerpc/kernel/process.c | 23 +++++++++++++++++++++++
arch/powerpc/platforms/pseries/setup.c | 12 ++++++++++++
3 files changed, 39 insertions(+)
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 19d9d96..3d6b410 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -180,6 +180,10 @@ struct machdep_calls {
int (*set_dabr)(unsigned long dabr,
unsigned long dabrx);
+ /* Set DAWR for this platform, leave empty for default implemenation */
+ int (*set_dawr)(unsigned long dawr,
+ unsigned long dawrx);
+
#ifdef CONFIG_PPC32 /* XXX for now */
/* A general init function, called by ppc_init in init/main.c.
May be NULL. */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 5a64028..fec5f46 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -412,10 +412,33 @@ static inline int set_dabr(struct arch_hw_breakpoint *brk)
return 0;
}
+static inline int set_dawr(struct arch_hw_breakpoint *brk)
+{
+ unsigned long dawr, dawrx;
+
+ dawr = brk->address;
+
+ dawrx = (brk->type & (HW_BRK_TYPE_READ | HW_BRK_TYPE_WRITE)) \
+ << (63 - 58); //* read/write bits */
+ dawrx |= ((brk->type & (HW_BRK_TYPE_TRANSLATE)) >> 2) \
+ << (63 - 59); //* translate */
+ dawrx |= (brk->type & (HW_BRK_TYPE_PRIV_ALL)) \
+ >> 3; //* PRIM bits */
+
+ if (ppc_md.set_dawr)
+ return ppc_md.set_dawr(dawr, dawrx);
+ mtspr(SPRN_DAWR, dawr);
+ mtspr(SPRN_DAWRX, dawrx);
+ return 0;
+}
+
int set_break(struct arch_hw_breakpoint *brk)
{
__get_cpu_var(current_brk) = *brk;
+ if (cpu_has_feature(CPU_FTR_DAWR))
+ return set_dawr(brk);
+
return set_dabr(brk);
}
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 1890730..b1f60d1 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -65,6 +65,7 @@
#include <asm/smp.h>
#include <asm/firmware.h>
#include <asm/eeh.h>
+#include <asm/reg.h>
#include "plpar_wrappers.h"
#include "pseries.h"
@@ -500,6 +501,14 @@ static int pseries_set_xdabr(unsigned long dabr, unsigned long dabrx)
return plpar_hcall_norets(H_SET_XDABR, dabr, dabrx);
}
+static int pseries_set_dawr(unsigned long dawr, unsigned long dawrx)
+{
+ /* PAPR says we can't set HYP */
+ dawrx &= ~DAWRX_HYP;
+
+ return plapr_set_watchpoint0(dawr, dawrx);
+}
+
#define CMO_CHARACTERISTICS_TOKEN 44
#define CMO_MAXLENGTH 1026
@@ -606,6 +615,9 @@ static void __init pSeries_init_early(void)
else if (firmware_has_feature(FW_FEATURE_DABR))
ppc_md.set_dabr = pseries_set_dabr;
+ if (firmware_has_feature(FW_FEATURE_SET_MODE))
+ ppc_md.set_dawr = pseries_set_dawr;
+
pSeries_cmo_feature_init();
iommu_init_early_pSeries();
--
1.7.10.4
^ permalink raw reply related
* [PATCH 6/7] powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers
From: Michael Neuling @ 2012-12-21 0:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Ian Munsie
In-Reply-To: <1356048405-20560-1-git-send-email-mikey@neuling.org>
This is a rewrite so that we don't assume we are using the DABR throughout the
code. We now use the arch_hw_breakpoint to store the breakpoint in a generic
manner in the thread_struct, rather than storing the raw DABR value.
The ptrace GET/SET_DEBUGREG interface currently passes the raw DABR in from
userspace. We keep this functionality, so that future changes (like the POWER8
DAWR), will still fake the DABR to userspace.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/include/asm/debug.h | 15 ++++---
arch/powerpc/include/asm/hw_breakpoint.h | 33 +++++++++++---
arch/powerpc/include/asm/processor.h | 4 +-
arch/powerpc/include/asm/reg.h | 3 --
arch/powerpc/kernel/exceptions-64s.S | 2 +-
arch/powerpc/kernel/hw_breakpoint.c | 72 +++++++++++++-----------------
arch/powerpc/kernel/kgdb.c | 10 ++---
arch/powerpc/kernel/process.c | 67 ++++++++++++++++++---------
arch/powerpc/kernel/ptrace.c | 60 +++++++++++++------------
arch/powerpc/kernel/ptrace32.c | 8 +++-
arch/powerpc/kernel/signal.c | 5 ++-
arch/powerpc/kernel/traps.c | 4 +-
arch/powerpc/mm/fault.c | 4 +-
arch/powerpc/xmon/xmon.c | 21 ++++++---
14 files changed, 180 insertions(+), 128 deletions(-)
diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
index 32de257..8d85ffb 100644
--- a/arch/powerpc/include/asm/debug.h
+++ b/arch/powerpc/include/asm/debug.h
@@ -4,6 +4,8 @@
#ifndef _ASM_POWERPC_DEBUG_H
#define _ASM_POWERPC_DEBUG_H
+#include <asm/hw_breakpoint.h>
+
struct pt_regs;
extern struct dentry *powerpc_debugfs_root;
@@ -15,7 +17,7 @@ extern int (*__debugger_ipi)(struct pt_regs *regs);
extern int (*__debugger_bpt)(struct pt_regs *regs);
extern int (*__debugger_sstep)(struct pt_regs *regs);
extern int (*__debugger_iabr_match)(struct pt_regs *regs);
-extern int (*__debugger_dabr_match)(struct pt_regs *regs);
+extern int (*__debugger_break_match)(struct pt_regs *regs);
extern int (*__debugger_fault_handler)(struct pt_regs *regs);
#define DEBUGGER_BOILERPLATE(__NAME) \
@@ -31,7 +33,7 @@ DEBUGGER_BOILERPLATE(debugger_ipi)
DEBUGGER_BOILERPLATE(debugger_bpt)
DEBUGGER_BOILERPLATE(debugger_sstep)
DEBUGGER_BOILERPLATE(debugger_iabr_match)
-DEBUGGER_BOILERPLATE(debugger_dabr_match)
+DEBUGGER_BOILERPLATE(debugger_break_match)
DEBUGGER_BOILERPLATE(debugger_fault_handler)
#else
@@ -40,17 +42,18 @@ static inline int debugger_ipi(struct pt_regs *regs) { return 0; }
static inline int debugger_bpt(struct pt_regs *regs) { return 0; }
static inline int debugger_sstep(struct pt_regs *regs) { return 0; }
static inline int debugger_iabr_match(struct pt_regs *regs) { return 0; }
-static inline int debugger_dabr_match(struct pt_regs *regs) { return 0; }
+static inline int debugger_break_match(struct pt_regs *regs) { return 0; }
static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
#endif
-extern int set_dabr(unsigned long dabr, unsigned long dabrx);
+int set_break(struct arch_hw_breakpoint *brk);
#ifdef CONFIG_PPC_ADV_DEBUG_REGS
extern void do_send_trap(struct pt_regs *regs, unsigned long address,
unsigned long error_code, int signal_code, int brkpt);
#else
-extern void do_dabr(struct pt_regs *regs, unsigned long address,
- unsigned long error_code);
+
+extern void do_break(struct pt_regs *regs, unsigned long address,
+ unsigned long error_code);
#endif
#endif /* _ASM_POWERPC_DEBUG_H */
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h
index 4234245..2c91faf 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -24,16 +24,30 @@
#define _PPC_BOOK3S_64_HW_BREAKPOINT_H
#ifdef __KERNEL__
-#ifdef CONFIG_HAVE_HW_BREAKPOINT
-
struct arch_hw_breakpoint {
unsigned long address;
- unsigned long dabrx;
- int type;
- u8 len; /* length of the target data symbol */
- bool extraneous_interrupt;
+ u16 type;
+ u16 len; /* length of the target data symbol */
};
+/* Note: Don't change the the first 6 bits below as they are in the same order
+ * as the dabr and dabrx.
+ */
+#define HW_BRK_TYPE_READ 0x01
+#define HW_BRK_TYPE_WRITE 0x02
+#define HW_BRK_TYPE_TRANSLATE 0x04
+#define HW_BRK_TYPE_USER 0x08
+#define HW_BRK_TYPE_KERNEL 0x10
+#define HW_BRK_TYPE_HYP 0x20
+#define HW_BRK_TYPE_EXTRANEOUS_IRQ 0x80
+
+/* bits that overlap with the bottom 3 bits of the dabr */
+#define HW_BRK_TYPE_RDWR (HW_BRK_TYPE_READ | HW_BRK_TYPE_WRITE)
+#define HW_BRK_TYPE_DABR (HW_BRK_TYPE_RDWR | HW_BRK_TYPE_TRANSLATE)
+#define HW_BRK_TYPE_PRIV_ALL (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \
+ HW_BRK_TYPE_HYP)
+
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
#include <linux/kdebug.h>
#include <asm/reg.h>
#include <asm/debug.h>
@@ -62,7 +76,12 @@ extern void ptrace_triggered(struct perf_event *bp,
struct perf_sample_data *data, struct pt_regs *regs);
static inline void hw_breakpoint_disable(void)
{
- set_dabr(0, 0);
+ struct arch_hw_breakpoint brk;
+
+ brk.address = 0;
+ brk.type = 0;
+ brk.len = 0;
+ set_break(&brk);
}
extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index cbbd82d..5a142c5 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -23,6 +23,7 @@
#include <linux/cache.h>
#include <asm/ptrace.h>
#include <asm/types.h>
+#include <asm/hw_breakpoint.h>
/* We do _not_ want to define new machine types at all, those must die
* in favor of using the device-tree
@@ -216,8 +217,7 @@ struct thread_struct {
struct perf_event *last_hit_ubp;
#endif /* CONFIG_HAVE_HW_BREAKPOINT */
#endif
- unsigned long dabr; /* Data address breakpoint register */
- unsigned long dabrx; /* ... extension */
+ struct arch_hw_breakpoint hw_brk; /* info on the hardware breakpoint */
unsigned long trap_nr; /* last trap # on this thread */
#ifdef CONFIG_ALTIVEC
/* Complete AltiVec register set */
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 6cae32d..e9f5c05 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -228,9 +228,6 @@
#define DAWRX_KERNEL (1UL << 1)
#define DAWRX_HYP (1UL << 2)
#define SPRN_DABR 0x3F5 /* Data Address Breakpoint Register */
-#define DABR_TRANSLATION (1UL << 2)
-#define DABR_DATA_WRITE (1UL << 1)
-#define DABR_DATA_READ (1UL << 0)
#define SPRN_DABR2 0x13D /* e300 */
#define SPRN_DABRX 0x3F7 /* Data Address Breakpoint Register Extension */
#define DABRX_USER (1UL << 0)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 220b896..8ca5006 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1310,7 +1310,7 @@ handle_dabr_fault:
ld r4,_DAR(r1)
ld r5,_DSISR(r1)
addi r3,r1,STACK_FRAME_OVERHEAD
- bl .do_dabr
+ bl .do_break
12: b .ret_from_except_lite
diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index a89cae4..c7483d0 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -73,7 +73,7 @@ int arch_install_hw_breakpoint(struct perf_event *bp)
* If so, DABR will be populated in single_step_dabr_instruction().
*/
if (current->thread.last_hit_ubp != bp)
- set_dabr(info->address | info->type | DABR_TRANSLATION, info->dabrx);
+ set_break(info);
return 0;
}
@@ -97,7 +97,7 @@ void arch_uninstall_hw_breakpoint(struct perf_event *bp)
}
*slot = NULL;
- set_dabr(0, 0);
+ hw_breakpoint_disable();
}
/*
@@ -127,19 +127,13 @@ int arch_check_bp_in_kernelspace(struct perf_event *bp)
int arch_bp_generic_fields(int type, int *gen_bp_type)
{
- switch (type) {
- case DABR_DATA_READ:
- *gen_bp_type = HW_BREAKPOINT_R;
- break;
- case DABR_DATA_WRITE:
- *gen_bp_type = HW_BREAKPOINT_W;
- break;
- case (DABR_DATA_WRITE | DABR_DATA_READ):
- *gen_bp_type = (HW_BREAKPOINT_W | HW_BREAKPOINT_R);
- break;
- default:
+ *gen_bp_type = 0;
+ if (type & HW_BRK_TYPE_READ)
+ *gen_bp_type |= HW_BREAKPOINT_R;
+ if (type & HW_BRK_TYPE_WRITE)
+ *gen_bp_type |= HW_BREAKPOINT_W;
+ if (*gen_bp_type == 0)
return -EINVAL;
- }
return 0;
}
@@ -154,29 +148,22 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
if (!bp)
return ret;
- switch (bp->attr.bp_type) {
- case HW_BREAKPOINT_R:
- info->type = DABR_DATA_READ;
- break;
- case HW_BREAKPOINT_W:
- info->type = DABR_DATA_WRITE;
- break;
- case HW_BREAKPOINT_R | HW_BREAKPOINT_W:
- info->type = (DABR_DATA_READ | DABR_DATA_WRITE);
- break;
- default:
+ info->type = HW_BRK_TYPE_TRANSLATE;
+ if (bp->attr.bp_type & HW_BREAKPOINT_R)
+ info->type |= HW_BRK_TYPE_READ;
+ if (bp->attr.bp_type & HW_BREAKPOINT_W)
+ info->type |= HW_BRK_TYPE_WRITE;
+ if (info->type == HW_BRK_TYPE_TRANSLATE)
+ /* must set alteast read or write */
return ret;
- }
-
+ if (!(bp->attr.exclude_user))
+ info->type |= HW_BRK_TYPE_USER;
+ if (!(bp->attr.exclude_kernel))
+ info->type |= HW_BRK_TYPE_KERNEL;
+ if (!(bp->attr.exclude_hv))
+ info->type |= HW_BRK_TYPE_HYP;
info->address = bp->attr.bp_addr;
info->len = bp->attr.bp_len;
- info->dabrx = DABRX_ALL;
- if (bp->attr.exclude_user)
- info->dabrx &= ~DABRX_USER;
- if (bp->attr.exclude_kernel)
- info->dabrx &= ~DABRX_KERNEL;
- if (bp->attr.exclude_hv)
- info->dabrx &= ~DABRX_HYP;
/*
* Since breakpoint length can be a maximum of HW_BREAKPOINT_LEN(8)
@@ -204,7 +191,7 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs)
info = counter_arch_bp(tsk->thread.last_hit_ubp);
regs->msr &= ~MSR_SE;
- set_dabr(info->address | info->type | DABR_TRANSLATION, info->dabrx);
+ set_break(info);
tsk->thread.last_hit_ubp = NULL;
}
@@ -222,7 +209,7 @@ int __kprobes hw_breakpoint_handler(struct die_args *args)
unsigned long dar = regs->dar;
/* Disable breakpoints during exception handling */
- set_dabr(0, 0);
+ hw_breakpoint_disable();
/*
* The counter may be concurrently released but that can only
@@ -255,8 +242,9 @@ int __kprobes hw_breakpoint_handler(struct die_args *args)
* we still need to single-step the instruction, but we don't
* generate an event.
*/
- info->extraneous_interrupt = !((bp->attr.bp_addr <= dar) &&
- (dar - bp->attr.bp_addr < bp->attr.bp_len));
+ if (!((bp->attr.bp_addr <= dar) &&
+ (dar - bp->attr.bp_addr < bp->attr.bp_len)))
+ info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
/* Do not emulate user-space instructions, instead single-step them */
if (user_mode(regs)) {
@@ -285,10 +273,10 @@ int __kprobes hw_breakpoint_handler(struct die_args *args)
* As a policy, the callback is invoked in a 'trigger-after-execute'
* fashion
*/
- if (!info->extraneous_interrupt)
+ if (!(info->type & HW_BRK_TYPE_EXTRANEOUS_IRQ))
perf_bp_event(bp, regs);
- set_dabr(info->address | info->type | DABR_TRANSLATION, info->dabrx);
+ set_break(info);
out:
rcu_read_unlock();
return rc;
@@ -317,10 +305,10 @@ int __kprobes single_step_dabr_instruction(struct die_args *args)
* We shall invoke the user-defined callback function in the single
* stepping handler to confirm to 'trigger-after-execute' semantics
*/
- if (!info->extraneous_interrupt)
+ if (!(info->type & HW_BRK_TYPE_EXTRANEOUS_IRQ))
perf_bp_event(bp, regs);
- set_dabr(info->address | info->type | DABR_TRANSLATION, info->dabrx);
+ set_break(info);
current->thread.last_hit_ubp = NULL;
/*
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index c470a40..a05f0e4 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -198,7 +198,7 @@ static int kgdb_iabr_match(struct pt_regs *regs)
return 1;
}
-static int kgdb_dabr_match(struct pt_regs *regs)
+static int kgdb_break_match(struct pt_regs *regs)
{
if (user_mode(regs))
return 0;
@@ -458,7 +458,7 @@ static void *old__debugger;
static void *old__debugger_bpt;
static void *old__debugger_sstep;
static void *old__debugger_iabr_match;
-static void *old__debugger_dabr_match;
+static void *old__debugger_break_match;
static void *old__debugger_fault_handler;
int kgdb_arch_init(void)
@@ -468,7 +468,7 @@ int kgdb_arch_init(void)
old__debugger_bpt = __debugger_bpt;
old__debugger_sstep = __debugger_sstep;
old__debugger_iabr_match = __debugger_iabr_match;
- old__debugger_dabr_match = __debugger_dabr_match;
+ old__debugger_break_match = __debugger_break_match;
old__debugger_fault_handler = __debugger_fault_handler;
__debugger_ipi = kgdb_call_nmi_hook;
@@ -476,7 +476,7 @@ int kgdb_arch_init(void)
__debugger_bpt = kgdb_handle_breakpoint;
__debugger_sstep = kgdb_singlestep;
__debugger_iabr_match = kgdb_iabr_match;
- __debugger_dabr_match = kgdb_dabr_match;
+ __debugger_break_match = kgdb_break_match;
__debugger_fault_handler = kgdb_not_implemented;
return 0;
@@ -489,6 +489,6 @@ void kgdb_arch_exit(void)
__debugger_bpt = old__debugger_bpt;
__debugger_sstep = old__debugger_sstep;
__debugger_iabr_match = old__debugger_iabr_match;
- __debugger_dabr_match = old__debugger_dabr_match;
+ __debugger_breakx_match = old__debugger_break_match;
__debugger_fault_handler = old__debugger_fault_handler;
}
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 9d591de..5a64028 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -282,7 +282,7 @@ void do_send_trap(struct pt_regs *regs, unsigned long address,
force_sig_info(SIGTRAP, &info, current);
}
#else /* !CONFIG_PPC_ADV_DEBUG_REGS */
-void do_dabr(struct pt_regs *regs, unsigned long address,
+void do_break (struct pt_regs *regs, unsigned long address,
unsigned long error_code)
{
siginfo_t info;
@@ -292,11 +292,11 @@ void do_dabr(struct pt_regs *regs, unsigned long address,
11, SIGSEGV) == NOTIFY_STOP)
return;
- if (debugger_dabr_match(regs))
+ if (debugger_break_match(regs))
return;
- /* Clear the DABR */
- set_dabr(0, 0);
+ /* Clear the breakpoint */
+ hw_breakpoint_disable();
/* Deliver the signal to userspace */
info.si_signo = SIGTRAP;
@@ -307,7 +307,7 @@ void do_dabr(struct pt_regs *regs, unsigned long address,
}
#endif /* CONFIG_PPC_ADV_DEBUG_REGS */
-static DEFINE_PER_CPU(unsigned long, current_dabr);
+static DEFINE_PER_CPU(struct arch_hw_breakpoint, current_brk);
#ifdef CONFIG_PPC_ADV_DEBUG_REGS
/*
@@ -375,35 +375,50 @@ static void switch_booke_debug_regs(struct thread_struct *new_thread)
#ifndef CONFIG_HAVE_HW_BREAKPOINT
static void set_debug_reg_defaults(struct thread_struct *thread)
{
- if (thread->dabr) {
- thread->dabr = 0;
- thread->dabrx = 0;
- set_dabr(0, 0);
- }
+ thread->hw_brk.address = 0;
+ thread->hw_brk.type = 0;
+ set_break(&thread->hw_brk);
}
#endif /* !CONFIG_HAVE_HW_BREAKPOINT */
#endif /* CONFIG_PPC_ADV_DEBUG_REGS */
-int set_dabr(unsigned long dabr, unsigned long dabrx)
-{
- __get_cpu_var(current_dabr) = dabr;
-
- if (ppc_md.set_dabr)
- return ppc_md.set_dabr(dabr, dabrx);
-
- /* XXX should we have a CPU_FTR_HAS_DABR ? */
#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+static inline void __set_dabr(unsigned long dabr, unsigned long dabrx)
+{
mtspr(SPRN_DAC1, dabr);
#ifdef CONFIG_PPC_47x
isync();
#endif
+}
#elif defined(CONFIG_PPC_BOOK3S)
+static inline void __set_dabr(unsigned long dabr, unsigned long dabrx)
+{
mtspr(SPRN_DABR, dabr);
mtspr(SPRN_DABRX, dabrx);
+}
#endif
+
+static inline int set_dabr(struct arch_hw_breakpoint *brk)
+{
+ unsigned long dabr, dabrx;
+
+ dabr = brk->address | (brk->type & HW_BRK_TYPE_DABR);
+ dabrx = ((brk->type >> 3) & 0x7);
+
+ if (ppc_md.set_dabr)
+ return ppc_md.set_dabr(dabr, dabrx);
+
+ __set_dabr(dabr, dabrx);
return 0;
}
+int set_break(struct arch_hw_breakpoint *brk)
+{
+ __get_cpu_var(current_brk) = *brk;
+
+ return set_dabr(brk);
+}
+
#ifdef CONFIG_PPC64
DEFINE_PER_CPU(struct cpu_usage, cpu_usage_array);
#endif
@@ -522,6 +537,18 @@ static inline void __switch_to_tm(struct task_struct *prev)
#define __switch_to_tm(prev)
#endif /* CONFIG_TRANSACTIONAL_MEM */
+static inline bool hw_brk_match(struct arch_hw_breakpoint *a,
+ struct arch_hw_breakpoint *b)
+{
+ if (a->address != b->address)
+ return false;
+ if (a->type != b->type)
+ return false;
+ if (a->len != b->len)
+ return false;
+ return true;
+}
+
struct task_struct *__switch_to(struct task_struct *prev,
struct task_struct *new)
{
@@ -608,8 +635,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
* schedule DABR
*/
#ifndef CONFIG_HAVE_HW_BREAKPOINT
- if (unlikely(__get_cpu_var(current_dabr) != new->thread.dabr))
- set_dabr(new->thread.dabr, new->thread.dabrx);
+ if (unlikely(hw_brk_match(&__get_cpu_var(current_brk), &new->thread.hw_brk)))
+ set_break(&new->thread.hw_brk);
#endif /* CONFIG_HAVE_HW_BREAKPOINT */
#endif
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index c497000..d4afccc 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -905,6 +905,9 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
struct perf_event *bp;
struct perf_event_attr attr;
#endif /* CONFIG_HAVE_HW_BREAKPOINT */
+#ifndef CONFIG_PPC_ADV_DEBUG_REGS
+ struct arch_hw_breakpoint hw_brk;
+#endif
/* For ppc64 we support one DABR and no IABR's at the moment (ppc64).
* For embedded processors we support one DAC and no IAC's at the
@@ -931,14 +934,17 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
*/
/* Ensure breakpoint translation bit is set */
- if (data && !(data & DABR_TRANSLATION))
+ if (data && !(data & HW_BRK_TYPE_TRANSLATE))
return -EIO;
+ hw_brk.address = data & (~HW_BRK_TYPE_DABR);
+ hw_brk.type = (data & HW_BRK_TYPE_DABR) | HW_BRK_TYPE_PRIV_ALL;
+ hw_brk.len = 8;
#ifdef CONFIG_HAVE_HW_BREAKPOINT
if (ptrace_get_breakpoints(task) < 0)
return -ESRCH;
bp = thread->ptrace_bps[0];
- if ((!data) || !(data & (DABR_DATA_WRITE | DABR_DATA_READ))) {
+ if ((!data) || !(hw_brk.type & HW_BRK_TYPE_RDWR)) {
if (bp) {
unregister_hw_breakpoint(bp);
thread->ptrace_bps[0] = NULL;
@@ -948,10 +954,8 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
}
if (bp) {
attr = bp->attr;
- attr.bp_addr = data & ~HW_BREAKPOINT_ALIGN;
- arch_bp_generic_fields(data &
- (DABR_DATA_WRITE | DABR_DATA_READ),
- &attr.bp_type);
+ attr.bp_addr = hw_brk.address;
+ arch_bp_generic_fields(hw_brk.type, &attr.bp_type);
/* Enable breakpoint */
attr.disabled = false;
@@ -963,16 +967,15 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
}
thread->ptrace_bps[0] = bp;
ptrace_put_breakpoints(task);
- thread->dabr = data;
- thread->dabrx = DABRX_ALL;
+ thread->hw_brk = hw_brk;
return 0;
}
/* Create a new breakpoint request if one doesn't exist already */
hw_breakpoint_init(&attr);
- attr.bp_addr = data & ~HW_BREAKPOINT_ALIGN;
- arch_bp_generic_fields(data & (DABR_DATA_WRITE | DABR_DATA_READ),
- &attr.bp_type);
+ attr.bp_addr = hw_brk.address;
+ arch_bp_generic_fields(hw_brk.type,
+ &attr.bp_type);
thread->ptrace_bps[0] = bp = register_user_hw_breakpoint(&attr,
ptrace_triggered, NULL, task);
@@ -985,10 +988,7 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
ptrace_put_breakpoints(task);
#endif /* CONFIG_HAVE_HW_BREAKPOINT */
-
- /* Move contents to the DABR register */
- task->thread.dabr = data;
- task->thread.dabrx = DABRX_ALL;
+ task->thread.hw_brk = hw_brk;
#else /* CONFIG_PPC_ADV_DEBUG_REGS */
/* As described above, it was assumed 3 bits were passed with the data
* address, but we will assume only the mode bits will be passed
@@ -1349,7 +1349,7 @@ static long ppc_set_hwdebug(struct task_struct *child,
struct perf_event_attr attr;
#endif /* CONFIG_HAVE_HW_BREAKPOINT */
#ifndef CONFIG_PPC_ADV_DEBUG_REGS
- unsigned long dabr;
+ struct arch_hw_breakpoint brk;
#endif
if (bp_info->version != 1)
@@ -1397,12 +1397,12 @@ static long ppc_set_hwdebug(struct task_struct *child,
if ((unsigned long)bp_info->addr >= TASK_SIZE)
return -EIO;
- dabr = (unsigned long)bp_info->addr & ~7UL;
- dabr |= DABR_TRANSLATION;
+ brk.address = bp_info->addr & ~7UL;
+ brk.type = HW_BRK_TYPE_TRANSLATE;
if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ)
- dabr |= DABR_DATA_READ;
+ brk.type |= HW_BRK_TYPE_READ;
if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE)
- dabr |= DABR_DATA_WRITE;
+ brk.type |= HW_BRK_TYPE_WRITE;
#ifdef CONFIG_HAVE_HW_BREAKPOINT
if (ptrace_get_breakpoints(child) < 0)
return -ESRCH;
@@ -1427,8 +1427,7 @@ static long ppc_set_hwdebug(struct task_struct *child,
hw_breakpoint_init(&attr);
attr.bp_addr = (unsigned long)bp_info->addr & ~HW_BREAKPOINT_ALIGN;
attr.bp_len = len;
- arch_bp_generic_fields(dabr & (DABR_DATA_WRITE | DABR_DATA_READ),
- &attr.bp_type);
+ arch_bp_generic_fields(brk.type, &attr.bp_type);
thread->ptrace_bps[0] = bp = register_user_hw_breakpoint(&attr,
ptrace_triggered, NULL, child);
@@ -1445,11 +1444,10 @@ static long ppc_set_hwdebug(struct task_struct *child,
if (bp_info->addr_mode != PPC_BREAKPOINT_MODE_EXACT)
return -EINVAL;
- if (child->thread.dabr)
+ if (child->thread.hw_brk.address)
return -ENOSPC;
- child->thread.dabr = dabr;
- child->thread.dabrx = DABRX_ALL;
+ child->thread.hw_brk = brk;
return 1;
#endif /* !CONFIG_PPC_ADV_DEBUG_DVCS */
@@ -1495,10 +1493,11 @@ static long ppc_del_hwdebug(struct task_struct *child, long data)
ptrace_put_breakpoints(child);
return ret;
#else /* CONFIG_HAVE_HW_BREAKPOINT */
- if (child->thread.dabr == 0)
+ if (child->thread.hw_brk.address == 0)
return -ENOENT;
- child->thread.dabr = 0;
+ child->thread.hw_brk.address = 0;
+ child->thread.hw_brk.type = 0;
#endif /* CONFIG_HAVE_HW_BREAKPOINT */
return 0;
@@ -1642,6 +1641,9 @@ long arch_ptrace(struct task_struct *child, long request,
}
case PTRACE_GET_DEBUGREG: {
+#ifndef CONFIG_PPC_ADV_DEBUG_REGS
+ unsigned long dabr_fake;
+#endif
ret = -EINVAL;
/* We only support one DABR and no IABRS at the moment */
if (addr > 0)
@@ -1649,7 +1651,9 @@ long arch_ptrace(struct task_struct *child, long request,
#ifdef CONFIG_PPC_ADV_DEBUG_REGS
ret = put_user(child->thread.dac1, datalp);
#else
- ret = put_user(child->thread.dabr, datalp);
+ dabr_fake = ((child->thread.hw_brk.address & (~HW_BRK_TYPE_DABR)) |
+ (child->thread.hw_brk.type & HW_BRK_TYPE_DABR));
+ ret = put_user(dabr_fake, datalp);
#endif
break;
}
diff --git a/arch/powerpc/kernel/ptrace32.c b/arch/powerpc/kernel/ptrace32.c
index 8c21658..c0244e7 100644
--- a/arch/powerpc/kernel/ptrace32.c
+++ b/arch/powerpc/kernel/ptrace32.c
@@ -252,6 +252,9 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
}
case PTRACE_GET_DEBUGREG: {
+#ifndef CONFIG_PPC_ADV_DEBUG_REGS
+ unsigned long dabr_fake;
+#endif
ret = -EINVAL;
/* We only support one DABR and no IABRS at the moment */
if (addr > 0)
@@ -259,7 +262,10 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
#ifdef CONFIG_PPC_ADV_DEBUG_REGS
ret = put_user(child->thread.dac1, (u32 __user *)data);
#else
- ret = put_user(child->thread.dabr, (u32 __user *)data);
+ dabr_fake = (
+ (child->thread.hw_brk.address & (~HW_BRK_TYPE_DABR)) |
+ (child->thread.hw_brk.type & HW_BRK_TYPE_DABR));
+ ret = put_user(dabr_fake, (u32 __user *)data);
#endif
break;
}
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 3b99711..1f26956 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -130,8 +130,9 @@ static int do_signal(struct pt_regs *regs)
* user space. The DABR will have been cleared if it
* triggered inside the kernel.
*/
- if (current->thread.dabr)
- set_dabr(current->thread.dabr, current->thread.dabrx);
+ if (current->thread.hw_brk.address &&
+ current->thread.hw_brk.type)
+ set_break(¤t->thread.hw_brk);
#endif
/* Re-enable the breakpoints for the signal stack */
thread_change_pc(current, regs);
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index cb04fc7..6a67a0d 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -67,7 +67,7 @@ int (*__debugger_ipi)(struct pt_regs *regs) __read_mostly;
int (*__debugger_bpt)(struct pt_regs *regs) __read_mostly;
int (*__debugger_sstep)(struct pt_regs *regs) __read_mostly;
int (*__debugger_iabr_match)(struct pt_regs *regs) __read_mostly;
-int (*__debugger_dabr_match)(struct pt_regs *regs) __read_mostly;
+int (*__debugger_break_match)(struct pt_regs *regs) __read_mostly;
int (*__debugger_fault_handler)(struct pt_regs *regs) __read_mostly;
EXPORT_SYMBOL(__debugger);
@@ -75,7 +75,7 @@ EXPORT_SYMBOL(__debugger_ipi);
EXPORT_SYMBOL(__debugger_bpt);
EXPORT_SYMBOL(__debugger_sstep);
EXPORT_SYMBOL(__debugger_iabr_match);
-EXPORT_SYMBOL(__debugger_dabr_match);
+EXPORT_SYMBOL(__debugger_break_match);
EXPORT_SYMBOL(__debugger_fault_handler);
#endif
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 3a8489a..229951f 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -249,8 +249,8 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE) || \
defined(CONFIG_PPC_BOOK3S_64))
if (error_code & DSISR_DABRMATCH) {
- /* DABR match */
- do_dabr(regs, address, error_code);
+ /* breakpoint match */
+ do_break(regs, address, error_code);
return 0;
}
#endif
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 1f8d2f1..529c1ed 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -43,6 +43,7 @@
#include <asm/setjmp.h>
#include <asm/reg.h>
#include <asm/debug.h>
+#include <asm/hw_breakpoint.h>
#ifdef CONFIG_PPC64
#include <asm/hvcall.h>
@@ -607,7 +608,7 @@ static int xmon_sstep(struct pt_regs *regs)
return 1;
}
-static int xmon_dabr_match(struct pt_regs *regs)
+static int xmon_break_match(struct pt_regs *regs)
{
if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT))
return 0;
@@ -740,8 +741,14 @@ static void insert_bpts(void)
static void insert_cpu_bpts(void)
{
- if (dabr.enabled)
- set_dabr(dabr.address | (dabr.enabled & 7), DABRX_ALL);
+ struct arch_hw_breakpoint brk;
+
+ if (dabr.enabled) {
+ brk.address = dabr.address;
+ brk.type = (dabr.enabled & HW_BRK_TYPE_DABR) | HW_BRK_TYPE_PRIV_ALL;
+ brk.len = 8;
+ set_break(&brk);
+ }
if (iabr && cpu_has_feature(CPU_FTR_IABR))
mtspr(SPRN_IABR, iabr->address
| (iabr->enabled & (BP_IABR|BP_IABR_TE)));
@@ -769,7 +776,7 @@ static void remove_bpts(void)
static void remove_cpu_bpts(void)
{
- set_dabr(0, 0);
+ hw_breakpoint_disable();
if (cpu_has_feature(CPU_FTR_IABR))
mtspr(SPRN_IABR, 0);
}
@@ -1138,7 +1145,7 @@ bpt_cmds(void)
printf(badaddr);
break;
}
- dabr.address &= ~7;
+ dabr.address &= ~HW_BRK_TYPE_DABR;
dabr.enabled = mode | BP_DABR;
}
break;
@@ -2917,7 +2924,7 @@ static void xmon_init(int enable)
__debugger_bpt = xmon_bpt;
__debugger_sstep = xmon_sstep;
__debugger_iabr_match = xmon_iabr_match;
- __debugger_dabr_match = xmon_dabr_match;
+ __debugger_break_match = xmon_break_match;
__debugger_fault_handler = xmon_fault_handler;
} else {
__debugger = NULL;
@@ -2925,7 +2932,7 @@ static void xmon_init(int enable)
__debugger_bpt = NULL;
__debugger_sstep = NULL;
__debugger_iabr_match = NULL;
- __debugger_dabr_match = NULL;
+ __debugger_break_match = NULL;
__debugger_fault_handler = NULL;
}
}
--
1.7.10.4
^ permalink raw reply related
* [PATCH 5/7] powerpc: Add DAWR/X SPR number definitions
From: Michael Neuling @ 2012-12-21 0:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Ian Munsie
In-Reply-To: <1356048405-20560-1-git-send-email-mikey@neuling.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/include/asm/reg.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index ce7b50e..6cae32d 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -222,6 +222,11 @@
#define CTRL_CT1 0x40000000 /* thread 1 */
#define CTRL_TE 0x00c00000 /* thread enable */
#define CTRL_RUNLATCH 0x1
+#define SPRN_DAWR 0xB4
+#define SPRN_DAWRX 0xBC
+#define DAWRX_USER (1UL << 0)
+#define DAWRX_KERNEL (1UL << 1)
+#define DAWRX_HYP (1UL << 2)
#define SPRN_DABR 0x3F5 /* Data Address Breakpoint Register */
#define DABR_TRANSLATION (1UL << 2)
#define DABR_DATA_WRITE (1UL << 1)
--
1.7.10.4
^ permalink raw reply related
* [PATCH 4/7] powerpc: Add DAWR CPU feature bit definition
From: Michael Neuling @ 2012-12-21 0:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Ian Munsie
In-Reply-To: <1356048405-20560-1-git-send-email-mikey@neuling.org>
.. and add it to POWER8 cpu features.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/include/asm/cputable.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index d1f5911..2134e26 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -173,6 +173,7 @@ extern const char *powerpc_base_platform;
#define CPU_FTR_VMX_COPY LONG_ASM_CONST(0x0040000000000000)
#define CPU_FTR_TM LONG_ASM_CONST(0x0080000000000000)
#define CPU_FTR_BCTAR LONG_ASM_CONST(0x0100000000000000)
+#define CPU_FTR_DAWR LONG_ASM_CONST(0x0200000000000000)
#ifndef __ASSEMBLY__
@@ -418,7 +419,7 @@ extern const char *powerpc_base_platform;
CPU_FTR_DSCR | CPU_FTR_SAO | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
- CPU_FTR_DBELL | CPU_FTR_TM_COMP | CPU_FTR_BCTAR)
+ CPU_FTR_DBELL | CPU_FTR_TM_COMP | CPU_FTR_BCTAR | CPU_FTR_DAWR)
#define CPU_FTRS_CELL (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
--
1.7.10.4
^ permalink raw reply related
* [PATCH 3/7] powerpc: Add helper functions set the DAWR and CIABR using set_mode
From: Michael Neuling @ 2012-12-21 0:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Ian Munsie
In-Reply-To: <1356048405-20560-1-git-send-email-mikey@neuling.org>
From: Ian Munsie <imunsie@au1.ibm.com>
These are just wrappers around the new set_mode HCALL.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/platforms/pseries/plpar_wrappers.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/plpar_wrappers.h b/arch/powerpc/platforms/pseries/plpar_wrappers.h
index e6cc34a..fd09183 100644
--- a/arch/powerpc/platforms/pseries/plpar_wrappers.h
+++ b/arch/powerpc/platforms/pseries/plpar_wrappers.h
@@ -304,4 +304,14 @@ static inline long disable_reloc_on_exceptions(void) {
return plpar_set_mode(0, 3, 0, 0);
}
+static inline long plapr_set_ciabr(unsigned long ciabr)
+{
+ return plpar_set_mode(0, 1, ciabr, 0);
+}
+
+static inline long plapr_set_watchpoint0(unsigned long dawr0, unsigned long dawrx0)
+{
+ return plpar_set_mode(0, 2, dawr0, dawrx0);
+}
+
#endif /* _PSERIES_PLPAR_WRAPPERS_H */
--
1.7.10.4
^ permalink raw reply related
* [PATCH 2/7] powerpc: Repack 64bit CPU features to remove holes
From: Michael Neuling @ 2012-12-21 0:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Ian Munsie
In-Reply-To: <1356048405-20560-1-git-send-email-mikey@neuling.org>
This frees up 7 bits for crazy new CPU features.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/include/asm/cputable.h | 50 +++++++++++++++++------------------
1 file changed, 25 insertions(+), 25 deletions(-)
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 0a4c67d..d1f5911 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -148,31 +148,31 @@ extern const char *powerpc_base_platform;
#define LONG_ASM_CONST(x) 0
#endif
-#define CPU_FTR_HVMODE LONG_ASM_CONST(0x0000000200000000)
-#define CPU_FTR_ARCH_201 LONG_ASM_CONST(0x0000000400000000)
-#define CPU_FTR_ARCH_206 LONG_ASM_CONST(0x0000000800000000)
-#define CPU_FTR_CFAR LONG_ASM_CONST(0x0000001000000000)
-#define CPU_FTR_IABR LONG_ASM_CONST(0x0000002000000000)
-#define CPU_FTR_MMCRA LONG_ASM_CONST(0x0000004000000000)
-#define CPU_FTR_CTRL LONG_ASM_CONST(0x0000008000000000)
-#define CPU_FTR_SMT LONG_ASM_CONST(0x0000010000000000)
-#define CPU_FTR_PAUSE_ZERO LONG_ASM_CONST(0x0000200000000000)
-#define CPU_FTR_PURR LONG_ASM_CONST(0x0000400000000000)
-#define CPU_FTR_CELL_TB_BUG LONG_ASM_CONST(0x0000800000000000)
-#define CPU_FTR_SPURR LONG_ASM_CONST(0x0001000000000000)
-#define CPU_FTR_DSCR LONG_ASM_CONST(0x0002000000000000)
-#define CPU_FTR_VSX LONG_ASM_CONST(0x0010000000000000)
-#define CPU_FTR_SAO LONG_ASM_CONST(0x0020000000000000)
-#define CPU_FTR_CP_USE_DCBTZ LONG_ASM_CONST(0x0040000000000000)
-#define CPU_FTR_UNALIGNED_LD_STD LONG_ASM_CONST(0x0080000000000000)
-#define CPU_FTR_ASYM_SMT LONG_ASM_CONST(0x0100000000000000)
-#define CPU_FTR_STCX_CHECKS_ADDRESS LONG_ASM_CONST(0x0200000000000000)
-#define CPU_FTR_POPCNTB LONG_ASM_CONST(0x0400000000000000)
-#define CPU_FTR_POPCNTD LONG_ASM_CONST(0x0800000000000000)
-#define CPU_FTR_ICSWX LONG_ASM_CONST(0x1000000000000000)
-#define CPU_FTR_VMX_COPY LONG_ASM_CONST(0x2000000000000000)
-#define CPU_FTR_TM LONG_ASM_CONST(0x4000000000000000)
-#define CPU_FTR_BCTAR LONG_ASM_CONST(0x8000000000000000)
+#define CPU_FTR_HVMODE LONG_ASM_CONST(0x0000000100000000)
+#define CPU_FTR_ARCH_201 LONG_ASM_CONST(0x0000000200000000)
+#define CPU_FTR_ARCH_206 LONG_ASM_CONST(0x0000000400000000)
+#define CPU_FTR_CFAR LONG_ASM_CONST(0x0000000800000000)
+#define CPU_FTR_IABR LONG_ASM_CONST(0x0000001000000000)
+#define CPU_FTR_MMCRA LONG_ASM_CONST(0x0000002000000000)
+#define CPU_FTR_CTRL LONG_ASM_CONST(0x0000004000000000)
+#define CPU_FTR_SMT LONG_ASM_CONST(0x0000008000000000)
+#define CPU_FTR_PAUSE_ZERO LONG_ASM_CONST(0x0000010000000000)
+#define CPU_FTR_PURR LONG_ASM_CONST(0x0000020000000000)
+#define CPU_FTR_CELL_TB_BUG LONG_ASM_CONST(0x0000040000000000)
+#define CPU_FTR_SPURR LONG_ASM_CONST(0x0000080000000000)
+#define CPU_FTR_DSCR LONG_ASM_CONST(0x0000100000000000)
+#define CPU_FTR_VSX LONG_ASM_CONST(0x0000200000000000)
+#define CPU_FTR_SAO LONG_ASM_CONST(0x0000400000000000)
+#define CPU_FTR_CP_USE_DCBTZ LONG_ASM_CONST(0x0000800000000000)
+#define CPU_FTR_UNALIGNED_LD_STD LONG_ASM_CONST(0x0001000000000000)
+#define CPU_FTR_ASYM_SMT LONG_ASM_CONST(0x0002000000000000)
+#define CPU_FTR_STCX_CHECKS_ADDRESS LONG_ASM_CONST(0x0004000000000000)
+#define CPU_FTR_POPCNTB LONG_ASM_CONST(0x0008000000000000)
+#define CPU_FTR_POPCNTD LONG_ASM_CONST(0x0010000000000000)
+#define CPU_FTR_ICSWX LONG_ASM_CONST(0x0020000000000000)
+#define CPU_FTR_VMX_COPY LONG_ASM_CONST(0x0040000000000000)
+#define CPU_FTR_TM LONG_ASM_CONST(0x0080000000000000)
+#define CPU_FTR_BCTAR LONG_ASM_CONST(0x0100000000000000)
#ifndef __ASSEMBLY__
--
1.7.10.4
^ permalink raw reply related
* [PATCH 1/7] powerpc: Remove extra zeros from 32 bit CPU features definitions
From: Michael Neuling @ 2012-12-21 0:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Ian Munsie
In-Reply-To: <1356048405-20560-1-git-send-email-mikey@neuling.org>
These are 32 bit, so no need to have a bunch of wasted 0s.
The 0s saved here can be put to better use elsewhere, like at the end of my pay
check.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/include/asm/cputable.h | 62 +++++++++++++++++------------------
1 file changed, 31 insertions(+), 31 deletions(-)
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index cbbec56a..0a4c67d 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -106,37 +106,37 @@ extern const char *powerpc_base_platform;
/* CPU kernel features */
/* Retain the 32b definitions all use bottom half of word */
-#define CPU_FTR_COHERENT_ICACHE ASM_CONST(0x0000000000000001)
-#define CPU_FTR_L2CR ASM_CONST(0x0000000000000002)
-#define CPU_FTR_SPEC7450 ASM_CONST(0x0000000000000004)
-#define CPU_FTR_ALTIVEC ASM_CONST(0x0000000000000008)
-#define CPU_FTR_TAU ASM_CONST(0x0000000000000010)
-#define CPU_FTR_CAN_DOZE ASM_CONST(0x0000000000000020)
-#define CPU_FTR_USE_TB ASM_CONST(0x0000000000000040)
-#define CPU_FTR_L2CSR ASM_CONST(0x0000000000000080)
-#define CPU_FTR_601 ASM_CONST(0x0000000000000100)
-#define CPU_FTR_DBELL ASM_CONST(0x0000000000000200)
-#define CPU_FTR_CAN_NAP ASM_CONST(0x0000000000000400)
-#define CPU_FTR_L3CR ASM_CONST(0x0000000000000800)
-#define CPU_FTR_L3_DISABLE_NAP ASM_CONST(0x0000000000001000)
-#define CPU_FTR_NAP_DISABLE_L2_PR ASM_CONST(0x0000000000002000)
-#define CPU_FTR_DUAL_PLL_750FX ASM_CONST(0x0000000000004000)
-#define CPU_FTR_NO_DPM ASM_CONST(0x0000000000008000)
-#define CPU_FTR_476_DD2 ASM_CONST(0x0000000000010000)
-#define CPU_FTR_NEED_COHERENT ASM_CONST(0x0000000000020000)
-#define CPU_FTR_NO_BTIC ASM_CONST(0x0000000000040000)
-#define CPU_FTR_DEBUG_LVL_EXC ASM_CONST(0x0000000000080000)
-#define CPU_FTR_NODSISRALIGN ASM_CONST(0x0000000000100000)
-#define CPU_FTR_PPC_LE ASM_CONST(0x0000000000200000)
-#define CPU_FTR_REAL_LE ASM_CONST(0x0000000000400000)
-#define CPU_FTR_FPU_UNAVAILABLE ASM_CONST(0x0000000000800000)
-#define CPU_FTR_UNIFIED_ID_CACHE ASM_CONST(0x0000000001000000)
-#define CPU_FTR_SPE ASM_CONST(0x0000000002000000)
-#define CPU_FTR_NEED_PAIRED_STWCX ASM_CONST(0x0000000004000000)
-#define CPU_FTR_LWSYNC ASM_CONST(0x0000000008000000)
-#define CPU_FTR_NOEXECUTE ASM_CONST(0x0000000010000000)
-#define CPU_FTR_INDEXED_DCR ASM_CONST(0x0000000020000000)
-#define CPU_FTR_EMB_HV ASM_CONST(0x0000000040000000)
+#define CPU_FTR_COHERENT_ICACHE ASM_CONST(0x00000001)
+#define CPU_FTR_L2CR ASM_CONST(0x00000002)
+#define CPU_FTR_SPEC7450 ASM_CONST(0x00000004)
+#define CPU_FTR_ALTIVEC ASM_CONST(0x00000008)
+#define CPU_FTR_TAU ASM_CONST(0x00000010)
+#define CPU_FTR_CAN_DOZE ASM_CONST(0x00000020)
+#define CPU_FTR_USE_TB ASM_CONST(0x00000040)
+#define CPU_FTR_L2CSR ASM_CONST(0x00000080)
+#define CPU_FTR_601 ASM_CONST(0x00000100)
+#define CPU_FTR_DBELL ASM_CONST(0x00000200)
+#define CPU_FTR_CAN_NAP ASM_CONST(0x00000400)
+#define CPU_FTR_L3CR ASM_CONST(0x00000800)
+#define CPU_FTR_L3_DISABLE_NAP ASM_CONST(0x00001000)
+#define CPU_FTR_NAP_DISABLE_L2_PR ASM_CONST(0x00002000)
+#define CPU_FTR_DUAL_PLL_750FX ASM_CONST(0x00004000)
+#define CPU_FTR_NO_DPM ASM_CONST(0x00008000)
+#define CPU_FTR_476_DD2 ASM_CONST(0x00010000)
+#define CPU_FTR_NEED_COHERENT ASM_CONST(0x00020000)
+#define CPU_FTR_NO_BTIC ASM_CONST(0x00040000)
+#define CPU_FTR_DEBUG_LVL_EXC ASM_CONST(0x00080000)
+#define CPU_FTR_NODSISRALIGN ASM_CONST(0x00100000)
+#define CPU_FTR_PPC_LE ASM_CONST(0x00200000)
+#define CPU_FTR_REAL_LE ASM_CONST(0x00400000)
+#define CPU_FTR_FPU_UNAVAILABLE ASM_CONST(0x00800000)
+#define CPU_FTR_UNIFIED_ID_CACHE ASM_CONST(0x01000000)
+#define CPU_FTR_SPE ASM_CONST(0x02000000)
+#define CPU_FTR_NEED_PAIRED_STWCX ASM_CONST(0x04000000)
+#define CPU_FTR_LWSYNC ASM_CONST(0x08000000)
+#define CPU_FTR_NOEXECUTE ASM_CONST(0x10000000)
+#define CPU_FTR_INDEXED_DCR ASM_CONST(0x20000000)
+#define CPU_FTR_EMB_HV ASM_CONST(0x40000000)
/*
* Add the 64-bit processor unique features in the top half of the word;
--
1.7.10.4
^ permalink raw reply related
* [PATCH 0/7] powerpc: add POWER8 DAWR support
From: Michael Neuling @ 2012-12-21 0:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Ian Munsie
Add support for the Data Address Watchpoint Register (DAWR) as found in POWER8.
This replaces the DABR found in POWER7 and earlier and allows for wider
watchpoint areas.
--
1.7.10.4
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox