From: Tang Chen <tangchen@cn.fujitsu.com>
To: Jianguo Wu <wujianguo@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>,
x86@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-acpi@vger.kernel.org,
linux-s390@vger.kernel.org, linux-sh@vger.kernel.org,
linux-ia64@vger.kernel.org, cmetcalf@tilera.com,
sparclinux@vger.kernel.org, David Rientjes <rientjes@google.com>,
Jiang Liu <liuj97@gmail.com>, Len Brown <len.brown@intel.com>,
benh@kernel.crashing.org, paulus@samba.org,
Christoph Lameter <cl@linux.com>,
Minchan Kim <minchan.kim@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Subject: Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap
Date: Fri, 07 Dec 2012 09:42:14 +0800
Message-ID: <50C14976.2050606@cn.fujitsu.com>
In-Reply-To: <50BC0D2D.8040008@huawei.com>
Hi Wu,
I ran into some problems while digging into the code. It would be very
kind of you to help me with them. :)
If I have misunderstood your code, please tell me.
Please see below. :)
On 12/03/2012 10:23 AM, Jianguo Wu wrote:
> Signed-off-by: Jianguo Wu<wujianguo@huawei.com>
> Signed-off-by: Jiang Liu<jiang.liu@huawei.com>
> ---
> include/linux/mm.h | 1 +
> mm/sparse-vmemmap.c | 231 +++++++++++++++++++++++++++++++++++++++++++++++++++
> mm/sparse.c | 3 +-
> 3 files changed, 234 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5657670..1f26af5 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1642,6 +1642,7 @@ int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
> void vmemmap_populate_print_last(void);
> void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
> unsigned long size);
> +void vmemmap_free(struct page *memmap, unsigned long nr_pages);
>
> enum mf_flags {
> MF_COUNT_INCREASED = 1<< 0,
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 1b7e22a..748732d 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -29,6 +29,10 @@
> #include<asm/pgalloc.h>
> #include<asm/pgtable.h>
>
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +#include<asm/tlbflush.h>
> +#endif
> +
> /*
> * Allocate a block of memory to be used to back the virtual memory map
> * or to back the page tables that are used to create the mapping.
> @@ -224,3 +228,230 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> vmemmap_buf_end = NULL;
> }
> }
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +
> +#define PAGE_INUSE 0xFD
> +
> +static void vmemmap_free_pages(struct page *page, int order)
> +{
> +	struct zone *zone;
> +	unsigned long magic;
> +
> +	magic = (unsigned long) page->lru.next;
> +	if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
> +		put_page_bootmem(page);
> +
> +		zone = page_zone(page);
> +		zone_span_writelock(zone);
> +		zone->present_pages++;
> +		zone_span_writeunlock(zone);
> +		totalram_pages++;
> +	} else
> +		free_pages((unsigned long)page_address(page), order);
Here, I think SECTION_INFO and MIX_SECTION_INFO pages are all allocated
by bootmem, so I wrote this function as follows.
I'm not sure the order parameter is necessary here, since it is always 0
in your code. Is this OK with you?
static void free_pagetable(struct page *page)
{
	struct zone *zone;
	bool bootmem = false;
	unsigned long magic;

	/* bootmem page has reserved flag */
	if (PageReserved(page)) {
		__ClearPageReserved(page);
		bootmem = true;
	}

	magic = (unsigned long) page->lru.next;
	if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
		put_page_bootmem(page);
	else
		__free_page(page);

	/*
	 * SECTION_INFO pages and MIX_SECTION_INFO pages
	 * are all allocated by bootmem.
	 */
	if (bootmem) {
		zone = page_zone(page);
		zone_span_writelock(zone);
		zone->present_pages++;
		zone_span_writeunlock(zone);
		totalram_pages++;
	}
}
(snip)
> +
> +static void vmemmap_pte_remove(pmd_t *pmd, unsigned long addr, unsigned long end)
> +{
> +	pte_t *pte;
> +	unsigned long next;
> +	void *page_addr;
> +
> +	pte = pte_offset_kernel(pmd, addr);
> +	for (; addr < end; pte++, addr += PAGE_SIZE) {
> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
> +		if (next > end)
> +			next = end;
> +
> +		if (pte_none(*pte))
Here, you check xxx_none() in your vmemmap_xxx_remove(), but you use
!xxx_present() in your x86_64 patches. Is it OK if I check only
!xxx_present()?
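To make the difference between the two checks concrete, here is a minimal userspace sketch. It is a simplified model, not the kernel's definitions: toy_pte_t, TOY_PRESENT, toy_pte_none() and toy_pte_present() are hypothetical names, and the real xxx_none()/xxx_present() helpers are architecture-specific.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy page-table entry: NOT the kernel's pte_t, just a 64-bit word. */
typedef uint64_t toy_pte_t;

/* Assumed position of the "present" bit in this model only. */
#define TOY_PRESENT (1ULL << 0)

/* "none": the entry is completely empty. */
static bool toy_pte_none(toy_pte_t pte)
{
	return pte == 0;
}

/* "present": the entry currently maps a page. */
static bool toy_pte_present(toy_pte_t pte)
{
	return (pte & TOY_PRESENT) != 0;
}
```

In this model every none entry is also not present, so a !present check skips at least everything a none check skips. The converse does not hold: an entry such as 0x1000 is neither none nor present.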
> +			continue;
> +		if (IS_ALIGNED(addr, PAGE_SIZE) &&
> +		    IS_ALIGNED(next, PAGE_SIZE)) {
> +			vmemmap_free_pages(pte_page(*pte), 0);
> +			spin_lock(&init_mm.page_table_lock);
> +			pte_clear(&init_mm, addr, pte);
> +			spin_unlock(&init_mm.page_table_lock);
> +		} else {
> +			/*
> +			 * Removed page structs are filled with 0xFD.
> +			 */
> +			memset((void *)addr, PAGE_INUSE, next - addr);
> +			page_addr = page_address(pte_page(*pte));
> +
> +			if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
> +				spin_lock(&init_mm.page_table_lock);
> +				pte_clear(&init_mm, addr, pte);
> +				spin_unlock(&init_mm.page_table_lock);
Here, since we clear pte, we should also free the page, right ?
> +			}
> +		}
> +	}
> +
> +	free_pte_table(pmd);
> +	__flush_tlb_all();
> +}
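The partial-page branch above can be sketched in userspace as follows. This is only an illustration under stated assumptions: toy_all_bytes_are() stands in for the !memchr_inv() test, the TOY_ constants and toy_mark_removed() are hypothetical names, and a plain byte buffer replaces the page that backs the memmap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE  4096u
#define TOY_PAGE_INUSE 0xFD	/* same fill byte as PAGE_INUSE in the patch */

/* Userspace stand-in for !memchr_inv(buf, c, len): true if every byte is c. */
static bool toy_all_bytes_are(const unsigned char *buf, unsigned char c,
			      size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != c)
			return false;
	return true;
}

/*
 * Mark [off, off + len) of a backing page as removed, then report whether
 * the whole page is now marked. The caller must keep off + len within
 * TOY_PAGE_SIZE.
 */
static bool toy_mark_removed(unsigned char *page, size_t off, size_t len)
{
	memset(page + off, TOY_PAGE_INUSE, len);
	return toy_all_bytes_are(page, TOY_PAGE_INUSE, TOY_PAGE_SIZE);
}
```

The function returns true at the point where the quoted code clears the pte, which is where freeing the backing page would presumably belong as well.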
> +
> +static void vmemmap_pmd_remove(pud_t *pud, unsigned long addr, unsigned long end)
> +{
> +	unsigned long next;
> +	pmd_t *pmd;
> +
> +	pmd = pmd_offset(pud, addr);
> +	for (; addr < end; addr = next, pmd++) {
> +		next = pmd_addr_end(addr, end);
By the way, why is there no pte_addr_end() in the kernel?
I saw that you calculate it like this:
	next = (addr + PAGE_SIZE) & PAGE_MASK;
	if (next > end)
		next = end;
This logic is very similar to {pmd|pud|pgd}_addr_end(). Shall we add a
pte_addr_end() or something similar? :)
Since the kernel has gone without such a helper for a long time, I think
there must be some reason.
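For reference, such a helper might look like the userspace sketch below. It is not existing kernel code: toy_pte_addr_end() and the TOY_PAGE_* constants are hypothetical, a 4 KiB page size is assumed, and the comparison is written in the same style as the generic {pmd|pud|pgd}_addr_end() macros.

```c
/* Assumed 4 KiB pages; the kernel's PAGE_SIZE/PAGE_MASK are per-arch. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/*
 * Return the next page boundary after addr, clamped to end.
 * Comparing (x - 1) values keeps the result correct even if the
 * boundary or end wraps around to 0 at the top of the address space.
 */
static unsigned long toy_pte_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + TOY_PAGE_SIZE) & TOY_PAGE_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}
```

With such a helper, the pte loop could advance with addr = next in the same way as the pmd loop, instead of stepping by PAGE_SIZE and clamping by hand.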
I merged free_xxx_table() and remove_xxx_table() into common interfaces.
And again, thanks for your patience and kind explanation. :)
(snip)