public inbox for linux-kernel@vger.kernel.org
From: Tang Chen <tangchen@cn.fujitsu.com>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>, Thomas Renninger <trenn@suse.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Subject: Re: [PATCH v4 00/22] x86, ACPI, numa: Parse numa info early
Date: Mon, 13 May 2013 10:59:24 +0800
Message-ID: <5190570C.9010707@cn.fujitsu.com>
In-Reply-To: <CAE9FiQWPFL0qagAmkcyj_XfjWQtONtomwQPAimKUm+nJi5dHTw@mail.gmail.com>

Hi Yinghai,

On 05/10/2013 02:24 AM, Yinghai Lu wrote:
>>> So I suggest to separate the job into 2 parts:
>>> 1. Push Yinghai's patch1 ~ patch20, without putting pagetable in local
>>> node.
>>> And push my work to use SRAT to arrange ZONE_MOVABLE.
>>> In this case, we can enable memory hotplug in the kernel first.
>>> 2. Merge patch21 and patch22 into the fixing work I am doing now, and push
>>> them
>>> together when finished.
>>>
>
> no, no, no, please do not half-done work.
>
> Do it right, and Do it clean.
>

I'm not saying I want to do it half-way. Putting the pagetable in the
local node will break the memory hot-remove patches.

Before removing pages, the kernel first offlines them. If the offline step
fails, hot-remove cannot proceed. Since your patches put each node's
pagetable in the local node at boot time, that memory cannot be offlined,
and therefore it cannot be hot-removed.

The minimum unit of memory online/offline is a block. By default, one
block contains one section, which by default is 128MB. So if part of a
block holds pagetable and the rest is movable memory, the block cannot
be offlined, and as a result it cannot be removed.
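
To make the constraint concrete, here is a minimal sketch in Python (a toy
model, not kernel code). The names Range, can_offline and can_hot_remove are
made up for illustration, and the 2MB pagetable size is an arbitrary example
value; the real offline path checks much more than this.

```python
from dataclasses import dataclass
from typing import List

BLOCK_SIZE_MB = 128  # default: one block == one section == 128MB


@dataclass
class Range:
    size_mb: int
    movable: bool  # False for kernel-owned memory such as pagetable


def can_offline(block: List[Range]) -> bool:
    """A block can be offlined only if everything in it can be migrated away."""
    return all(r.movable for r in block)


def can_hot_remove(node_blocks: List[List[Range]]) -> bool:
    """In this model a node can be removed only if every block offlines."""
    return all(can_offline(block) for block in node_blocks)


# Hypothetical node: one block carries 2MB of pagetable, the other is all movable.
pagetable_block = [Range(2, movable=False), Range(126, movable=True)]
movable_block = [Range(128, movable=True)]
node = [pagetable_block, movable_block]
```

In this model a single unmovable range is enough to pin its whole 128MB
block, and one pinned block is enough to block removal of the node.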

In order to fix it, we have four solutions:

1. Reserve the whole block (128MB) so that nobody else can use the rest
    of the block, and skip it when offlining memory.
    When all the other blocks are offlined, free the pagetable, and remove
    all the memory.

    But we lose some memory this way. 128MB is a bit much to waste.


2. Migrate the movable pages and keep this block online. Although the
    offline operation fails, it is still OK to remove the memory.

    But the offline operation will then always fail. And generally speaking,
    offline can fail for many reasons, so it is difficult to detect
    whether it is really OK to remove the memory.


3. Migrate the user pages and mark this block offline, while the kernel
    still uses the pagetable in it.

    But this will change the semantics of "offline". I'm not sure if we
    can do it this way.


4. Do not allocate the pagetable on the local node when
    CONFIG_MEMORY_HOTREMOVE is enabled. (I do suggest not putting the
    pagetable in the local node in the memory hot-remove case.)
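
As a rough illustration of the trade-off between solutions 1 and 4, here is
a small Python sketch (again a toy model, not kernel code). The function
names, the fallback_node parameter and the example sizes are assumptions
made up for this sketch; where the pagetable should go when it is not on the
local node is not decided here.

```python
BLOCK_SIZE_MB = 128  # default block size discussed above


def reserved_waste_mb(pagetable_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Solution 1: memory lost when whole blocks are reserved for pagetable."""
    blocks = -(-pagetable_mb // block_size_mb)  # ceiling division
    return blocks * block_size_mb - pagetable_mb


def pagetable_node(local_node: int, fallback_node: int, hotremove_enabled: bool) -> int:
    """Solution 4: avoid the local node whenever hot-remove is configured."""
    return fallback_node if hotremove_enabled else local_node
```

For example, with a 2MB pagetable on a node, solution 1 leaves 126MB of the
reserved block unusable, while solution 4 wastes nothing but gives up
locality when CONFIG_MEMORY_HOTREMOVE is enabled.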


What do you think of these 4 solutions?

I think I need some advice from the community on this problem. Do you have
any idea how to fix it if we put the pagetable in the local node?

The memory hot-plug guys do want to use memory hot-remove. So I think
for now we should use solution 4 above: when CONFIG_MEMORY_HOTREMOVE is
enabled, do not allocate the pagetable on the local node.

I'm not trying to do it half-way. Once we fix this problem, we can allocate
the pagetable on the local node again with CONFIG_MEMORY_HOTREMOVE enabled.

Please do give some advice or feedback.


>>
>> If you have any thinking of this patch-set, please let me know.
>
> Talked to HPA, and he will put my patchset into tip/x86/mm after v3.10-rc1.
>
> after that we can work on put pagetable on local node for hotadd path.
>

The hot-add path is another problem. But I think the hot-remove path is more
urgent now.


Thanks. :)


Thread overview: 34+ messages
2013-04-12  0:55 [PATCH v4 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
2013-04-12  0:55 ` [PATCH v4 01/22] x86: Change get_ramdisk_image() to global Yinghai Lu
2013-04-12  0:55 ` [PATCH v4 02/22] x86, microcode: Use common get_ramdisk_image() Yinghai Lu
2013-04-12  0:55 ` [PATCH v4 03/22] x86, ACPI, mm: Kill max_low_pfn_mapped Yinghai Lu
2013-06-05  8:36   ` Tang Chen
2013-04-12  0:55 ` [PATCH v4 04/22] x86, ACPI: Search buffer above 4G in second try for acpi override tables Yinghai Lu
2013-04-12  0:55 ` [PATCH v4 05/22] x86, ACPI: Increase override tables number limit Yinghai Lu
2013-04-12  0:55 ` [PATCH v4 06/22] x86, ACPI: Split acpi_initrd_override to find/copy two functions Yinghai Lu
2013-04-12  0:55 ` [PATCH v4 07/22] x86, ACPI: Store override acpi tables phys addr in cpio files info array Yinghai Lu
2013-04-12  0:55 ` [PATCH v4 08/22] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode Yinghai Lu
2013-04-12  0:55 ` [PATCH v4 09/22] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c Yinghai Lu
2013-04-12  0:55 ` [PATCH v4 10/22] x86, mm, numa: Move two functions calling on successful path later Yinghai Lu
2013-04-12  0:55 ` [PATCH v4 11/22] x86, mm, numa: Call numa_meminfo_cover_memory() checking early Yinghai Lu
2013-04-12  0:55 ` [PATCH v4 12/22] x86, mm, numa: Move node_map_pfn alignment() to x86 Yinghai Lu
2013-04-12  0:55 ` [PATCH v4 13/22] x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment Yinghai Lu
2013-04-12  0:56 ` [PATCH v4 14/22] x86, mm, numa: Set memblock nid later Yinghai Lu
2013-04-12  0:56 ` [PATCH v4 15/22] x86, mm, numa: Move node_possible_map setting later Yinghai Lu
2013-04-12  0:56 ` [PATCH v4 16/22] x86, mm, numa: Move emulation handling down Yinghai Lu
2013-04-12  0:56 ` [PATCH v4 17/22] x86, ACPI, numa, ia64: split SLIT handling out Yinghai Lu
2013-04-12  0:56 ` [PATCH v4 18/22] x86, mm, numa: Add early_initmem_init() stub Yinghai Lu
2013-04-12  0:56 ` [PATCH v4 19/22] x86, mm: Parse numa info early Yinghai Lu
2013-04-12  0:56 ` [PATCH v4 20/22] x86, mm: Add comments for step_size shift Yinghai Lu
2013-04-12  0:56 ` [PATCH v4 21/22] x86, mm: Make init_mem_mapping be able to be called several times Yinghai Lu
2013-04-12  0:56 ` [PATCH v4 22/22] x86, mm, numa: Put pagetable on local node ram for 64bit Yinghai Lu
2013-04-26  8:58 ` [PATCH v4 00/22] x86, ACPI, numa: Parse numa info early Tang Chen
2013-04-30  7:21 ` Tang Chen
2013-05-06  9:49   ` Tang Chen
2013-05-09  8:54   ` Tang Chen
2013-05-09 18:24     ` Yinghai Lu
2013-05-13  2:59       ` Tang Chen [this message]
2013-05-14  9:06         ` Tang Chen
2013-05-22  5:14       ` Tang Chen
2013-05-22  5:18         ` H. Peter Anvin
2013-06-03  6:01           ` Tang Chen
