From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Xishi Qiu <qiuxishi@huawei.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
tony.luck@intel.com, mel@csn.ul.ie, akpm@linux-foundation.org,
Dave Hansen <dave.hansen@intel.com>, Mel Gorman <mgorman@suse.de>,
Ingo Molnar <mingo@kernel.org>,
zhongjiang@huawei.com
Subject: Re: [PATCH][RFC] mm: Introduce kernelcore=reliable option
Date: Tue, 13 Oct 2015 18:51:17 +0900
Message-ID: <561CD415.9010804@jp.fujitsu.com>
In-Reply-To: <5617989E.9070700@huawei.com>

On 2015/10/09 19:36, Xishi Qiu wrote:
> On 2015/10/9 17:24, Kamezawa Hiroyuki wrote:
>
>> On 2015/10/09 15:46, Xishi Qiu wrote:
>>> On 2015/10/9 22:56, Taku Izumi wrote:
>>>
>>>> Xeon E7 v3 based systems support Address Range Mirroring,
>>>> and a UEFI BIOS compliant with UEFI spec 2.5 can report which
>>>> ranges are reliable (mirrored) via the EFI memory map.
>>>> The Linux kernel now uses this information and allocates
>>>> boot-time memory from the reliable region.
>>>>
>>>> My requirement is:
>>>> - allocate kernel memory from reliable region
>>>> - allocate user memory from non-reliable region
>>>>
>>>> In order to meet my requirement, ZONE_MOVABLE is useful.
>>>> By arranging non-reliable range into ZONE_MOVABLE,
>>>> reliable memory is only used for kernel allocations.
>>>>
>>>
>>> Hi Taku,
>>>
>>> You mean set non-mirrored memory to movable zone, and set
>>> mirrored memory to normal zone, right? So kernel allocations
>>> will use mirrored memory in normal zone, and user allocations
>>> will use non-mirrored memory in movable zone.
>>>
>>> My question is:
>>> 1) do we need to change the fallback function?
>>
>> For *our* requirement, it's not required. But if someone wants to prevent
>> user memory allocations from ZONE_NORMAL, we need some changes in zonelist
>> walking.
>>
>
> Hi Kame,
>
> So we assume the kernel will only use the normal zone (mirrored), and users use the
> movable zone (non-mirrored) first; if that memory is not enough, they use the normal zone too.
>
Yes.
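
A minimal sketch of that fallback order, written as a user-space toy model
and not as kernel code. All toy_* names are made up for illustration; the
real zonelist walk is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: two zones on one node. */
enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE, TOY_NR_ZONES };

struct toy_node {
	unsigned long free_pages[TOY_NR_ZONES];
};

/*
 * Take one page for a kernel (movable == false) or a user
 * (movable == true) allocation.
 * Returns the zone that was used, or -1 if nothing is free.
 */
static int toy_alloc_page(struct toy_node *node, bool movable)
{
	/* User allocations start at ZONE_MOVABLE and fall back to NORMAL. */
	static const enum toy_zone user_order[] = {
		TOY_ZONE_MOVABLE, TOY_ZONE_NORMAL
	};
	/* Kernel allocations never enter ZONE_MOVABLE. */
	static const enum toy_zone kernel_order[] = { TOY_ZONE_NORMAL };

	const enum toy_zone *order = movable ? user_order : kernel_order;
	size_t nr = movable ? 2 : 1;

	assert(node != NULL);
	for (size_t i = 0; i < nr; i++) {
		if (node->free_pages[order[i]] > 0) {
			node->free_pages[order[i]]--;
			return (int)order[i];
		}
	}
	return -1;
}
```

With one free page in each zone, two user allocations take the movable page
and then the normal page, and a following kernel allocation fails because
the normal zone is already empty. That is the case where user memory ends up
in mirrored memory.
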
>>> 2) the mirrored region should locate at the start of normal
>>> zone, right?
>>
>> Precisely, "not-reliable" ranges of memory are handled by ZONE_MOVABLE.
>> This patch does only that.
>
> I mean the mirrored region cannot be in the middle or at the end of the zone;
> the BIOS should report the memory like this:
>
> e.g.
> BIOS
> node0: 0-4G mirrored, 4-8G mirrored, 8-16G non-mirrored
> node1: 16-24G mirrored, 24-32G non-mirrored
>
> OS
> node0: DMA DMA32 are both mirrored, NORMAL(4-8G), MOVABLE(8-16G)
> node1: NORMAL(16-24G), MOVABLE(24-32G)
>
I think zones can overlap, even though they are aligned to MAX_ORDER.
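
As a rough illustration of overlapping zones, here is a user-space toy model
(not kernel code). It assumes each range reported by the BIOS lies entirely
below or entirely above 4G, ignores MAX_ORDER alignment, and uses made-up
toy_* names throughout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_zone_type { TOY_DMA_OR_DMA32, TOY_NORMAL, TOY_MOVABLE };

/* One memory range as reported by firmware, in GiB. */
struct toy_range {
	unsigned long long start_gib, end_gib;
	bool mirrored;
};

/* Smallest [start, end) interval covering every range of one zone type. */
struct toy_span {
	unsigned long long start_gib, end_gib;
	bool present;
};

/* Non-mirrored memory goes to MOVABLE; mirrored memory stays kernel-usable. */
static enum toy_zone_type toy_classify(const struct toy_range *r)
{
	assert(r != NULL && r->start_gib < r->end_gib);
	if (!r->mirrored)
		return TOY_MOVABLE;
	return r->end_gib <= 4 ? TOY_DMA_OR_DMA32 : TOY_NORMAL;
}

static struct toy_span toy_zone_span(const struct toy_range *r, size_t nr,
				     enum toy_zone_type type)
{
	struct toy_span s = { 0, 0, false };

	for (size_t i = 0; i < nr; i++) {
		if (toy_classify(&r[i]) != type)
			continue;
		if (!s.present || r[i].start_gib < s.start_gib)
			s.start_gib = r[i].start_gib;
		if (!s.present || r[i].end_gib > s.end_gib)
			s.end_gib = r[i].end_gib;
		s.present = true;
	}
	return s;
}
```

For a node laid out as 8-12G non-mirrored, 12-16G mirrored, 16-24G
non-mirrored, the movable span is 8-24G and the normal span is 12-16G, so
the movable span contains the normal one.
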
>>
>>>
>>> I remember Kame has already suggested this idea. In my opinion,
>>> I still think it's better to add a new migratetype or a new zone,
>>> so both user and kernel could use mirrored memory.
>>
>> Hi, Xishi.
>>
>> Izumi-san and I discussed the implementation at length and found that using
>> a "zone" is the better approach.
>>
>> The biggest reason is that a zone is the unit for vmscan, for all statistics, and
>> for handling a range of memory for a particular purpose. We can reuse all the vmscan
>> and statistics code by making use of zones. Introducing another structure would be messy.
>
> Yes, adding a new zone is better, but it would change a lot of code, so reusing
> ZONE_MOVABLE is simpler and easier, right?
>
I think so. If someone has difficulty with ZONE_MOVABLE, adding a zone will be a separate job.
(*) Taku-san's boot option specifies that kernelcore is placed into reliable memory and
doesn't specify anything about users.
>> His patch is very simple.
>>
>
> The following plan sounds good to me. Shall we rename the zone when it is
> used for mirrored memory? "movable" is a little confusing.
>
Maybe. I think that should be a separate discussion. With this patch and his fake-reliable-memory
patch, everyone can give it a try.
>> For your requirements, Izumi-san and I are discussing the following plan.
>>
>> - Add a flag to show whether a zone is reliable or not, then mark ZONE_MOVABLE as not-reliable.
>> - Add __GFP_RELIABLE. This will allow alloc_pages() to skip not-reliable zones.
>> - Add madvise() MADV_RELIABLE and modify the page fault code's gfp flags based on that flag.
>>
>
> like this?
> user: madvise()/mmap()/or others -> add vma_reliable flag -> add gfp_reliable flag -> alloc_pages
> kernel: use __GFP_RELIABLE flag in buddy allocation/slab/vmalloc...
yes.
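
A rough user-space sketch of that flow, under the assumption that the plan is
implemented as discussed. __GFP_RELIABLE and MADV_RELIABLE are only proposals
in this thread; every TOY_* / toy_* name below is a made-up stand-in and none
of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_GFP_MOVABLE  0x1u /* allocation may be placed in ZONE_MOVABLE */
#define TOY_GFP_RELIABLE 0x2u /* stand-in for the proposed __GFP_RELIABLE */
#define TOY_VM_RELIABLE  0x1u /* stand-in for a per-VMA "reliable" flag */

struct toy_zone {
	const char *name;
	bool movable;             /* is this ZONE_MOVABLE? */
	bool reliable;            /* the proposed per-zone flag */
	unsigned long free_pages;
};

struct toy_vma {
	unsigned int flags;
};

/* What a MADV_RELIABLE-style madvise() call would record on the VMA. */
static void toy_madvise_reliable(struct toy_vma *vma)
{
	assert(vma != NULL);
	vma->flags |= TOY_VM_RELIABLE;
}

/* Page fault path: user memory is movable; add "reliable" if the VMA asks. */
static unsigned int toy_fault_gfp(const struct toy_vma *vma)
{
	unsigned int gfp = TOY_GFP_MOVABLE;

	if (vma->flags & TOY_VM_RELIABLE)
		gfp |= TOY_GFP_RELIABLE;
	return gfp;
}

/* Walk the zones in order and take a page from the first one that fits. */
static struct toy_zone *toy_alloc_page(struct toy_zone *zones, size_t nr,
				       unsigned int gfp)
{
	for (size_t i = 0; i < nr; i++) {
		struct toy_zone *z = &zones[i];

		if (z->movable && !(gfp & TOY_GFP_MOVABLE))
			continue; /* kernel allocations skip ZONE_MOVABLE */
		if (!z->reliable && (gfp & TOY_GFP_RELIABLE))
			continue; /* reliable requests skip not-reliable zones */
		if (z->free_pages == 0)
			continue;
		z->free_pages--;
		return z;
	}
	return NULL;
}
```

With a zone list of { Movable (not reliable), Normal (reliable) }, a fault on
an unmarked VMA is served from Movable, and the same fault after
toy_madvise_reliable() is served from Normal. A kernel allocation with
neither flag set also lands in Normal.
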
>
> Also we can introduce some interfaces in procfs or sysfs, right?
>
It depends on your use case. I think madvise() will be the first choice.
Thanks,
-kame