From: Xishi Qiu <qiuxishi@huawei.com>
To: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Hanjun Guo <guohanjun@huawei.com>, Xiexiuqi <xiexiuqi@huawei.com>
Subject: Re: [PATCHv2 0/3] Find mirrored memory, use for boot time allocations
Date: Tue, 19 May 2015 11:01:22 +0800
Message-ID: <555AA782.2070603@huawei.com>
In-Reply-To: <cover.1431103461.git.tony.luck@intel.com>
On 2015/5/9 0:44, Tony Luck wrote:
> Some high end Intel Xeon systems report uncorrectable memory errors
> as a recoverable machine check. Linux has included code for some time
> to process these and just signal the affected processes (or even
> recover completely if the error was in a read only page that can be
> replaced by reading from disk).
>
> But we have no recovery path for errors encountered during kernel
> code execution. Except for some very specific cases we are unlikely
> to ever be able to recover.
>
> Enter memory mirroring. Actually 3rd generation of memory mirroring.
>
> Gen1: All memory is mirrored
>     Pro: No s/w enabling - h/w just gets good data from other side of the mirror
>     Con: Halves effective memory capacity available to OS/applications
> Gen2: Partial memory mirror - just mirror memory behind some memory controllers
>     Pro: Keep more of the capacity
>     Con: Nightmare to enable. Have to choose between allocating from
>          mirrored memory for safety vs. NUMA local memory for performance
> Gen3: Address range partial memory mirror - some mirror on each memory controller
>     Pro: Can tune the amount of mirror and keep NUMA performance
>     Con: I have to write memory management code to implement
>
> The current plan is just to use mirrored memory for kernel allocations. This
> has been broken into two phases:
> 1) This patch series - find the mirrored memory, use it for boot time allocations
> 2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the unused
> mirrored memory from mm/memblock.c and only give it out to select kernel
> allocations (this is still being scoped because page_alloc.c is scary).
>
Hi Tony,

In part 2, does this mean that memory allocated by the kernel should come
from mirrored memory?

I have heard of this feature (address range mirroring) before, and I changed
some code to test it (implementing memory allocation in specific physical
areas).

In my opinion, adding a new zone (ZONE_MIRROR) to hold the mirrored memory is
not a good idea. If there are XX discontiguous mirrored areas in one NUMA node,
there would have to be XX ZONE_MIRROR zones in one pgdat, which is impossible,
right?

I think adding a new migrate type (MIGRATE_MIRROR) would be better. The
following output is from my modified kernel.

[root@localhost ~]# cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      1      1      1      0      2      1      1      0      1      0      0
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      0      3
Node    0, zone      DMA, type       Mirror      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      1      0
Node    0, zone      DMA, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable     14      7      6      1      3      0      1      0      0      0      0
Node    0, zone    DMA32, type  Reclaimable     15      2      2      1      1      2      1      1      0      0      0
Node    0, zone    DMA32, type      Movable      3     24     52     58     31      2      1      1      1      3    231
Node    0, zone    DMA32, type       Mirror      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
Node    0, zone    DMA32, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable     80     12      6      7      3      1     67     58     23     11      0
Node    0, zone   Normal, type  Reclaimable      6      6      8     11      5      3      0      1      0      0      0
Node    0, zone   Normal, type      Movable      6    198    618    675    363     13      4      3      0      2   4074
Node    0, zone   Normal, type       Mirror      0      0      0      0      0      0      0      0      0      0   1024
Node    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type    Unmovable  Reclaimable      Movable       Mirror      Reserve          CMA      Isolate
Node 0, zone      DMA            1            0            6            0            1            0            0
Node 0, zone    DMA32            8           32          975            0            1            0            0
Node 0, zone   Normal          216          334        12760         2048            2            0            0

Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    1, zone   Normal, type    Unmovable     18      2     19      3     21     28     13      0      1      1      0
Node    1, zone   Normal, type  Reclaimable      0      1      1      1      0      0      1      0      0      1      0
Node    1, zone   Normal, type      Movable      6     13      9      3      0      4      5      0      1      0   6970
Node    1, zone   Normal, type       Mirror      0      0      0      0      0      0      0      0      0      0   1024
Node    1, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
Node    1, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    1, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type    Unmovable  Reclaimable      Movable       Mirror      Reserve          CMA      Isolate
Node 1, zone   Normal          112            4        14218         2048            2            0            0
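
As a rough cross-check of the Mirror rows above, the free-page counts can be
compared with the block counts. This is only a sketch: the 4 KiB page size is
an assumption (the output does not state it), and the helper name free_pages
is made up for illustration. The numbers are copied from the Node 0, zone
Normal rows.

```python
PAGE_SIZE = 4096        # assumption: 4 KiB base pages, not stated in the output
PAGES_PER_BLOCK = 512   # "Pages per block: 512" from the output


def free_pages(counts_by_order):
    """Total free base pages, given free-block counts indexed by order."""
    return sum(count << order for order, count in enumerate(counts_by_order))


# "Node 0, zone Normal, type Mirror" row: orders 0..10
mirror_free = [0] * 10 + [1024]
# Mirror column of "Number of blocks" for Node 0, zone Normal
mirror_blocks = 2048

free = free_pages(mirror_free)
total = mirror_blocks * PAGES_PER_BLOCK
print(free, total, free * PAGE_SIZE // 2**30)  # prints: 1048576 1048576 4
```

Under that assumption, every page in the Mirror pageblocks of this zone is
still free, and the mirrored area on the node is 4 GiB.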

I also added a new flag (GFP_MIRROR), so that mirrored memory can be used from
both kernel space and user space. If no mirrored memory is available, we fall
back to allocating memory of other types.

1) kernel space (pcp, page buddy, slab/slub ...):
    -> use mirrored memory (e.g. /proc/sys/vm/mirrorable)
    -> __alloc_pages_nodemask()
    -> gfpflags_to_migratetype()
    -> use MIGRATE_MIRROR list
2) user space (syscall, madvise, mmap ...):
    -> add VM_MIRROR flag in the vma
    -> add GFP_MIRROR when page fault in the vma
    -> __alloc_pages_nodemask()
    -> use MIGRATE_MIRROR list
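
The selection and fallback described above can be modelled in a few lines.
This is a toy sketch in Python, not the kernel code: the flag value, the
fallback order, and the simplified gfpflags_to_migratetype() are all
assumptions made for illustration. Only the names GFP_MIRROR and
MIGRATE_MIRROR come from this mail.

```python
from collections import deque

# Hypothetical flag bit; no value for GFP_MIRROR is given in this mail.
GFP_MIRROR = 0x1

MIGRATE_UNMOVABLE = "Unmovable"
MIGRATE_MOVABLE = "Movable"
MIGRATE_MIRROR = "Mirror"

# Assumed fallback order once the mirror list is empty (not from this mail).
FALLBACKS = {MIGRATE_MIRROR: [MIGRATE_UNMOVABLE, MIGRATE_MOVABLE]}


def gfpflags_to_migratetype(gfp_flags):
    """Toy stand-in: choose the mirror list only when GFP_MIRROR is set."""
    return MIGRATE_MIRROR if gfp_flags & GFP_MIRROR else MIGRATE_UNMOVABLE


def alloc_page(free_lists, gfp_flags):
    """Return (migratetype, pfn), or None if every list tried is empty."""
    wanted = gfpflags_to_migratetype(gfp_flags)
    for mt in [wanted] + FALLBACKS.get(wanted, []):
        if free_lists[mt]:
            return mt, free_lists[mt].popleft()
    return None


free_lists = {
    MIGRATE_UNMOVABLE: deque([100, 101]),
    MIGRATE_MOVABLE: deque([200]),
    MIGRATE_MIRROR: deque([900]),
}
first = alloc_page(free_lists, GFP_MIRROR)   # served from the mirror list
second = alloc_page(free_lists, GFP_MIRROR)  # mirror list empty, falls back
print(first, second)
```

The first request is served from the Mirror list; the second finds it empty
and takes a page from the next list in the assumed fallback order.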
Thanks,
Xishi Qiu
> Tony Luck (3):
> mm/memblock: Add extra "flags" to memblock to allow selection of
> memory based on attribute
> mm/memblock: Allocate boot time data structures from mirrored memory
> x86, mirror: x86 enabling - find mirrored memory ranges
>
> arch/s390/kernel/crash_dump.c | 5 +-
> arch/sparc/mm/init_64.c | 6 ++-
> arch/x86/kernel/check.c | 3 +-
> arch/x86/kernel/e820.c | 3 +-
> arch/x86/kernel/setup.c | 3 ++
> arch/x86/mm/init_32.c | 2 +-
> arch/x86/platform/efi/efi.c | 21 ++++++++
> include/linux/efi.h | 3 ++
> include/linux/memblock.h | 49 +++++++++++------
> mm/cma.c | 6 ++-
> mm/memblock.c | 123 +++++++++++++++++++++++++++++++++---------
> mm/memtest.c | 3 +-
> mm/nobootmem.c | 14 ++++-
> 13 files changed, 188 insertions(+), 53 deletions(-)
>