Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations


From: Mel Gorman <mgorman@suse.de>
To: Ingo Molnar <mingo@kernel.org>
Cc: Xishi Qiu <qiuxishi@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	Hanjun Guo <guohanjun@huawei.com>, Xiexiuqi <xiexiuqi@huawei.com>,
	leon@leon.nu, Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Vlastimil Babka <vbabka@suse.cz>, Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations
Date: Tue, 30 Jun 2015 12:53:53 +0100	[thread overview]
Message-ID: <20150630115353.GB6812@suse.de> (raw)
In-Reply-To: <20150630104654.GA24932@gmail.com>

On Tue, Jun 30, 2015 at 12:46:54PM +0200, Ingo Molnar wrote:
> 
> * Mel Gorman <mgorman@suse.de> wrote:
> 
> > [...]
> > 
> > Basically, overall I feel this series is the wrong approach, but not knowing who
> > the users are makes it much harder to judge. I strongly suspect that if
> > mirrored memory is to be properly used then it needs to be available before the 
> > page allocator is even active. Once active, there needs to be controlled access 
> > for allocation requests that are really critical to mirror and not just all 
> > kernel allocations. None of that would use a MIGRATE_TYPE approach. It would be 
> > alterations to the bootmem allocator and access to an explicit reserve that is 
> > not accounted for as "free memory" and accessed via an explicit GFP flag.
> 
> So I think the main goal is to avoid kernel crashes when a #MC memory fault 
> arrives on a piece of memory that is owned by the kernel.
> 

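Sounds logical. In that case, bootmem awareness would be crucial.
Enabling support in just the page allocator is too late, because kernel
allocations made during early boot, before the page allocator is active,
would not be placed in mirrored memory.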
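As a rough illustration of what "bootmem awareness" could mean, here is a minimal userspace sketch, assuming firmware reports some early physical ranges as mirrored. The struct early_region type and the carve() and early_alloc_kernel() helpers are hypothetical; they are not the memblock or bootmem API. Falling back to ordinary memory when the mirror is exhausted is an assumption, as is treating address 0 as failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an early-boot range list (not a kernel type). */
struct early_region {
	uint64_t base;  /* physical start address; must be non-zero here */
	uint64_t size;  /* length in bytes */
	uint64_t used;  /* bytes already handed out from the bottom */
	bool mirrored;  /* firmware reported this range as mirrored */
};

/* Carve 'size' bytes from the first region whose mirrored flag matches.
 * Returns the physical address, or 0 if nothing suitable is left. */
static uint64_t carve(struct early_region *r, size_t n, uint64_t size,
		      bool mirrored)
{
	for (size_t i = 0; i < n; i++) {
		assert(r[i].used <= r[i].size);
		if (r[i].mirrored != mirrored)
			continue;
		if (r[i].size - r[i].used >= size) {
			uint64_t addr = r[i].base + r[i].used;

			r[i].used += size;
			return addr;
		}
	}
	return 0;
}

/* Early kernel allocation: prefer mirrored ranges and only fall back to
 * ordinary memory once they are exhausted. '*got_mirror' reports which
 * kind of memory was actually returned. */
static uint64_t early_alloc_kernel(struct early_region *r, size_t n,
				   uint64_t size, bool *got_mirror)
{
	uint64_t addr = carve(r, n, size, true);

	*got_mirror = (addr != 0);
	if (!addr)
		addr = carve(r, n, size, false);
	return addr;
}
```

With one 4 KiB mirrored range, two 2 KiB kernel allocations land in the mirror and a third one falls back to ordinary memory, which is exactly the case a real implementation would need to report or refuse.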

> In that sense 'protecting' all kernel allocations is natural: we don't know how to 
> recover from faults that affect kernel memory.
> 

It potentially uses up all the mirrored memory on allocations that do not
need that sort of guarantee. For example, if there were an MC on memory
backing the inode cache, that is potentially recoverable as long as the
inodes were not dirty. That's a minor detail, as the kernel could later
protect only MIGRATE_UNMOVABLE requests instead of all kernel allocations
if fatal MCs in kernel space could be distinguished from non-fatal ones.
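The distinction between fatal and non-fatal MCs can be sketched as a simple classification, assuming each allocation can be tagged with an owner class up front. The enum page_owner values and the mc_recoverable() and needs_mirror() helpers are hypothetical and only illustrate the policy described above: spend mirrored memory where losing the page would be fatal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical owner classes; these are not kernel types. */
enum page_owner {
	OWNER_USER,          /* user-space memory: recoverable per this thread */
	OWNER_KERNEL_CACHE,  /* reclaimable kernel cache, e.g. the inode cache */
	OWNER_KERNEL_PINNED, /* unmovable kernel data: no known recovery */
	OWNER_NR,
};

/* Could an MC on a page of this class be survived without a crash? */
static bool mc_recoverable(enum page_owner owner, bool dirty)
{
	assert(owner < OWNER_NR);
	switch (owner) {
	case OWNER_USER:
		return true;
	case OWNER_KERNEL_CACHE:
		return !dirty; /* a clean copy can be dropped and read back */
	default:
		return false;
	}
}

/* Spend mirrored memory only where losing the page would be fatal. */
static bool needs_mirror(enum page_owner owner, bool may_be_dirty)
{
	return !mc_recoverable(owner, may_be_dirty);
}
```

The awkward case is the cache that may become dirty: it cannot be classified at allocation time without knowing its future state, which is one reason protecting only MIGRATE_UNMOVABLE is the simpler first step.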

Bootmem awareness is much more important either way. If that were
addressed, then a MIGRATE_UNMOVABLE_MIRROR type could potentially be
created that is only used for MIGRATE_UNMOVABLE allocations and never for
user-space. That misses MIGRATE_RECLAIMABLE, so if that is required then
we need something else that both preserves fragmentation avoidance and
avoids introducing loads of new migratetypes.
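A minimal sketch of how a MIGRATE_UNMOVABLE_MIRROR free list might slot into free-list selection, modelled in userspace. The MT_* enum and freelist_order() are hypothetical; the ordinary fallback order loosely follows the page allocator's fallback table, and letting kernel unmovable requests fall back to normal lists when the mirror is empty is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model; MT_UNMOVABLE_MIRROR is the hypothetical new type. */
enum mt { MT_UNMOVABLE, MT_RECLAIMABLE, MT_MOVABLE, MT_UNMOVABLE_MIRROR, MT_NR };

#define MAX_LISTS 4

/* Fill 'lists' with the free lists to try, best first; return the count. */
static size_t freelist_order(enum mt want, bool user, enum mt lists[MAX_LISTS])
{
	size_t n = 0;

	assert(want < MT_UNMOVABLE_MIRROR); /* never requested directly */

	/* Only kernel unmovable requests may touch the mirrored free list. */
	if (want == MT_UNMOVABLE && !user)
		lists[n++] = MT_UNMOVABLE_MIRROR;

	lists[n++] = want;
	/* Ordinary fallbacks; none of them ever includes the mirror. */
	switch (want) {
	case MT_UNMOVABLE:
		lists[n++] = MT_RECLAIMABLE;
		lists[n++] = MT_MOVABLE;
		break;
	case MT_RECLAIMABLE:
		lists[n++] = MT_UNMOVABLE;
		lists[n++] = MT_MOVABLE;
		break;
	case MT_MOVABLE:
		lists[n++] = MT_RECLAIMABLE;
		lists[n++] = MT_UNMOVABLE;
		break;
	default:
		break;
	}
	return n;
}
```

Note that a kernel MT_RECLAIMABLE request never reaches the mirror in this sketch, which is the MIGRATE_RECLAIMABLE gap mentioned above.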

Reclaim-related issues could be partially avoided by forbidding its use
by userspace and by accounting for the size of MIGRATE_UNMOVABLE_MIRROR
during watermark checks.
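The watermark accounting can be sketched in the same spirit as the way free CMA pages are discounted for allocations that cannot use CMA. The struct zone_model fields and watermark_ok() below are a hypothetical userspace model, not the kernel's __zone_watermark_ok(); the nr_free_mirror counter plays the role of the series' NR_FREE_MIRROR_PAGES.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-zone counters, in pages. */
struct zone_model {
	long nr_free;        /* all free pages, mirrored ones included */
	long nr_free_mirror; /* free pages in the mirrored reserve */
	long wmark_min;      /* watermark the request must stay above */
};

/* Return true if an allocation of 2^order pages may proceed. Requests
 * that may not use the mirrored reserve must not count its pages. */
static bool watermark_ok(const struct zone_model *z, unsigned int order,
			 bool may_use_mirror)
{
	long usable = z->nr_free;

	assert(z->nr_free_mirror <= z->nr_free);
	if (!may_use_mirror)
		usable -= z->nr_free_mirror;

	/* Pages left after this allocation must exceed the watermark. */
	return usable - (1L << order) > z->wmark_min;
}
```

With 1000 free pages of which 600 are mirrored and a 256-page watermark, an order-8 kernel request passes but an order-8 userspace request fails. Without the discount, userspace would see 1000 free pages and never wake reclaim even though only 400 are usable to it.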

> We do know how to recover from faults that affect user-space memory alone.
> 
> So if a mechanism is in place that prioritizes 3 groups of allocators:
> 
>   - non-recoverable memory (kernel allocations mostly)
> 

So bootmem at the very least, followed by MIGRATE_UNMOVABLE requests,
whether they are accounted for by zones or by MIGRATE_TYPES.

>   - high priority user memory (critical apps that must never fail)
> 

This one is problematic with a MIGRATE_TYPE-based approach such as the
one in this series. If a high-priority application requires memory and
MIGRATE_MIRROR is full, then some of it must be reclaimed. With a
MIGRATE_TYPE approach, the kernel may reclaim a lot of unnecessary memory
trying to free some MIGRATE_MIRROR memory, with no guarantee of success.
From userspace it will look like unnecessary thrashing, yet it will be
difficult to diagnose because reclaim stats are per-zone. Dealing with
this needs either a zone-based approach or a lot of surgery to reclaim
(similar to what the node-based LRU series actually does when it skips
pages because the caller requires lowmem pages).
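The reclaim problem can be illustrated with a toy LRU, assuming each page is simply tagged as lying in a mirrored pageblock or not. reclaim_for_mirror() and struct reclaim_result are hypothetical; the skip_other mode only mimics the idea of skipping ineligible pages described above, not the node-based LRU series' actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct reclaim_result {
	size_t reclaimed;        /* pages freed in total */
	size_t mirror_reclaimed; /* of which lay in mirrored pageblocks */
};

/* lru[0] is the oldest page; lru[i] is true if the page is mirrored.
 * Reclaim in LRU order until 'want' mirrored pages are freed or
 * 'scan_limit' pages have been scanned. With skip_other == false every
 * scanned page is freed (zone-wide reclaim); with skip_other == true
 * non-mirrored pages are left on the LRU. */
static struct reclaim_result reclaim_for_mirror(const bool *lru, size_t n,
						size_t want, size_t scan_limit,
						bool skip_other)
{
	struct reclaim_result res = { 0, 0 };

	for (size_t i = 0; i < n && i < scan_limit; i++) {
		if (res.mirror_reclaimed >= want)
			break;
		if (!lru[i] && skip_other)
			continue; /* leave the non-mirrored page alone */
		res.reclaimed++;
		if (lru[i])
			res.mirror_reclaimed++;
	}
	assert(res.mirror_reclaimed <= res.reclaimed);
	return res;
}
```

On an LRU where only 2 of 7 pages are mirrored, the zone-wide mode frees all 7 pages to obtain the 2 it needs, while the skipping mode frees only 2. With a scan limit of 4, the zone-wide mode frees 4 pages and still falls short, which is the "no guarantee of success" case.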

-- 
Mel Gorman
SUSE Labs
