From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=lSp8=EC=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-8.2 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,
	SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 619A4C55179
	for <linux-mm@archiver.kernel.org>; Tue, 27 Oct 2020 15:29:09 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id 1008922264
	for <linux-mm@archiver.kernel.org>; Tue, 27 Oct 2020 15:29:08 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1008922264
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.de
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id 5B9136B0073; Tue, 27 Oct 2020 11:29:08 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 567206B0074; Tue, 27 Oct 2020 11:29:08 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 42F256B0075; Tue, 27 Oct 2020 11:29:08 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0082.hostedemail.com [216.40.44.82])
	by kanga.kvack.org (Postfix) with ESMTP id 087B16B0073
	for <linux-mm@kvack.org>; Tue, 27 Oct 2020 11:29:07 -0400 (EDT)
Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay03.hostedemail.com (Postfix) with ESMTP id 9FBF28249980
	for <linux-mm@kvack.org>; Tue, 27 Oct 2020 15:29:07 +0000 (UTC)
X-FDA: 77418088734.14.lead06_1d0a65b2727d
Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251])
	by smtpin14.hostedemail.com (Postfix) with ESMTP id 7612518229837
	for <linux-mm@kvack.org>; Tue, 27 Oct 2020 15:29:07 +0000 (UTC)
X-HE-Tag: lead06_1d0a65b2727d
X-Filterd-Recvd-Size: 5287
Received: from mx2.suse.de (mx2.suse.de [195.135.220.15])
	by imf23.hostedemail.com (Postfix) with ESMTP
	for <linux-mm@kvack.org>; Tue, 27 Oct 2020 15:29:06 +0000 (UTC)
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.221.27])
	by mx2.suse.de (Postfix) with ESMTP id 8A00FADD9;
	Tue, 27 Oct 2020 15:29:05 +0000 (UTC)
Date: Tue, 27 Oct 2020 16:29:02 +0100
From: Oscar Salvador <osalvador@suse.de>
To: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	yang.shi@linux.alibaba.com, rientjes@google.com,
	ying.huang@intel.com, dan.j.williams@intel.com
Subject: Re: [RFC][PATCH 5/9] mm/migrate: demote pages during reclaim
Message-ID: <20201027152858.GA11135@linux>
References: <20201007161736.ACC6E387@viggo.jf.intel.com>
 <20201007161745.26B1D789@viggo.jf.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20201007161745.26B1D789@viggo.jf.intel.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Wed, Oct 07, 2020 at 09:17:45AM -0700, Dave Hansen wrote:
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Yang Shi <yang.shi@linux.alibaba.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>

I am still going through all the details, but just my thoughts on things
that caught my eye:

> --- a/include/linux/migrate.h~demote-with-migrate_pages	2020-10-07 09:15:31.028642442 -0700
> +++ b/include/linux/migrate.h	2020-10-07 09:15:31.034642442 -0700
> @@ -27,6 +27,7 @@ enum migrate_reason {
>  	MR_MEMPOLICY_MBIND,
>  	MR_NUMA_MISPLACED,
>  	MR_CONTIG_RANGE,
> +	MR_DEMOTION,
>  	MR_TYPES

I think you also need to add it under include/trace/events/migrate.h, so
mm_migrate_pages event can know about it.

> +bool migrate_demote_page_ok(struct page *page, struct scan_control *sc)

Make it static?
Also, scan_control seems to be unused here.

> +{
> +	int next_nid = next_demotion_node(page_to_nid(page));
> +
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);

Right after the call to migrate_demote_page_ok, we call unlock_page
which already has this check in place.
I know that this is only to be on the safe side and we do not loss anything,
but just my thoughts.

> +static struct page *alloc_demote_page(struct page *page, unsigned long node)
> +{
> +	/*
> +	 * Try to fail quickly if memory on the target node is not
> +	 * available.  Leaving out __GFP_IO and __GFP_FS helps with
> +	 * this.  If the desintation node is full, we want kswapd to
> +	 * run there so that its pages will get reclaimed and future
> +	 * migration attempts may succeed.
> +	 */
> +	gfp_t flags = (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_NORETRY |
> +		       __GFP_NOMEMALLOC | __GFP_NOWARN | __GFP_THISNODE |
> +		       __GFP_KSWAPD_RECLAIM);

I think it would be nicer to have this as a real GFP_ thingy defined.
e.g: GFP_DEMOTION

> +	/* HugeTLB pages should not be on the LRU */
> +	WARN_ON_ONCE(PageHuge(page));

I am not sure about this one.
This could only happen if the page, which now it is in another list, ends up in
the buddy system. That is quite unlikely bth.
And nevertheless, this is only a warning, which means that if this scenario gets
to happen, we will be allocating a single page to satisfy a higher-order page, and
I am not sure about the situation we will end up with.

> +
> +	if (PageTransHuge(page)) {
> +		struct page *thp;
> +
> +		flags |= __GFP_COMP;
> +
> +		thp = alloc_pages_node(node, flags, HPAGE_PMD_ORDER);
> +		if (!thp)
> +			return NULL;
> +		prep_transhuge_page(thp);
> +		return thp;
> +	}
> +
> +	return __alloc_pages_node(node, flags, 0);

Would make sense to transform this in some sort of new_demotion_page,
which actually calls alloc_migration_target with the right stuff in place?
And then pass a struct migration_target_control so alloc_migration_target
does the right thing.
alloc_migration_target also takes care of calling prep_transhuge_page
when needed.
e.g:

static struct page *new_demotion_node(struct page *page, unsigned long private)
{
        struct migration_target_control mtc = {
                .nid = private,
                .gfp_mask = GFP_DEMOTION,
        };

        if (PageTransHuge(page))
                mtc.gfp_mask |= __GFP_COMP;

        return alloc_migration_target(page, (unsigned long)&mtc);
}

The only thing I see is that alloc_migration_target seems to "override"
the gfp_mask and does ORs GFP_TRANSHUGE for THP pages, which includes
__GFP_DIRECT_RECLAIM (not appreciated in this case).
But maybe this can be worked around by checking if gfp_mask == GFP_DEMOTION,
and if so, just keep the mask as it is.

> +
> +	if (list_empty(demote_pages))
> +		return 0;
> +
> +	/* Demotion ignores all cpuset and mempolicy settings */
> +	err = migrate_pages(demote_pages, alloc_demote_page, NULL,
> +			    target_nid, MIGRATE_ASYNC, MR_DEMOTION,
> +			    &nr_succeeded);

As I said, instead of alloc_demote_page, use a new_demote_page and make
alloc_migration_target handle the allocations and prep thp pages.


-- 
Oscar Salvador
SUSE L3