From: Yang Shi
Date: Fri, 21 Aug 2020 09:17:48 -0700
Subject: Re: [RFC][PATCH 5/9] mm/migrate: demote pages during reclaim
To: "Huang, Ying"
Cc: Dave Hansen, Dave Hansen, Linux Kernel Mailing List, Yang Shi,
 David Rientjes, Dan Williams, Linux-MM
In-Reply-To: <87v9hcvmr5.fsf@yhuang-dev.intel.com>
References: <20200818184122.29C415DF@viggo.jf.intel.com>
 <20200818184131.C972AFCC@viggo.jf.intel.com>
 <87lfi9wxk9.fsf@yhuang-dev.intel.com>
 <6a378a57-a453-0318-924b-05dfa0a10c1f@intel.com>
 <87v9hcvmr5.fsf@yhuang-dev.intel.com>

On Thu, Aug 20, 2020 at 5:57 PM Huang, Ying wrote:
>
> Yang Shi writes:
>
> > On Thu, Aug 20, 2020 at 8:22 AM Dave Hansen wrote:
> >>
> >> On 8/20/20 1:06 AM, Huang, Ying wrote:
> >> >> +	/* Migrate pages selected for demotion */
> >> >> +	nr_reclaimed += demote_page_list(&ret_pages, &demote_pages, pgdat, sc);
> >> >> +
> >> >> 	pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
> >> >>
> >> >> 	mem_cgroup_uncharge_list(&free_pages);
> >> >> _
> >> > Generally, it's good to batch the page migration. But one side effect
> >> > is that, if the pages fail to be migrated, they will be placed back
> >> > on the LRU list instead of falling back to being reclaimed for real.
> >> > This may cause issues in some situations. For example, if there is
> >> > not enough space in the PMEM (slow) node, the page migration fails,
> >> > and OOM may be triggered, because direct reclaim on the DRAM (fast)
> >> > node may make no progress, whereas before it could really reclaim
> >> > some pages.
> >>
> >> Yes, agreed.
> >
> > Kind of. But I think that should be transient and very rare. The
> > kswapd on the pmem nodes will be woken up to drop pages when we try
> > to allocate migration target pages. It should be very rare that there
> > is no reclaimable page on the pmem nodes.
> >
> >>
> >> There are a couple of ways we could fix this. Instead of splicing
> >> 'demote_pages' back into 'ret_pages', we could try to get them back
> >> on 'page_list' and goto the beginning of shrink_page_list(). This
> >> will probably yield the best behavior, but might be a bit ugly.
> >>
> >> We could also add a field to 'struct scan_control' and just stop
> >> trying to migrate after it has failed one or more times. The trick
> >> will be picking a threshold that doesn't mess with either the normal
> >> reclaim rate or the migration rate.
> >
> > In my patchset I implemented a fallback mechanism by adding a new
> > PGDAT_CONTENDED node flag. Please check this out:
> > https://patchwork.kernel.org/patch/10993839/
> >
> > Basically the PGDAT_CONTENDED flag will be set once migrate_pages()
> > returns -ENOMEM, which indicates the target pmem node is under memory
> > pressure, and reclaim then falls back to the regular reclaim path.
> > The flag would be cleared by clear_pgdat_congested() once the pmem
> > node memory pressure is gone.
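For illustration, the fallback described in the quoted paragraph above
boils down to roughly the following. This is a completely untested
sketch of mm/vmscan.c-style code, not the code from the linked
patchset: PGDAT_CONTENDED is the node flag proposed there, and the two
helper names below are made up.

/*
 * Untested sketch of the PGDAT_CONTENDED fallback, not the actual patch.
 *
 * Called from the demotion path when migrate_pages() fails with -ENOMEM,
 * i.e. when the target pmem node could not satisfy the allocation.
 */
static void note_demotion_contended(int target_nid)
{
	set_bit(PGDAT_CONTENDED, &NODE_DATA(target_nid)->flags);
}

/*
 * Checked in shrink_page_list() before queueing a page for demotion.
 * While the target node is marked contended, demotion is skipped and the
 * page falls back to the regular reclaim path; the bit would be cleared
 * again (e.g. via clear_pgdat_congested()) once the pressure on the pmem
 * node is gone.
 */
static bool demotion_allowed(int target_nid)
{
	return target_nid != NUMA_NO_NODE &&
	       !test_bit(PGDAT_CONTENDED, &NODE_DATA(target_nid)->flags);
}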
>
> There may be some races between the flag set and clear. For example,
>
> - try to migrate some pages from the DRAM node to the PMEM node
>
> - there are not enough free pages on the PMEM node, so kswapd is woken up
>
> - kswapd on the PMEM node reclaims some pages and tries to clear
>   PGDAT_CONTENDED on the DRAM node
>
> - PGDAT_CONTENDED is set on the DRAM node

Yes, the race is real. Someone else may set PGDAT_CONTENDED after the
pmem node's kswapd has already gone back to sleep, so the flag might not
be cleared for a while.

I think this can be solved easily. We can just move the flag setting
into kswapd. Once kswapd is woken up we know there is some memory
pressure on that node, so set the flag there, and clear the flag when
kswapd goes to sleep. kswapd is single threaded and only sets/clears its
own node's flag, so there should be no race unless I'm missing
something. (A rough sketch of what I mean is at the end of this mail.)

> This may be resolvable. But I still prefer to fall back directly to
> real page reclaim for the pages that failed to be migrated. That looks
> more robust.
>
> Best Regards,
> Huang, Ying
>
> > We already use node flags to indicate the state of a node in the
> > reclaim code, e.g. PGDAT_WRITEBACK, PGDAT_DIRTY, etc. So adding a new
> > flag sounds more straightforward to me IMHO.
> >
> >>
> >> This is on my list to fix up next.
> >>
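Since the set/clear-in-kswapd idea above is easier to see in code, here
is a rough and completely untested sketch, based loosely on the kswapd()
loop in mm/vmscan.c. The order/zone-index bookkeeping is elided, and
PGDAT_CONTENDED is the flag proposed in the earlier patchset, not an
existing mainline flag.

/*
 * Untested sketch only, not actual mm/vmscan.c code.  kswapd would be
 * the only writer of its own node's PGDAT_CONTENDED bit: set it when
 * woken (the node is under pressure, so demotion to it should fall back
 * to regular reclaim) and clear it just before going back to sleep,
 * which avoids the cross-node set/clear race described above.
 */
static int kswapd(void *p)
{
	pg_data_t *pgdat = (pg_data_t *)p;
	unsigned int alloc_order = 0;			/* order handling elided */
	unsigned int highest_zoneidx = MAX_NR_ZONES - 1;

	for ( ; ; ) {
		/* Sleep until an allocator wakes kswapd for this node. */
		kswapd_try_to_sleep(pgdat, alloc_order, alloc_order,
				    highest_zoneidx);

		if (kthread_should_stop())
			break;

		/* Woken up: this node is under memory pressure. */
		set_bit(PGDAT_CONTENDED, &pgdat->flags);

		/* Reclaim until the node's watermarks are satisfied again. */
		balance_pgdat(pgdat, alloc_order, highest_zoneidx);

		/* Pressure handled: allow demotion to this node again. */
		clear_bit(PGDAT_CONTENDED, &pgdat->flags);
	}

	return 0;
}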