From: "Huang, Ying"
To: Yang Shi
Cc: Dave Hansen, Dave Hansen, Linux Kernel Mailing List, Yang Shi, David Rientjes, Dan Williams, Linux-MM
Subject: Re: [RFC][PATCH 5/9] mm/migrate: demote pages during reclaim
References: <20200818184122.29C415DF@viggo.jf.intel.com> <20200818184131.C972AFCC@viggo.jf.intel.com> <87lfi9wxk9.fsf@yhuang-dev.intel.com> <6a378a57-a453-0318-924b-05dfa0a10c1f@intel.com>
Date: Fri, 21 Aug 2020 08:57:50 +0800
In-Reply-To: (Yang Shi's message of "Thu, 20 Aug 2020 09:26:57 -0700")
Message-ID: <87v9hcvmr5.fsf@yhuang-dev.intel.com>

Yang Shi writes:

> On Thu, Aug 20, 2020 at 8:22 AM Dave Hansen wrote:
>>
>> On 8/20/20 1:06 AM,
Huang, Ying wrote:
>> >> +	/* Migrate pages selected for demotion */
>> >> +	nr_reclaimed += demote_page_list(&ret_pages, &demote_pages, pgdat, sc);
>> >> +
>> >> 	pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
>> >>
>> >> 	mem_cgroup_uncharge_list(&free_pages);
>> >> _
>> > Generally, it's good to batch the page migration. But one side effect
>> > is that, if the pages fail to be migrated, they will be placed back
>> > on the LRU list instead of falling back to being reclaimed for real.
>> > This may cause problems in some situations. For example, if there's
>> > not enough space on the PMEM (slow) node, the page migration fails,
>> > and OOM may be triggered, because direct reclaim on the DRAM (fast)
>> > node may make no progress, while before it could actually reclaim
>> > some pages.
>>
>> Yes, agreed.
>
> Kind of. But I think that should be transient and very rare. The
> kswapd on the PMEM nodes will be woken up to drop pages when we try to
> allocate migration target pages. It should be very rare that there are
> no reclaimable pages on the PMEM nodes.
>
>> There are a couple of ways we could fix this. Instead of splicing
>> 'demote_pages' back into 'ret_pages', we could try to get them back on
>> 'page_list' and goto the beginning of shrink_page_list(). This will
>> probably yield the best behavior, but might be a bit ugly.
>>
>> We could also add a field to 'struct scan_control' and just stop trying
>> to migrate after it has failed one or more times. The trick will be
>> picking a threshold that doesn't mess with either the normal reclaim
>> rate or the migration rate.
>
> In my patchset I implemented a fallback mechanism via adding a new
> PGDAT_CONTENDED node flag. Please check this out:
> https://patchwork.kernel.org/patch/10993839/
>
> Basically the PGDAT_CONTENDED flag will be set once migrate_pages()
> returns -ENOMEM, which indicates the target PMEM node is under memory
> pressure, then it falls back to the regular reclaim path.
> The flag would be cleared by clear_pgdat_congested() once the PMEM
> node's memory pressure is gone.

There may be some races between setting and clearing the flag. For
example:

- try to migrate some pages from the DRAM node to the PMEM node
- there are not enough free pages on the PMEM node, so kswapd is woken up
- kswapd on the PMEM node reclaims some pages and tries to clear
  PGDAT_CONTENDED on the DRAM node
- PGDAT_CONTENDED is set on the DRAM node

This may be resolvable. But I still prefer to fall back to real page
reclaim directly for the pages that failed to be migrated. That looks
more robust.

Best Regards,
Huang, Ying

> We already use node flags to indicate the state of a node in the
> reclaim code, i.e. PGDAT_WRITEBACK, PGDAT_DIRTY, etc. So adding a new
> flag sounds more straightforward to me IMHO.
>
>> This is on my list to fix up next.