Message-ID: <505e7f55-f63a-b33d-aa10-44de16d2d3cc@redhat.com>
Date: Thu, 21 Sep 2023 12:19:33 +0200
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
To: Zi Yan, Johannes Weiner
Cc: Vlastimil Babka, Mike Kravetz, Andrew Morton, Mel Gorman, Miaohe Lin,
 Kefeng Wang, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20230918145204.GB16104@cmpxchg.org>
 <20230918174037.GA112714@monkey>
 <20230919064914.GA124289@cmpxchg.org>
 <20230919184731.GC112714@monkey>
 <20230920003239.GD112714@monkey>
 <149ACAE8-D3E4-4009-828A-D3AC881FFB9C@nvidia.com>
 <20230920134811.GB124289@cmpxchg.org>
 <20230920160400.GC124289@cmpxchg.org>
 <762CA634-053A-41DD-8ED7-895374640858@nvidia.com>
From: David Hildenbrand
Organization: Red Hat

On 21.09.23 04:31, Zi Yan wrote:
> On 20 Sep 2023, at 13:23, Zi Yan wrote:
>
>> On 20 Sep 2023, at 12:04, Johannes Weiner wrote:
>>
>>> On Wed, Sep 20, 2023 at 09:48:12AM -0400, Johannes Weiner wrote:
>>>> On Wed, Sep 20, 2023 at 08:07:53AM +0200, Vlastimil Babka wrote:
>>>>> On 9/20/23 03:38, Zi Yan wrote:
>>>>>> On 19 Sep 2023, at 20:32, Mike Kravetz wrote:
>>>>>>
>>>>>>> On 09/19/23 16:57, Zi Yan wrote:
>>>>>>>> On 19 Sep 2023, at 14:47, Mike Kravetz wrote:
>>>>>>>>
>>>>>>>>> --- a/mm/page_alloc.c
>>>>>>>>> +++ b/mm/page_alloc.c
>>>>>>>>> @@ -1651,8 +1651,13 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page,
>>>>>>>>>  	end = pageblock_end_pfn(pfn) - 1;
>>>>>>>>>
>>>>>>>>>  	/* Do not cross zone boundaries */
>>>>>>>>> +#if 0
>>>>>>>>>  	if (!zone_spans_pfn(zone, start))
>>>>>>>>>  		start = zone->zone_start_pfn;
>>>>>>>>> +#else
>>>>>>>>> +	if (!zone_spans_pfn(zone, start))
>>>>>>>>> +		start = pfn;
>>>>>>>>> +#endif
>>>>>>>>>  	if (!zone_spans_pfn(zone, end))
>>>>>>>>>  		return false;
>>>>>>>>> I can still trigger warnings.
>>>>>>>>
>>>>>>>> OK. One thing to note is that the page type in the warning changed from
>>>>>>>> 5 (MIGRATE_ISOLATE) to 0 (MIGRATE_UNMOVABLE) with my suggested change.
>>>>>>>>
>>>>>>>
>>>>>>> Just to be really clear,
>>>>>>> - the 5 (MIGRATE_ISOLATE) warning was from the __alloc_pages call path.
>>>>>>> - the 0 (MIGRATE_UNMOVABLE) as above was from the alloc_contig_range call
>>>>>>>   path WITHOUT your change.
>>>>>>>
>>>>>>> I am guessing the difference here has more to do with the allocation path?
>>>>>>>
>>>>>>> I went back and reran focusing on the specific migrate type.
>>>>>>> Without your patch, and coming from the alloc_contig_range call path,
>>>>>>> I got two warnings of 'page type is 0, passed migratetype is 1' as above.
>>>>>>> With your patch I got one 'page type is 0, passed migratetype is 1'
>>>>>>> warning and one 'page type is 1, passed migratetype is 0' warning.
>>>>>>>
>>>>>>> I could be wrong, but I do not think your patch changes things.
>>>>>>
>>>>>> Got it. Thanks for the clarification.
>>>>>>>
>>>>>>>>>
>>>>>>>>> One idea about recreating the issue is that it may have to do with size
>>>>>>>>> of my VM (16G) and the requested allocation sizes 4G. However, I tried
>>>>>>>>> to really stress the allocations by increasing the number of hugetlb
>>>>>>>>> pages requested and that did not help. I also noticed that I only seem
>>>>>>>>> to get two warnings and then they stop, even if I continue to run the
>>>>>>>>> script.
>>>>>>>>>
>>>>>>>>> Zi asked about my config, so it is attached.
>>>>>>>>
>>>>>>>> With your config, I still have no luck reproducing the issue. I will keep
>>>>>>>> trying. Thanks.
>>>>>>>>
>>>>>>>
>>>>>>> Perhaps try running both scripts in parallel?
>>>>>>
>>>>>> Yes. It seems to do the trick.
>>>>>>
>>>>>>> Adjust the number of hugetlb pages allocated to equal 25% of memory?
>>>>>>
>>>>>> I am able to reproduce it with the script below:
>>>>>>
>>>>>> while true; do
>>>>>>   echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages&
>>>>>>   echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages&
>>>>>>   wait
>>>>>>   echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>>>>>>   echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>>>>>> done
>>>>>>
>>>>>> I will look into the issue.
>>>>
>>>> Nice!
>>>>
>>>> I managed to reproduce it ONCE, triggering it not even a second after
>>>> starting the script. But I can't seem to do it twice, even after
>>>> several reboots and letting it run for minutes.
>>>
>>> I managed to reproduce it reliably by cutting the nr_hugepages
>>> parameters respectively in half.
>>>
>>> The one that triggers for me is always MIGRATE_ISOLATE. With some
>>> printk-tracing, the scenario seems to be this:
>>>
>>> #0                                        #1
>>> start_isolate_page_range()
>>>   isolate_single_pageblock()
>>>     set_migratetype_isolate(tail)
>>>       lock zone->lock
>>>       move_freepages_block(tail) // nop
>>>       set_pageblock_migratetype(tail)
>>>       unlock zone->lock
>>>                                           del_page_from_freelist(head)
>>>                                           expand(head, head_mt)
>>>                                             WARN(head_mt != tail_mt)
>>>     start_pfn = ALIGN_DOWN(MAX_ORDER_NR_PAGES)
>>>     for (pfn = start_pfn, pfn < end_pfn)
>>>       if (PageBuddy())
>>>         split_free_page(head)
>>>
>>> IOW, we update a pageblock that isn't MAX_ORDER aligned, then drop the
>>> lock. The move_freepages_block() does nothing because the PageBuddy()
>>> is set on the pageblock to the left. Once we drop the lock, the buddy
>>> gets allocated and the expand() puts things on the wrong list. The
>>> splitting code that handles MAX_ORDER blocks runs *after* the tail
>>> type is set and the lock has been dropped, so it's too late.
>>
>> Yes, this is the issue I can confirm as well. But it is intentional to enable
>> allocating a contiguous range at pageblock granularity instead of MAX_ORDER
>> granularity. With your changes below, it no longer works, because if there
>> is an unmovable page in
>> [ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES), pageblock_start_pfn(start_pfn)),
>> the allocation fails but it would succeed in the current implementation.
>>
>> I think a proper fix would be to make move_freepages_block() split the
>> MAX_ORDER page and put the split pages in the right migratetype free lists.
>>
>> I am working on that.
>
> After spending half a day on this, I think it is much harder than I thought
> to get alloc_contig_range() working with the freelist migratetype hygiene
> patchset, because alloc_contig_range() relies on racy migratetype changes:
>
> 1. pageblocks in the range are first marked as MIGRATE_ISOLATE to prevent
> another parallel isolation, but they are not moved to the MIGRATE_ISOLATE
> free list yet.
>
> 2. later in the process, isolate_freepages_range() is used to actually grab
> the free pages.
>
> 3. there was no problem when alloc_contig_range() worked on MAX_ORDER aligned
> ranges, since MIGRATE_ISOLATE cannot be set in the middle of free pages or
> in-use pages. But that is not the case when alloc_contig_range() works on
> pageblock aligned ranges. Now during the isolation phase, free or in-use pages
> will need to be split to get their subpages into the right free lists.
>
> 4. the hardest case is when an in-use page sits across two pageblocks. Currently,
> the code just isolates one pageblock, migrates the page, and lets split_free_page()
> correct the free list later. But to strictly enforce freelist migratetype
> hygiene, extra work is needed in the free page path to split the free page into
> the right freelists.
>
> I need more time to think about how to get alloc_contig_range() working properly.
> Help is needed for bullet point 4.

I once raised that we should maybe try making MIGRATE_ISOLATE a flag that
preserves the original migratetype. Not sure if that would help here in
any way.

The whole alloc_contig_range() implementation is quite complicated and
hard to grasp. If we could find ways to clean all that up and make it
easier to understand and play along, that would be nice.

-- 
Cheers,

David / dhildenb
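
To illustrate the "MIGRATE_ISOLATE as a flag" idea mentioned above, here is a minimal userspace sketch in C. It is not kernel code: `struct pageblock`, `PB_MT_MASK`, `PB_ISOLATED`, the `pb_*()` helpers and the bit layout are all hypothetical names and assumptions made for this example. It only shows that keeping isolation in a separate bit leaves the block's original migratetype readable while the block is isolated, so nothing has to be remembered elsewhere and restored afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model values only; the names echo the kernel's, the layout is assumed. */
enum migratetype {
	MT_UNMOVABLE = 0,
	MT_MOVABLE = 1,
	MT_RECLAIMABLE = 2,
};

#define PB_MT_MASK  0x07u	/* low bits: the block's own migratetype */
#define PB_ISOLATED 0x08u	/* hypothetical flag bit: block is isolated */

/* Hypothetical per-pageblock state: migratetype plus an isolation flag. */
struct pageblock {
	uint8_t bits;
};

/* The original migratetype stays visible whether or not the flag is set. */
static inline enum migratetype pb_migratetype(const struct pageblock *pb)
{
	return (enum migratetype)(pb->bits & PB_MT_MASK);
}

static inline bool pb_isolated(const struct pageblock *pb)
{
	return (pb->bits & PB_ISOLATED) != 0;
}

/*
 * Mark the block isolated. Returns false if it already was, which models
 * rejecting a second, parallel isolation. The caller is assumed to
 * serialize access (this sketch does no locking of its own).
 */
static inline bool pb_isolate(struct pageblock *pb)
{
	if (pb_isolated(pb))
		return false;
	pb->bits |= PB_ISOLATED;
	return true;
}

/* Clear the flag; the migratetype bits were never overwritten. */
static inline void pb_unisolate(struct pageblock *pb)
{
	assert(pb_isolated(pb));
	pb->bits &= (uint8_t)~PB_ISOLATED;
}
```

In this model a block that starts as `MT_MOVABLE` still reports `MT_MOVABLE` from `pb_migratetype()` after `pb_isolate()`, and `pb_unisolate()` needs no saved value. Whether such a scheme would actually help with the alloc_contig_range() races discussed in this thread remains an open question.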