Date: Wed, 20 Sep 2023 12:04:00 -0400
From: Johannes Weiner <hannes@cmpxchg.org>
To: Vlastimil Babka
Cc: Zi Yan, Mike Kravetz, Andrew Morton, Mel Gorman, Miaohe Lin,
	Kefeng Wang, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
Message-ID: <20230920160400.GC124289@cmpxchg.org>
References: <20230918145204.GB16104@cmpxchg.org>
 <20230918174037.GA112714@monkey>
 <20230919064914.GA124289@cmpxchg.org>
 <20230919184731.GC112714@monkey>
 <20230920003239.GD112714@monkey>
 <149ACAE8-D3E4-4009-828A-D3AC881FFB9C@nvidia.com>
 <20230920134811.GB124289@cmpxchg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20230920134811.GB124289@cmpxchg.org>

On Wed, Sep 20, 2023 at 09:48:12AM -0400, Johannes Weiner wrote:
> On Wed, Sep 20, 2023 at 08:07:53AM +0200, Vlastimil Babka wrote:
> > On 9/20/23 03:38, Zi Yan wrote:
> > > On 19 Sep 2023, at 20:32, Mike Kravetz wrote:
> > >
> > >> On 09/19/23 16:57, Zi Yan wrote:
> > >>> On 19 Sep 2023, at 14:47, Mike Kravetz wrote:
> > >>>
> > >>>> --- a/mm/page_alloc.c
> > >>>> +++ b/mm/page_alloc.c
> > >>>> @@ -1651,8 +1651,13 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page,
> > >>>>  	end = pageblock_end_pfn(pfn) - 1;
> > >>>>
> > >>>>  	/* Do not cross zone boundaries */
> > >>>> +#if 0
> > >>>>  	if (!zone_spans_pfn(zone, start))
> > >>>>  		start = zone->zone_start_pfn;
> > >>>> +#else
> > >>>> +	if (!zone_spans_pfn(zone, start))
> > >>>> +		start = pfn;
> > >>>> +#endif
> > >>>>  	if (!zone_spans_pfn(zone, end))
> > >>>>  		return false;
> > >>>>
> > >>>> I can still trigger warnings.
> > >>>
> > >>> OK. One thing to note is that the page type in the warning changed from
> > >>> 5 (MIGRATE_ISOLATE) to 0 (MIGRATE_UNMOVABLE) with my suggested change.
> > >>>
> > >>
> > >> Just to be really clear,
> > >> - the 5 (MIGRATE_ISOLATE) warning was from the __alloc_pages call path.
> > >> - the 0 (MIGRATE_UNMOVABLE) as above was from the alloc_contig_range call
> > >>   path WITHOUT your change.
> > >>
> > >> I am guessing the difference here has more to do with the allocation path?
> > >>
> > >> I went back and reran focusing on the specific migrate type.
> > >> Without your patch, and coming from the alloc_contig_range call path,
> > >> I got two warnings of 'page type is 0, passed migratetype is 1' as above.
> > >> With your patch I got one 'page type is 0, passed migratetype is 1'
> > >> warning and one 'page type is 1, passed migratetype is 0' warning.
> > >>
> > >> I could be wrong, but I do not think your patch changes things.
> > >
> > > Got it. Thanks for the clarification.
> > >>
> > >>>>
> > >>>> One idea about recreating the issue is that it may have to do with size
> > >>>> of my VM (16G) and the requested allocation sizes 4G. However, I tried
> > >>>> to really stress the allocations by increasing the number of hugetlb
> > >>>> pages requested and that did not help. I also noticed that I only seem
> > >>>> to get two warnings and then they stop, even if I continue to run the
> > >>>> script.
> > >>>>
> > >>>> Zi asked about my config, so it is attached.
> > >>>
> > >>> With your config, I still have no luck reproducing the issue. I will keep
> > >>> trying. Thanks.
> > >>>
> > >>
> > >> Perhaps try running both scripts in parallel?
> > >
> > > Yes. It seems to do the trick.
> > >
> > >> Adjust the number of hugetlb pages allocated to equal 25% of memory?
> > >
> > > I am able to reproduce it with the script below:
> > >
> > > while true; do
> > >   echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages&
> > >   echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages&
> > >   wait
> > >   echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> > >   echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> > > done
> > >
> > > I will look into the issue.
>
> Nice!
>
> I managed to reproduce it ONCE, triggering it not even a second after
> starting the script. But I can't seem to do it twice, even after
> several reboots and letting it run for minutes.

I managed to reproduce it reliably by cutting the nr_hugepages
parameters respectively in half.

The one that triggers for me is always MIGRATE_ISOLATE. With some
printk-tracing, the scenario seems to be this:

#0                                      #1
start_isolate_page_range()
  isolate_single_pageblock()
    set_migratetype_isolate(tail)
      lock zone->lock
      move_freepages_block(tail) // nop
      set_pageblock_migratetype(tail)
      unlock zone->lock
                                        del_page_from_freelist(head)
                                        expand(head, head_mt)
                                        WARN(head_mt != tail_mt)
    start_pfn = ALIGN_DOWN(MAX_ORDER_NR_PAGES)
    for (pfn = start_pfn, pfn < end_pfn)
      if (PageBuddy())
        split_free_page(head)

IOW, we update a pageblock that isn't MAX_ORDER aligned, then drop the
lock. The move_freepages_block() does nothing because the PageBuddy()
is set on the pageblock to the left. Once we drop the lock, the buddy
gets allocated and the expand() puts things on the wrong list. The
splitting code that handles MAX_ORDER blocks runs *after* the tail
type is set and the lock has been dropped, so it's too late.

I think this would work fine if we always set MIGRATE_ISOLATE in a
linear fashion, with start and end aligned to MAX_ORDER. Then we also
wouldn't have to split things. There are two reasons this doesn't
happen today:

1. The isolation range is rounded to pageblocks, not MAX_ORDER. In
   this test case they always seem aligned, but it's not guaranteed.
   However,

2. start_isolate_page_range() explicitly breaks ordering by doing the
   last block in the range before the center. It's that last block
   that triggers the race with __rmqueue_smallest -> expand() for me.
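To make point 1 above concrete, here is a rough userspace sketch of the
alignment gap. pageblock_order = 9 and MAX_ORDER = 10 are assumed values
for illustration only (both are config dependent), and ALIGN_DOWN is
reimplemented locally rather than taken from the kernel headers:

/*
 * Sketch: a pfn can be pageblock-aligned without being MAX_ORDER
 * aligned, so the free buddy covering it may start in the pageblock
 * to the left -- which is why move_freepages_block() on the tail
 * block finds no PageBuddy head to move.
 */
#include <stdio.h>

#define PAGEBLOCK_ORDER		9			/* assumed */
#define MAX_ORDER		10			/* assumed */
#define pageblock_nr_pages	(1UL << PAGEBLOCK_ORDER)	/* 512 */
#define MAX_ORDER_NR_PAGES	(1UL << MAX_ORDER)		/* 1024 */
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

int main(void)
{
	/* pageblock-aligned, but not MAX_ORDER-aligned */
	unsigned long tail_pfn = 512;

	unsigned long block_start = ALIGN_DOWN(tail_pfn, pageblock_nr_pages);
	unsigned long buddy_start = ALIGN_DOWN(tail_pfn, MAX_ORDER_NR_PAGES);

	printf("pageblock start:       %lu\n", block_start);	/* 512 */
	printf("MAX_ORDER buddy start: %lu\n", buddy_start);	/* 0 */

	if (buddy_start != block_start)
		printf("buddy head can sit in the pageblock to the left\n");

	return 0;
}

With those assumed sizes, every other pageblock has its potential
MAX_ORDER buddy head in the block to its left, which is the situation
the trace above runs into once the zone lock is dropped.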
With the below patch I can no longer reproduce the issue:

---

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b5c7a9d21257..b7c8730bf0e2 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -538,8 +538,8 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	unsigned long pfn;
 	struct page *page;
 	/* isolation is done at page block granularity */
-	unsigned long isolate_start = pageblock_start_pfn(start_pfn);
-	unsigned long isolate_end = pageblock_align(end_pfn);
+	unsigned long isolate_start = ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES);
+	unsigned long isolate_end = ALIGN(end_pfn, MAX_ORDER_NR_PAGES);
 	int ret;
 	bool skip_isolation = false;
 
@@ -549,17 +549,6 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	if (ret)
 		return ret;
 
-	if (isolate_start == isolate_end - pageblock_nr_pages)
-		skip_isolation = true;
-
-	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
-	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true,
-				       skip_isolation, migratetype);
-	if (ret) {
-		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
-		return ret;
-	}
-
 	/* skip isolated pageblocks at the beginning and end */
 	for (pfn = isolate_start + pageblock_nr_pages;
 	     pfn < isolate_end - pageblock_nr_pages;
@@ -568,12 +557,21 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 		if (page && set_migratetype_isolate(page, migratetype, flags,
 					start_pfn, end_pfn)) {
 			undo_isolate_page_range(isolate_start, pfn, migratetype);
-			unset_migratetype_isolate(
-				pfn_to_page(isolate_end - pageblock_nr_pages),
-				migratetype);
 			return -EBUSY;
 		}
 	}
+
+	if (isolate_start == isolate_end - pageblock_nr_pages)
+		skip_isolation = true;
+
+	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
+	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true,
+				       skip_isolation, migratetype);
+	if (ret) {
+		undo_isolate_page_range(isolate_start, pfn, migratetype);
+		return ret;
+	}
+
 	return 0;
 }
 
@@ -591,8 +589,8 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 {
 	unsigned long pfn;
 	struct page *page;
-	unsigned long isolate_start = pageblock_start_pfn(start_pfn);
-	unsigned long isolate_end = pageblock_align(end_pfn);
+	unsigned long isolate_start = ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES);
+	unsigned long isolate_end = ALIGN(end_pfn, MAX_ORDER_NR_PAGES);
 
 	for (pfn = isolate_start;
 	     pfn < isolate_end;