From: David Hildenbrand
Organization: Red Hat
To: Zi Yan
Cc: Johannes Weiner, Vlastimil Babka, Mike Kravetz, Andrew Morton, Mel Gorman,
	Miaohe Lin, Kefeng Wang, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
Date: Mon, 2 Oct 2023 13:43:30 +0200
In-Reply-To: <6DD1F426-A87D-47B7-B27F-043B399CBEDA@nvidia.com>
References: <20230918145204.GB16104@cmpxchg.org> <20230918174037.GA112714@monkey>
	<20230919064914.GA124289@cmpxchg.org> <20230919184731.GC112714@monkey>
	<20230920003239.GD112714@monkey> <149ACAE8-D3E4-4009-828A-D3AC881FFB9C@nvidia.com>
	<20230920134811.GB124289@cmpxchg.org> <20230920160400.GC124289@cmpxchg.org>
	<762CA634-053A-41DD-8ED7-895374640858@nvidia.com>
	<505e7f55-f63a-b33d-aa10-44de16d2d3cc@redhat.com>
	<4466F447-43D3-43CD-8930-FBE9A49028BA@nvidia.com>
	<98f3e433-153d-5dd8-c868-30f703baeb46@redhat.com>
	<6DD1F426-A87D-47B7-B27F-043B399CBEDA@nvidia.com>

>>> I can do it after I fix this. That change might or might not help only if we make
>>> some redesign on how migratetype is managed. If MIGRATE_ISOLATE does not
>>> overwrite existing migratetype, the code might not need to split a page and move
>>> it to MIGRATE_ISOLATE freelist?
>>
>> Did someone test how memory offlining plays along with that?
>> (I can try myself within the next 1-2 weeks)
>>
>> There [mm/memory_hotplug.c:offline_pages] we always cover full MAX_ORDER ranges,
>> though.
>>
>> 	ret = start_isolate_page_range(start_pfn, end_pfn,
>> 				       MIGRATE_MOVABLE,
>> 				       MEMORY_OFFLINE | REPORT_FAILURE,
>> 				       GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL);
>
> Since a full MAX_ORDER range is passed, no free page split will happen.

Okay, thanks for verifying that it should not be affected!

>>> The fundamental issue in alloc_contig_range() is that to work at
>>> pageblock level, a page (>pageblock_order) can have one part isolated and
>>> the rest in a different migratetype. {add_to,move_to,del_page_from}_free_list()
>>> now check the first pageblock's migratetype, so such a page needs to be removed
>>> from its free_list, set MIGRATE_ISOLATE on one of the pageblocks, split, and
>>> finally put back to multiple free lists. This needs to be done at isolation stage
>>> before free pages are removed from their free lists (the stage after isolation).
>>
>> One idea was to always isolate larger chunks, and handle movability checks/split/etc
>> at a later stage. Once isolation would be decoupled from the actual/original migratetype,
>> this could have been easier to handle (especially some corner cases I had in mind back then).
>
> I think it is a good idea. When I coded alloc_contig_range() up, I tried to
> accommodate existing set_migratetype_isolate(), which calls has_unmovable_pages().
> If these two are decoupled, set_migratetype_isolate() can work on MAX_ORDER-aligned
> ranges and has_unmovable_pages() can still work on pageblock-aligned ranges.
> Let me give this a try.

But again, this is just a thought I had back then; maybe it doesn't help with
anything. I never found the time to look into the whole thing in more detail.

>>> If MIGRATE_ISOLATE is a separate flag and we are OK with leaving isolated pages
>>> in their original migratetype and check migratetype before allocating a page,
>>> that might help.
>>> But that might add extra work (e.g., splitting a partially
>>> isolated free page before allocation) in the really hot code path, which is not
>>> desirable.
>>
>> With MIGRATE_ISOLATE being a separate flag, one idea was to have not a single
>> separate isolate list, but one per "proper migratetype". But again, just some random
>> thoughts I had back then, I never had sufficient time to think it all through.
>
> Got it. I will think about it.
>
> One question on separate MIGRATE_ISOLATE:
>
> the implementation I have in mind is that MIGRATE_ISOLATE will need a dedicated flag
> bit instead of being one of the migratetypes. But now there are 5 migratetypes +

Exactly what I was concerned about back then ...

> MIGRATE_ISOLATE and PB_migratetype_bits is 3, so an extra migratetype_bit is needed.
> But current migratetype implementation is a word-based operation, requiring
> NR_PAGEBLOCK_BITS to be a divisor of BITS_PER_LONG. This means NR_PAGEBLOCK_BITS
> needs to be increased from 4 to 8 to meet the requirement, wasting a lot of space.

... until I did the math. Let's assume a pageblock is 2 MiB.

4 / (2 * 1024 * 1024 * 8) = 0.00002384185791016 %
8 / (2 * 1024 * 1024 * 8) -> 1 / (2 * 1024 * 1024) = 0.00004768371582031 %

For a 1 TiB machine that means 256 KiB vs. 512 KiB.

I concluded that "wasting a lot of space" is not really the right phrase to
describe that :)

Just to put it into perspective, the memmap (64/4096) for a 1 TiB machine is ... 16 GiB.

> An alternative is to have a separate array for MIGRATE_ISOLATE, which requires
> additional changes. Let me know if you have a better idea. Thanks.

It would probably be cleanest to just use one byte per pageblock. That would
clean up the whole machinery eventually as well.

-- 
Cheers,

David / dhildenb
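
The overhead arithmetic above (pageblock flag bits versus memmap size) can be
double-checked with a small userspace C sketch. The helper names are
hypothetical, and the fixed 2 MiB pageblock, 4 KiB page and 64-byte struct page
sizes are assumptions taken from the figures in this mail; this is not kernel
code.

```c
#include <stdint.h>

/* Assumptions for illustration only, matching the numbers used in the mail. */
#define PAGEBLOCK_BYTES   (2ULL << 20)  /* 2 MiB pageblock */
#define PAGE_BYTES        4096ULL       /* 4 KiB base page */
#define STRUCT_PAGE_BYTES 64ULL         /* memmap entry per base page */

/*
 * Bytes of pageblock flags needed to cover mem_bytes of memory when each
 * pageblock uses bits_per_pageblock bits (4 today, 8 with one byte each).
 */
static inline uint64_t pageblock_flags_bytes(uint64_t mem_bytes,
					     unsigned int bits_per_pageblock)
{
	uint64_t nr_pageblocks = mem_bytes / PAGEBLOCK_BYTES;

	return nr_pageblocks * bits_per_pageblock / 8;
}

/* Bytes of memmap (one struct page per base page) for mem_bytes of memory. */
static inline uint64_t memmap_bytes(uint64_t mem_bytes)
{
	return mem_bytes / PAGE_BYTES * STRUCT_PAGE_BYTES;
}
```

Under these assumptions, calling pageblock_flags_bytes(1ULL << 40, 4) and
pageblock_flags_bytes(1ULL << 40, 8) gives the 256 KiB and 512 KiB figures for a
1 TiB machine, and memmap_bytes(1ULL << 40) gives the 16 GiB memmap size.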