From: David Hildenbrand <david@redhat.com>
To: Christoph Lameter, Anshuman Khandual
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, linux-ia64@vger.kernel.org,
 Vlastimil Babka, Michal Hocko, Mel Gorman, Mike Kravetz,
 Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
Subject: Re: [PATCH V2] mm/page_alloc: Ensure that HUGETLB_PAGE_ORDER is less than MAX_ORDER
Date: Tue, 20 Apr 2021 11:03:43 +0200
Message-ID: <01bdeedc-f77d-ebd0-9d42-62f09b0a2d1a@redhat.com>
References: <1618199302-29335-1-git-send-email-anshuman.khandual@arm.com>
 <09284b9a-cfe1-fc49-e1f6-3cf0c1b74c76@arm.com>
 <162877dd-e6ba-d465-d301-2956bb034429@redhat.com>

Hi Christoph,

thanks for your insight.

> You can have larger blocks, but you would need to allocate multiple
> contiguous max order blocks or do it at boot time before the buddy
> allocator is active.
>
> What IA64 did was to do this at boot time, thereby avoiding the buddy
> lists. And it had a separate virtual address range and page table for
> the huge pages.
>
> Looks like the current code does these allocations via CMA, which should
> also bypass the buddy allocator.

Using CMA doesn't really care about the pageblock size when it comes to
fragmentation avoidance, a.k.a. somewhat reliable allocation of memory
chunks with an order > MAX_ORDER - 1. IOW, when using CMA for hugetlb,
we don't need pageblock_order > MAX_ORDER - 1.
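
To make that concrete, here is a rough sketch of the runtime allocation
path being discussed -- loosely modeled on the gigantic-page allocation of
that kernel era, not the actual mm/hugetlb.c code, and assuming the
mm/hugetlb.c context where the per-node hugetlb_cma[] CMA areas (reserved
via "hugetlb_cma=" at boot) are visible. The point it illustrates: an
order >= MAX_ORDER chunk never comes off the buddy free lists; it is
either carved out of a CMA area or migrated out of an existing range.

/*
 * Sketch only, error handling and node/zone iteration elided.
 */
static struct page *sketch_alloc_gigantic(int nid, unsigned int order)
{
	unsigned long nr_pages = 1UL << order;

#ifdef CONFIG_CMA
	/* CMA area reserved at boot: completely bypasses the buddy lists. */
	if (hugetlb_cma[nid])
		return cma_alloc(hugetlb_cma[nid], nr_pages, order, true);
#endif
	/*
	 * Otherwise isolate and migrate a physically contiguous range;
	 * this is the "less reliable" runtime path, and it does not
	 * depend on pageblock_order fitting into the buddy either.
	 */
	return alloc_contig_pages(nr_pages, GFP_KERNEL | __GFP_THISNODE,
				  nid, NULL);
}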

>
>
>>>     }
>>>
>>>
>>> But it's kind of weird, isn't it? Let's assume MAX_ORDER - 1
>>> corresponds to 4 MiB and pageblock_order corresponds to 8 MiB.
>>>
>>> Sure, we'd be grouping pages in 8 MiB chunks, however, we cannot even
>>> allocate 8 MiB chunks via the buddy. So only alloc_contig_range()
>>> could really grab them (IOW: gigantic pages).
>>
>> Right.
>
> But then you can avoid the buddy allocator.
>
>>> Further, we have code like deferred_free_range(), where we end up
>>> calling __free_pages_core()->...->__free_one_page() with
>>> pageblock_order. Wouldn't we end up setting the buddy order to
>>> something > MAX_ORDER - 1 on that path?
>>
>> Agreed.
>
> We would need to return the supersized block to the huge page pool and
> not to the buddy allocator. There is a special callback in the compound
> page so that you can call an alternate free function that is not the
> buddy allocator.

Sorry, but that doesn't make any sense. We are talking about bringup
code, where we transition from memblock to the buddy and fill the free
page lists. Looking at the code, deferred initialization of the memmap
is broken on these setups -- so I assume deferred memmap init is never
enabled there.

>
>>
>>>
>>> Having pageblock_order > MAX_ORDER feels wrong and looks shaky.
>>>
>> Agreed, definitely does not look right. Let's see what other folks
>> might have to say on this.
>>
>> + Christoph Lameter
>>
>
> It was done for a long time successfully and is running in numerous
> configurations.

Enforcing pageblock_order < MAX_ORDER would mean that runtime allocation
of gigantic (here: huge) pages (HUGETLB_PAGE_ORDER >= MAX_ORDER) via
alloc_contig_pages() becomes less reliable. To compensate, the relevant
archs could switch to "hugetlb_cma=" to improve the reliability of
runtime allocation.

I wonder which configurations we are talking about:

a) ia64

At least I couldn't care less; it's a dead architecture -- not sure how
much people care about "more reliable runtime allocation of gigantic
(here: huge) pages". Also, not sure about the exact configurations.

b) ppc64

We only have a variable hpage size with CONFIG_PPC_BOOK3S_64. We
initialize the hugepage size to either 1M, 2M or 16M; 16M seems to be
the primary choice.

ppc64 has CONFIG_FORCE_MAX_ZONEORDER

default "9" if PPC64 && PPC_64K_PAGES -> 16M effective buddy maximum size
default "13" if PPC64 && !PPC_64K_PAGES -> 16M effective buddy maximum size

So I fail to see in which scenario we could even end up with
pageblock_order > MAX_ORDER - 1 (a quick check of that arithmetic is
appended at the end of this mail).

I did not check ppc32.

-- 
Thanks,
David / dhildenb
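
For reference, a quick standalone check of the arithmetic behind the two
CONFIG_FORCE_MAX_ZONEORDER defaults quoted above -- plain user-space C,
nothing kernel-specific, only assuming the usual convention that the
largest buddy block has order MAX_ORDER - 1:

#include <stdio.h>

int main(void)
{
	/*
	 * Largest buddy block = page size << (MAX_ORDER - 1).
	 * 64K pages, MAX_ORDER 9  -> 256  * 64K = 16M
	 *  4K pages, MAX_ORDER 13 -> 4096 *  4K = 16M
	 * A 16M huge page is therefore exactly order MAX_ORDER - 1 in both
	 * configurations, i.e. pageblock_order never exceeds MAX_ORDER - 1.
	 */
	struct { unsigned long page_size; unsigned int max_order; } cfg[] = {
		{ 64 * 1024, 9 },	/* PPC64 && PPC_64K_PAGES  */
		{ 4 * 1024, 13 },	/* PPC64 && !PPC_64K_PAGES */
	};

	for (int i = 0; i < 2; i++) {
		unsigned long max_block =
			cfg[i].page_size << (cfg[i].max_order - 1);
		printf("page size %3luK, MAX_ORDER %2u -> max buddy block %luM\n",
		       cfg[i].page_size / 1024, cfg[i].max_order,
		       max_block >> 20);
	}
	return 0;
}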