To: Mike Rapoport
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton,
    Oscar Salvador, Michal Hocko, Mike Kravetz, Dave Hansen,
    Matthew Wilcox, Anshuman Khandual, Muchun Song, Pavel Tatashin,
    Jonathan Corbet, Stephen Rothwell, linux-doc@vger.kernel.org
References: <20210525102604.8770-1-david@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH v1] memory-hotplug.rst: complete admin-guide overhaul
Message-ID: <5e01bd6f-4073-1ebb-489d-2e5c529909a2@redhat.com>
Date: Tue, 8 Jun 2021 15:04:19 +0200

>> +ZONE_MOVABLE
>> +============
>> +
>> +ZONE_MOVABLE is an important mechanism for more reliable memory offlining.
>> +Further, having system RAM managed by ZONE_MOVABLE instead of one of the
>> +kernel zones can increase the number of possible transparent huge pages and
>> +dynamically allocated huge pages.
>> +
>
> I'd move the first two paragraphs from "Zone Imbalances" here to provide
> some context what is movable and what is unmovable allocation.

Makes sense.

[...]

>> -How to offline memory
>> ----------------------
>> +Considerations
>
> ZONE_MOVABLE Sizing Considerations ?
>

Ack

> I'd also move the contents of "Boot Memory and ZONE_MOVABLE" here (with
> some adjustments):
>
>   By default, all the memory configured at boot time is managed by the kernel
>   zones and ZONE_MOVABLE is not used.
>
>   To enable ZONE_MOVABLE to include the memory present at boot and to
>   control the ratio between movable and kernel zones there are two command
>   line options: ``kernelcore=`` and ``movablecore=``. See
>   Documentation/admin-guide/kernel-parameters.rst for their description.
>

Makes sense. I'll move it to the end of the "ZONE_MOVABLE Sizing
Considerations" section.

>> +--------------
>>
>> -You can offline a memory block by using the same sysfs interface that was used
>> -in memory onlining::
>> +We usually expect that a large portion of available system RAM will actually
>> +be consumed by user space, either directly or indirectly via the page cache. In
>> +the normal case, ZONE_MOVABLE can be used when allocating such pages just fine.
>>
>> -    % echo offline > /sys/devices/system/memory/memoryXXX/state
>> +With that in mind, it makes sense that we can have a big portion of system RAM
>> +managed by ZONE_MOVABLE. However, there are some things to consider when
>> +using ZONE_MOVABLE, especially when fine-tuning zone ratios:
>>
>> -If offline succeeds, the state of the memory block is changed to be "offline".
>> -If it fails, some error core (like -EBUSY) will be returned by the kernel.
>> -Even if a memory block does not belong to ZONE_MOVABLE, you can try to offline
>> -it. If it doesn't contain 'unmovable' memory, you'll get success.
>> +- Having a lot of offline memory blocks. Even offline memory blocks consume
>> +  memory for metadata and page tables in the direct map; having a lot of
>> +  offline memory blocks is not a typical case, though.
>> +
>> +- Memory ballooning. Some memory ballooning implementations, such as
>> +  the Hyper-V balloon, the XEN balloon, the vbox balloon and the VMWare
>
> So, everyone except virtio-mem? ;-)

Well, virtio-mem does not classify as a memory balloon in that sense, as
it only operates on its own device memory ;)

virtio-balloon and pseries CMM support balloon compaction.

> I'd drop the names because if some of those will implement balloon
> compaction they surely will forget to update the docs.

I can do the opposite and mention the ones that already do. Some most
probably will never support it.

"Memory ballooning without balloon compaction is incompatible with
ZONE_MOVABLE. Only some implementations, such as virtio-balloon and
pseries CMM, fully support balloon compaction."

>
>> +  balloon with huge pages don't support balloon compaction and, thereby
>> +  ZONE_MOVABLE.
>> +
>> +  Further, CONFIG_BALLOON_COMPACTION might be disabled. In that case, balloon
>> +  inflation will only perform unmovable allocations and silently create a
>> +  zone imbalance, usually triggered by inflation requests from the
>> +  hypervisor.
>> +
>> +- Gigantic pages are unmovable, resulting in user space consuming a
>> +  lot of unmovable memory.
>> +
>> +- Huge pages are unmovable when an architectures does not support huge
>> +  page migration, resulting in a similar issue as with gigantic pages.
>> +
>> +- Page tables are unmovable. Excessive swapping, mapping extremely large
>> +  files or ZONE_DEVICE memory can be problematic, although only
>> +  really relevant in corner cases. When we manage a lot of user space memory
>> +  that has been swapped out or is served from a file/pmem/... we still need
>
>                                                        ^ persistent memory

Agreed.

>
>> +  a lot of page tables to manage that memory once user space accessed that
>> +  memory once.
>> +
>> +- DAX: when we have a lot of ZONE_DEVICE memory added to the system as DAX
>> +  and we are not using an altmap to allocate the memmap from device memory
>> +  directly, we will have to allocate the memmap for this memory from the
>> +  kernel zones.
>
> I'm not sure admin-guide reader will know when we use altmap when we don't.
> Maybe
>
>   DAX: in certain DAX configurations the memory map for the device memory will
>   be allocated from the kernel zones.

Indeed, simpler and communicates the same message.

I'll also add

"KASAN can have a significant memory overhead, for example, consuming
1/8th of the total system memory size as (unmovable) tracking metadata."

Thanks Mike!

-- 
Thanks,

David / dhildenb