Message-ID: <6e5ebc19-890c-b6dd-1924-9f25c441010d@redhat.com>
Date: Fri, 17 Dec 2021 15:51:31 +0100
To: Michal Hocko, Andrew Morton, Alexey Makhalov
Cc: LKML, linux-mm@kvack.org, Dennis Zhou, Eric Dumazet, Oscar Salvador, Tejun Heo, Christoph Lameter, Nico Pache
References: <20211214100732.26335-1-mhocko@kernel.org>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH v2 0/4] mm, memory_hotplug: handle unitialized numa node gracefully
In-Reply-To: <20211214100732.26335-1-mhocko@kernel.org>

On 14.12.21 11:07, Michal Hocko wrote:
> Hi,
> this should be the full bundle for now. I have ended up with 4 patches.
> The primary fix is patch 2 (should be reasonably easy to backport to
> older kernels if there is any need for that). Patches 3 and 4 are mere
> clean ups.
>
> I will repost once this can get some testing from Alexey.
> Shouldn't be too much different from
> http://lkml.kernel.org/r/YbHfBgPQMkjtuHYF@dhcp22.suse.cz
> with the follow up fix squashed in.
>
> I would really appreciate to hear more about
> http://lkml.kernel.org/r/YbMZsczMGpChaWz0@dhcp22.suse.cz
> because I would like to add that information to the changelog as well.
>
> Thanks for the review and testing.

Playing with memory hotplug only (only one hotpluggable node is possible
with QEMU right now as only one will get added to SRAT with the hotplug
range)

Start with one empty node:

#!/bin/bash
sudo qemu/build/qemu-system-x86_64 \
    --enable-kvm \
    -m 8G,slots=2,maxmem=16G \
    -object memory-backend-ram,id=mem0,size=4G \
    -object memory-backend-ram,id=mem1,size=4G \
    -numa node,cpus=0-1,nodeid=0,memdev=mem0 \
    -numa node,cpus=2-3,nodeid=1,memdev=mem1 \
    -numa node,nodeid=2 \
    -smp 4 \
    -drive file=/home/dhildenb/git/Fedora-Cloud-Base-33-1.2.x86_64.qcow2,format=qcow2,if=virtio \
    -cpu host \
    -machine q35 \
    -nographic \
    -nodefaults \
    -monitor unix:/var/tmp/monitor,server,nowait \
    -chardev stdio,id=serial,signal=off \
    -device isa-serial,chardev=serial

1. Guest state when booting

[    0.002506] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[    0.002508] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[    0.002510] SRAT: PXM 1 -> APIC 0x02 -> Node 1
[    0.002511] SRAT: PXM 1 -> APIC 0x03 -> Node 1
[    0.002513] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[    0.002515] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0x7fffffff]
[    0.002517] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x17fffffff]
[    0.002518] ACPI: SRAT: Node 1 PXM 1 [mem 0x180000000-0x27fffffff]
[    0.002520] ACPI: SRAT: Node 2 PXM 2 [mem 0x280000000-0x4ffffffff] hotplug
[    0.002523] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0x7fffffff] -> [mem 0x00000000-0x7fffffff]
[    0.002525] NUMA: Node 0 [mem 0x00000000-0x7fffffff] + [mem 0x100000000-0x17fffffff] -> [mem 0x00000000-0x17fffffff]
[    0.002533] NODE_DATA(0) allocated [mem 0x17ffd5000-0x17fffffff]
[    0.002716] NODE_DATA(1) allocated [mem 0x27ffd5000-0x27fffffff]
[    0.017960] Zone ranges:
[    0.017966]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.017969]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    0.017971]   Normal   [mem 0x0000000100000000-0x000000027fffffff]
[    0.017972]   Device   empty
[    0.017974] Movable zone start for each node
[    0.017976] Early memory node ranges
[    0.017977]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.017979]   node   0: [mem 0x0000000000100000-0x000000007ffd5fff]
[    0.017980]   node   0: [mem 0x0000000100000000-0x000000017fffffff]
[    0.017982]   node   1: [mem 0x0000000180000000-0x000000027fffffff]
[    0.017984] Initmem setup node 0 [mem 0x0000000000001000-0x000000017fffffff]
[    0.017990] Initmem setup node 1 [mem 0x0000000180000000-0x000000027fffffff]
[    0.017993] Node 2 uninitialized by the platform. Please report with boot dmesg.
[    0.018008] Initmem setup node 2 [mem 0x0000000000000000-0x0000000000000000]
[    0.018011] On node 0, zone DMA: 1 pages in unavailable ranges
[    0.018031] On node 0, zone DMA: 97 pages in unavailable ranges
[    0.023622] On node 0, zone Normal: 42 pages in unavailable ranges

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 3921 MB
node 0 free: 3638 MB
node 1 cpus: 2 3
node 1 size: 4022 MB
node 1 free: 3519 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

# cat /sys/devices/system/node/online
0-1
# cat /sys/devices/system/node/possible
0-2

3. Hotplug a DIMM and online it to ZONE_MOVABLE

# echo online_movable > /sys/devices/system/memory/auto_online_blocks

$ echo "object_add memory-backend-ram,id=hmem0,size=8G" | sudo nc -U /var/tmp/monitor ; echo
$ echo "device_add pc-dimm,id=dimm0,memdev=hmem0,node=2" | sudo nc -U /var/tmp/monitor ; echo

4. Guest state after hotplug

[  334.541452] Built 2 zonelists, mobility grouping on. Total pages: 1999733
[  334.541908] Policy zone: Normal
[  334.559853] Fallback order for Node 0: 0 2 1
[  334.560234] Fallback order for Node 1: 1 2 0
[  334.560524] Fallback order for Node 2: 2 0 1
[  334.560810] Built 3 zonelists, mobility grouping on. Total pages: 2032501
[  334.561281] Policy zone: Normal

# numactl --hardware
available: 3 nodes (0-2)
node 0 cpus: 0 1
node 0 size: 3921 MB
node 0 free: 3529 MB
node 1 cpus: 2 3
node 1 size: 4022 MB
node 1 free: 3564 MB
node 2 cpus:
node 2 size: 8192 MB
node 2 free: 8192 MB
node distances:
node   0   1   2
  0:  10  20  20
  1:  20  10  20
  2:  20  20  10

# cat /sys/devices/system/node/online
0-2
# cat /sys/devices/system/node/possible
0-2
# cat /sys/devices/system/node/has_memory
0-2
# cat /sys/devices/system/node/has_normal_memory
0-1
# cat /sys/devices/system/node/has_cpu
0-1

5. Unplug DIMM

$ echo "device_del dimm0" | sudo nc -U /var/tmp/monitor ; echo

6. Guest state after unplug

[  494.218938] Fallback order for Node 0: 0 2 1
[  494.219315] Fallback order for Node 1: 1 2 0
[  494.219626] Fallback order for Node 2: 2 0 1
[  494.220430] Built 3 zonelists, mobility grouping on. Total pages: 1999736
[  494.221024] Policy zone: Normal

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 3921 MB
node 0 free: 3661 MB
node 1 cpus: 2 3
node 1 size: 4022 MB
node 1 free: 3565 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

# cat /sys/devices/system/node/online
0-1
# cat /sys/devices/system/node/possible
0-2

7. Hotplug DIMM + online to ZONE_NORMAL

# echo online_kernel > /sys/devices/system/memory/auto_online_blocks

$ echo "device_add pc-dimm,id=dimm0,memdev=hmem0,node=2" | sudo nc -U /var/tmp/monitor ; echo

8. Guest state after hotplug

# numactl --hardware
available: 3 nodes (0-2)
node 0 cpus: 0 1
node 0 size: 3921 MB
node 0 free: 3534 MB
node 1 cpus: 2 3
node 1 size: 4022 MB
node 1 free: 3567 MB
node 2 cpus:
node 2 size: 8192 MB
node 2 free: 8192 MB
node distances:
node   0   1   2
  0:  10  20  20
  1:  20  10  20
  2:  20  20  10

# cat /sys/devices/system/node/online
0-2
# cat /sys/devices/system/node/possible
0-2
# cat /sys/devices/system/node/has_memory
0-2
# cat /sys/devices/system/node/has_normal_memory
0-2
# cat /sys/devices/system/node/has_cpu
0-1

No surprises found so far. I'll be mostly offline for the next 2 weeks, so an
official review might take some more time.

-- 
Thanks,

David / dhildenb