Date: Fri, 14 Jul 2023 10:35:47 +0200
From: David Hildenbrand
Organization: Red Hat
To: Jeff Moyer, Dan Williams
Cc: Vishal Verma, "Rafael J. Wysocki", Len Brown, Andrew Morton, Oscar Salvador, Dave Jiang, linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org, Huang Ying, Dave Hansen
Subject: Re: [PATCH 0/3] mm: use memmap_on_memory semantics for dax/kmem
References: <20230613-vv-kmem_memmap-v1-0-f6de9c6af2c6@intel.com> <29c9b998-f453-59f2-5084-9b4482b489cf@redhat.com>

On 13.07.23 21:12, Jeff Moyer wrote:
> David Hildenbrand writes:
>
>> On 16.06.23 00:00, Vishal Verma wrote:
>>> The dax/kmem driver can potentially hot-add large amounts of memory
>>> originating from CXL memory expanders, or NVDIMMs, or other 'device
>>> memories'. There is a chance there isn't enough regular system memory
>>> available to fit the memmap for this new memory. It's therefore
>>> desirable, if all other conditions are met, for the kmem managed memory
>>> to place its memmap on the newly added memory itself.
>>>
>>> Arrange for this by first allowing for a module parameter override for
>>> the mhp_supports_memmap_on_memory() test using a flag, adjusting the
>>> only other caller of this interface in drivers/acpi/acpi_memhotplug.c,
>>> exporting the symbol so it can be called by kmem.c, and finally changing
>>> the kmem driver to add_memory() in chunks of memory_block_size_bytes().
>>
>> 1) Why is the override a requirement here? Just let the admin
>> configure it and then add conditional support for kmem.
>>
>> 2) I recall that there are cases where we don't want the memmap to
>> land on slow memory (which online_movable would achieve). Just imagine
>> the slow PMEM case. So this might need another configuration knob on
>> the kmem side.
>
> From my memory, the case where you don't want the memmap to land on
> *persistent memory* is when the device is small (such as NVDIMM-N), and
> you want to reserve as much space as possible for the application data.
> This has nothing to do with the speed of access.

Now that you mention it, I also do remember the origin of the altmap --
to achieve exactly that: place the memmap on the device.

commit 4b94ffdc4163bae1ec73b6e977ffb7a7da3d06d3
Author: Dan Williams
Date:   Fri Jan 15 16:56:22 2016 -0800

    x86, mm: introduce vmem_altmap to augment vmemmap_populate()

    In support of providing struct page for large persistent memory
    capacities, use struct vmem_altmap to change the default policy for
    allocating memory for the memmap array.  The default vmemmap_populate()
    allocates page table storage area from the page allocator.  Given
    persistent memory capacities relative to DRAM it may not be feasible to
    store the memmap in 'System Memory'.  Instead vmem_altmap represents
    pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
    requests.

In PFN_MODE_PMEM (and only then), we use the altmap (don't see a way to
configure it).

BUT that case is completely different from the "System RAM" mode. The memmap
of an NVDIMM in pmem mode is barely used by core-mm (i.e., not the buddy).

In comparison, if the buddy and everybody else works on the memmap in
"System RAM", it's much more significant if that resides on slow memory.

Looking at

commit 9b6e63cbf85b89b2dbffa4955dbf2df8250e5375
Author: Michal Hocko
Date:   Tue Oct 3 16:16:19 2017 -0700

    mm, page_alloc: add scheduling point to memmap_init_zone

    memmap_init_zone gets a pfn range to initialize and it can be really
    large resulting in a soft lockup on non-preemptible kernels

      NMI watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [kworker/u642:5:1720]
      [...]
      task: ffff88ecd7e902c0 ti: ffff88eca4e50000 task.ti: ffff88eca4e50000
      RIP: move_pfn_range_to_zone+0x185/0x1d0
      [...]
      Call Trace:
        devm_memremap_pages+0x2c7/0x430
        pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
        nvdimm_bus_probe+0x64/0x110 [libnvdimm]

It's hard to tell if that was only required due to the memmap for these
devices being that large, or also partially because access to the memmap
is slow enough that it makes a real difference.

I recall that we're also often using ZONE_MOVABLE on such slow memory to
not end up placing other kernel data structures on there: especially, user
space page tables as I've been told.

@Dan, any insight on the performance aspects when placing the memmap on
(slow) memory and having that memory be consumed by the buddy where we
frequently operate on the memmap?

-- 
Cheers,

David / dhildenb
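
For a rough sense of the sizes under discussion, here is a minimal
back-of-the-envelope sketch of how much memmap a hot-added range needs and
how much of a memory block remains if the memmap is placed on the block
itself. The 4 KiB page size, the 64-byte struct page, and the 128 MiB memory
block used in the note below are assumptions (typical x86-64 values), not
figures stated in this thread, and the helper names are hypothetical.

```python
# Back-of-the-envelope memmap sizing.
# PAGE_SIZE and STRUCT_PAGE_SIZE are ASSUMED typical x86-64 values;
# neither number comes from the thread above.
PAGE_SIZE = 4096         # bytes per base page (assumption)
STRUCT_PAGE_SIZE = 64    # bytes per struct page (assumption)


def memmap_bytes(mem_bytes, page_size=PAGE_SIZE,
                 struct_page_size=STRUCT_PAGE_SIZE):
    """Bytes of memmap needed to describe mem_bytes of memory."""
    nr_pages = mem_bytes // page_size
    return nr_pages * struct_page_size


def usable_bytes_with_memmap_on_memory(block_bytes, page_size=PAGE_SIZE,
                                       struct_page_size=STRUCT_PAGE_SIZE):
    """Bytes left in a block when its own memmap is carved out of it."""
    memmap = memmap_bytes(block_bytes, page_size, struct_page_size)
    # The memmap occupies whole pages, so round up to a page boundary.
    memmap_pages = -(-memmap // page_size)
    return block_bytes - memmap_pages * page_size
```

Under these assumptions the memmap costs 1/64 of the memory it describes:
a 128 MiB block needs 2 MiB of memmap (leaving 126 MiB usable when the memmap
lives on the block), and hot-adding 1 TiB needs 16 GiB of memmap, which is
the amount that would otherwise have to come from regular system memory.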