Message-ID: <6ebf38f1-b7c4-cb38-b72f-2e406d2a2fdc@redhat.com>
Date: Thu, 6 Apr 2023 14:27:10 +0200
To: Dan Williams, Matthew Wilcox
Cc: Kyungsan Kim, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, linux-cxl@vger.kernel.org,
 a.manzanares@samsung.com, viacheslav.dubeyko@bytedance.com,
 seungjun.ha@samsung.com, wj28.lee@samsung.com
References: <20230405020027.413578-1-ks0204.kim@samsung.com>
 <642cfda9ccd64_21a8294fd@dwillia2-xfh.jf.intel.com.notmuch>
 <642dcf4169ae5_21a8294f@dwillia2-xfh.jf.intel.com.notmuch>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL
In-Reply-To: <642dcf4169ae5_21a8294f@dwillia2-xfh.jf.intel.com.notmuch>

On 05.04.23 21:42, Dan Williams wrote:
> Matthew Wilcox wrote:
>> On Tue, Apr 04, 2023 at 09:48:41PM -0700, Dan Williams wrote:
>>> Kyungsan Kim wrote:
>>>> We know the situation. When a CXL DRAM channel is located under ZONE_NORMAL,
>>>> a random allocation of a kernel object by calling kmalloc() siblings makes the entire CXL DRAM unremovable.
>>>> Also, not all kernel objects can be allocated from ZONE_MOVABLE.
>>>>
>>>> ZONE_EXMEM does not confine a movability attribute(movable or unmovable), rather it allows a calling context can decide it.
>>>> In that aspect, it is the same with ZONE_NORMAL but ZONE_EXMEM works for extended memory device.
>>>> It does not mean ZONE_EXMEM support both movability and kernel object allocation at the same time.
>>>> In case multiple CXL DRAM channels are connected, we think a memory consumer possibly dedicate a channel for movable or unmovable purpose.
>>>>
>>>
>>> I want to clarify that I expect the number of people doing physical CXL
>>> hotplug of whole devices to be small compared to dynamic capacity
>>> devices (DCD). DCD is a new feature of the CXL 3.0 specification where a
>>> device maps 1 or more thinly provisioned memory regions that have
>>> individual extents get populated and depopulated by a fabric manager.
>>>
>>> In that scenario there is a semantic where the fabric manager hands out
>>> 100G to a host and asks for it back, it is within the protocol that the
>>> host can say "I can give 97GB back now, come back and ask again if you
>>> need that last 3GB".
>>
>> Presumably it can't give back arbitrary chunks of that 100GB? There's
>> some granularity that's preferred; maybe on 1GB boundaries or something?
>
> The device picks a granularity that can be tiny per spec, but it makes
> the hardware more expensive to track in small extents, so I expect
> something reasonable like 1GB, but time will tell once actual devices
> start showing up.

It all sounds a lot like virtio-mem using real hardware [I know, there
are important differences, but for the dynamic aspect there are very
similar issues to solve].

For virtio-mem, the current best way to hotplug a large amount of memory
to a VM, and to eventually be able to unplug a big fraction of it again,
is a combination of ZONE_MOVABLE and ZONE_NORMAL -- the "auto-movable"
memory onlining policy.

What's onlined to ZONE_MOVABLE can get (fairly) reliably unplugged
again. What's onlined to ZONE_NORMAL is possibly lost forever.

Take (incrementally) hotplugging 1 TiB to a 4 GiB VM. Being able to
unplug the full 1 TiB reliably again is pretty much out of scope. But
the more memory we can reliably get back, the better. And the more
memory we can get back in the common case, the better. With a
ZONE_NORMAL vs. ZONE_MOVABLE ratio of 1:3 one could unplug ~768 GiB
again reliably. The remainder depends on fragmentation on the actual
system and on the unplug granularity.

The original plan was to use ZONE_PREFER_MOVABLE as a safety buffer to
reduce ZONE_NORMAL memory without increasing ZONE_MOVABLE memory (and
possibly harming the system). The underlying idea was that in many
setups the memory in ZONE_PREFER_MOVABLE would not get used for
unmovable allocations and could, therefore, get unplugged fairly
reliably in these setups. In all other setups, unmovable allocations
could leak into ZONE_PREFER_MOVABLE and reduce the amount of memory we
could unplug again. But the system would try to keep unmovable
allocations in ZONE_NORMAL, so in most cases with some
ZONE_PREFER_MOVABLE memory we would perform better than with only
ZONE_NORMAL.

--
Thanks,

David / dhildenb