From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Hildenbrand
Organization: Red Hat
Date: Mon, 23 Jan 2023 10:58:48 +0100
To: Sudarshan Rajagopalan, Johannes Weiner, Suren Baghdasaryan,
 Mike Rapoport, Oscar Salvador, Anshuman Khandual, mark.rutland@arm.com,
 will@kernel.org, virtualization@lists.linux-foundation.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-arm-msm@vger.kernel.org
Cc: "Trilok Soni (QUIC)", "Sukadev Bhattiprolu (QUIC)",
 "Srivatsa Vaddagiri (QUIC)", "Patrick Daly (QUIC)"
Subject: Re: [RFC] memory pressure detection in VMs using PSI mechanism
 for dynamically inflating/deflating VM memory
In-Reply-To: <2faf67fe-b1df-d110-6d57-67f284cd5584@quicinc.com>
References: <072de3f4-6bd3-f9ce-024d-e469288fc46a@quicinc.com>
 <2faf67fe-b1df-d110-6d57-67f284cd5584@quicinc.com>

>>>
>>> 1. This will be a native userspace daemon that will be running only in
>>> the Linux VM which will use virtio-mem driver that uses memory hotplug
>>> to add/remove memory. The VM (aka Secondary VM, SVM) will request for
>>> memory from the host which is Primary VM, PVM via the backend hypervisor
>>> which takes care of cross-VM communication.
>>>
>>> 2. This will be guest driver. This daemon will use PSI mechanism to
>>> monitor memory pressure to keep track of memory demands in the system.
>>> It will register to few memory pressure events and make an educated
>>> guess on when demand for memory in system is increasing.
>>
>> Is that running in the primary or the secondary VM?
>
> The userspace PSI daemon will be running on secondary VM. It will talk
> to a kernel driver (running on secondary VM itself) via ioctl. This
> kernel driver will talk to slightly modified version of virtio-mem
> driver where it can call the virtio_mem_config_changed(virtiomem_device)
> function for resizing the secondary VM. So its mainly "guest driven" now.

Okay, thanks.

[...]

>>>
>>> This daemon is currently in just Beta stage now and we have basic
>>> functionality running. We are yet to add more flesh to this scheme to
>>
>> Good to hear that the basics are running with virtio-mem (I assume :) ).
>>
>>> make sure any potential risks or security concerns are taken care as
>>> well.
>>
>> It would be great to draw/explain the architecture in more detail.
>
> We will be looking into solving any potential security concerns where
> hypervisor would restrict few actions of resizing of memory. Right now,
> we are experimenting to see if PSI mechanism itself can be used for ways
> of detecting memory pressure in the system and add memory to secondary
> VM when memory is in need. Taking into account all the latencies
> involved in the PSI scheme (i.e. time when one does malloc call till
> when extra memory gets added to SVM system). And wanted to know
> upstream's opinion on such a scheme using PSI mechanism for detecting
> memory pressure and resizing SVM accordingly.

One problematic thing is that adding memory to Linux via virtio-mem
itself consumes memory (e.g., for the memmap), especially when a
completely new memory block has to be added to Linux.

So if you're already under severe memory pressure, the allocations
needed to bring up new memory can fail. The question is whether PSI can
notify "early" enough that this barely happens in practice.
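
To make the "early enough" question a bit more concrete, here is a
minimal sketch of how a guest daemon could register a PSI trigger on
/proc/pressure/memory, following the trigger interface described in
Documentation/accounting/psi.rst. The helper names and the example
threshold (150ms of stall within a 1s window) are assumptions for
illustration only; this is not the daemon discussed in this thread, and
the step that actually requests more memory is left out.

```python
import os
import select

PSI_MEMORY = "/proc/pressure/memory"  # standard PSI file; requires CONFIG_PSI


def parse_psi_line(line):
    """Parse one PSI line such as
    'some avg10=0.12 avg60=0.00 avg300=0.00 total=4321'."""
    kind, *fields = line.split()
    values = dict(field.split("=", 1) for field in fields)
    return kind, {
        "avg10": float(values["avg10"]),
        "avg60": float(values["avg60"]),
        "avg300": float(values["avg300"]),
        "total": int(values["total"]),  # cumulative stall time in microseconds
    }


def trigger_spec(kind, stall_us, window_us):
    """Build the '<some|full> <stall us> <window us>' trigger string."""
    if kind not in ("some", "full"):
        raise ValueError("kind must be 'some' or 'full'")
    if not 500_000 <= window_us <= 10_000_000:
        raise ValueError("PSI windows are limited to 500ms..10s")
    if not 0 < stall_us <= window_us:
        raise ValueError("stall threshold must fit inside the window")
    return f"{kind} {stall_us} {window_us}"


def wait_for_pressure(spec, timeout_ms=10_000, path=PSI_MEMORY):
    """Register a trigger and block until it fires.

    Returns True when the trigger fired and False on timeout.
    """
    fd = os.open(path, os.O_RDWR | os.O_NONBLOCK)
    try:
        # The kernel documentation example writes the string including
        # its terminating NUL byte.
        os.write(fd, spec.encode() + b"\0")
        poller = select.poll()
        poller.register(fd, select.POLLPRI)
        for _fd, event in poller.poll(timeout_ms):
            if event & select.POLLERR:
                raise OSError("PSI trigger source went away")
            if event & select.POLLPRI:
                return True
        return False
    finally:
        os.close(fd)
```

A daemon would call something like
wait_for_pressure(trigger_spec("some", 150_000, 1_000_000)) in a loop
and request more memory whenever it returns True. How small the stall
threshold has to be so that the memmap allocations still succeed is
exactly the open question above.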

There are some possible ways to mitigate:

1) Always have virtio-mem keep spare memory blocks added to Linux that
   don't expose any memory yet. Memory from these blocks can be handed
   over to Linux without additional Linux allocations. Of course, they
   consume metadata, so one might want to limit them.

2) Implement memmap_on_memory support for virtio-mem. This might help in
   some setups, where the device block size is suitable.

Did you run into that scenario already during your experiments, and how
did you deal with that?

-- 
Thanks,

David / dhildenb