From: "Kirill A. Shutemov"
Date: Wed, 21 Jul 2021 13:02:06 +0300
To: Mike Rapoport
Cc: Joerg Roedel, David Rientjes, Borislav Petkov, Andy Lutomirski, Sean Christopherson, Andrew Morton, Vlastimil Babka, "Kirill A. Shutemov", Andi Kleen, Brijesh Singh, Tom Lendacky, Jon Grimm, Thomas Gleixner, Peter Zijlstra, Paolo Bonzini, Ingo Molnar, "Kaplan, David", Varad Gautam, Dario Faggioli, x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev
Subject: Re: Runtime Memory Validation in Intel-TDX and AMD-SNP
Message-ID: <20210721100206.mfldptiwiothowpz@box>

On Wed, Jul 21, 2021 at 12:20:17PM +0300, Mike Rapoport wrote:
> On Tue, Jul 20, 2021 at 08:30:04PM +0300, Kirill A. Shutemov wrote:
> > On Mon, Jul 19, 2021 at 02:58:22PM +0200, Joerg Roedel wrote:
> > > Hi,
> > >
> > > I'd like to get some movement again into the discussion around how to
> > > implement runtime memory validation for confidential guests and wrote up
> > > some thoughts on it.
> > > Below are the results in form of a proposal I put together.
> > > Please let me know your thoughts on it and whether it fits everyone's
> > > requirements.
> >
> > Thanks for bringing it up. I'm working on the topic for Intel TDX. See
> > comments below.
> >
> > >
> > > Thanks,
> > >
> > > Joerg
> > >
> > > Proposal for Runtime Memory Validation in Secure Guests on x86
> > > ==============================================================
> >
> > [ snip ]
> >
> > > 8. When memory is returned to the memblock or page allocators,
> > >    it is _not_ invalidated. In fact, all memory which is freed
> > >    needs to be valid. If it was marked invalid in the meantime
> > >    (e.g. if the memory was used for DMA buffers), the code
> > >    owning the memory needs to validate it again before freeing
> > >    it.
> > >
> > >    The benefit of doing memory validation at allocation time is
> > >    that it keeps the exception handler for invalid memory
> > >    simple, because no exceptions of this kind are expected under
> > >    normal operation.
> >
> > During early boot I treat unaccepted memory as usable RAM. It only
> > requires special treatment in memblock_reserve(), which is used for early
> > memory allocation: unaccepted usable RAM has to be accepted before
> > reserving.
>
> memblock_reserve() is not always used for early allocations and some of the
> early allocations on x86 don't use memblock at all.

Do you have any code path in particular in mind?

> Hooking
> validation/acceptance to memblock_reserve() should be fine for PoC but I
> suspect there will be caveats for production.

That's why I'm doing a PoC. We'll see. So far so good. Maybe problems will
show up with a smaller pre-accepted memory size.

> > For fine-grained accepting/validation tracking I use the PageOffline() flag
> > (it's encoded into mapcount): before adding an unaccepted page to the free
> > list I set PageOffline() to indicate that the page has to be accepted
> > before returning from the page allocator.
> > Currently, we never have PageOffline() set for pages on free lists, so we
> > won't confuse them with ballooning or memory hotplug.
> >
> > I try to keep pages accepted in 2M or 4M chunks (pageblock_order or
> > MAX_ORDER). It is a reasonable compromise between speed and latency.
>
> Keeping fine-grained accepting/validation information in the memory map
> means it cannot be reused across reboots/kexec, and there should be an
> additional data structure to carry this information. It could be the same
> structure that is used by firmware to inform the kernel about usable memory,
> except that it needs to live after boot and get updates about new
> (in)validations. Doing those in 2M/4M chunks will help prevent this
> structure from exploding.

Yeah, we would need to reconstruct the EFI memory map somehow.

Or we can give most of the memory back to the host and accept/validate the
memory again after reboot/kexec. I don't know.

> BTW, as Dave mentioned, the deferred struct page init can also take care of
> the validation.

That was my first thought too, and I tried it, only to realize that it is not
what we want. If we accepted pages during struct page init, the host would
have to allocate all memory assigned to the guest at boot, even if the guest
actually uses only a small portion of it.

Also, deferred page init only allows scaling validation across multiple
CPUs; it doesn't let us get to userspace before we are done with it. See
wait_for_completion(&pgdat_init_all_done_comp).

-- 
Kirill A. Shutemov