From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 21 Jul 2021 13:22:56 +0300
From: Mike Rapoport
To: "Kirill A. Shutemov"
Cc: Joerg Roedel, David Rientjes, Borislav Petkov, Andy Lutomirski,
	Sean Christopherson, Andrew Morton, Vlastimil Babka,
	"Kirill A. Shutemov", Andi Kleen, Brijesh Singh, Tom Lendacky,
	Jon Grimm, Thomas Gleixner, Peter Zijlstra, Paolo Bonzini,
	Ingo Molnar, "Kaplan, David", Varad Gautam, Dario Faggioli,
	x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev
Subject: Re: Runtime Memory Validation in Intel-TDX and AMD-SNP
References: <20210720173004.ucrliup5o7l3jfq3@box.shutemov.name>
	<20210721100206.mfldptiwiothowpz@box>
In-Reply-To: <20210721100206.mfldptiwiothowpz@box>
On Wed, Jul 21, 2021 at 01:02:06PM +0300, Kirill A. Shutemov wrote:
> On Wed, Jul 21, 2021 at 12:20:17PM +0300, Mike Rapoport wrote:
> > On Tue, Jul 20, 2021 at 08:30:04PM +0300, Kirill A. Shutemov wrote:
> > > On Mon, Jul 19, 2021 at 02:58:22PM +0200, Joerg Roedel wrote:
> > > > Hi,
> > > >
> > > > I'd like to get some movement again into the discussion around how to
> > > > implement runtime memory validation for confidential guests and wrote up
> > > > some thoughts on it.
> > > > Below are the results in the form of a proposal I put together. Please let
> > > > me know your thoughts on it and whether it fits everyone's requirements.
> > >
> > > Thanks for bringing it up. I'm working on the topic for Intel TDX. See
> > > comments below.
> > >
> > > > Thanks,
> > > >
> > > > 	Joerg
> > > >
> > > > Proposal for Runtime Memory Validation in Secure Guests on x86
> > > > ==============================================================

[ snip ]

> > > > 	8. When memory is returned to the memblock or page allocators,
> > > > 	   it is _not_ invalidated. In fact, all memory which is freed
> > > > 	   needs to be valid. If it was marked invalid in the meantime
> > > > 	   (e.g. if the memory was used for DMA buffers), the code
> > > > 	   owning the memory needs to validate it again before freeing
> > > > 	   it.
> > > >
> > > > 	   The benefit of doing memory validation at allocation time is
> > > > 	   that it keeps the exception handler for invalid memory
> > > > 	   simple, because no exceptions of this kind are expected under
> > > > 	   normal operation.
> > >
> > > During early boot I treat unaccepted memory as usable RAM. It only
> > > requires special treatment in memblock_reserve(), which is used for early
> > > memory allocation: unaccepted usable RAM has to be accepted before
> > > reserving.
> >
> > memblock_reserve() is not always used for early allocations and some of the
> > early allocations on x86 don't use memblock at all.
>
> Do you mean any codepath in particular?

I don't have examples handy, but in general there are calls to
e820__range_update() that make memory !RAM and it never gets into memblock.
On the other side, memblock_reserve() can be called to reserve memory owned
by firmware that may be already accepted.

> > Hooking
> > validation/acceptance to memblock_reserve() should be fine for a PoC but I
> > suspect there will be caveats for production.
>
> That's why I do a PoC. Will see. So far so good. Maybe it will be visible
> with smaller pre-accepted memory size.

Maybe some of my concerns only apply to systems with BIOSes weirder than
usual, and for VMs all would be fine.

I'd suggest experimenting with "memmap=" to manually assign various e820
types to memory chunks and see if there are any strange effects.

> > > For fine-grained accepting/validation tracking I use the PageOffline()
> > > flag (it's encoded into mapcount): before adding an unaccepted page to a
> > > free list I set PageOffline() to indicate that the page has to be
> > > accepted before being returned from the page allocator. Currently, we
> > > never have PageOffline() set for pages on free lists, so we won't have
> > > confusion with ballooning or memory hotplug.
> > >
> > > I try to keep pages accepted in 2M or 4M chunks (pageblock_order or
> > > MAX_ORDER). It is a reasonable compromise on speed/latency.
> >
> > Keeping fine-grained accepting/validation information in the memory map
> > means it cannot be reused across reboots/kexec, and there should be an
> > additional data structure to carry this information. It could be the same
> > structure that is used by the firmware to inform the kernel about usable
> > memory, it just needs to live after boot and get updates about new
> > (in)validations. Doing those in 2M/4M chunks will help to prevent this
> > structure from exploding.
>
> Yeah, we would need to reconstruct the EFI map somehow. Or we can give
> most of the memory back to the host and accept/validate the memory again
> after reboot/kexec. I dunno.
>
> > BTW, as Dave mentioned, the deferred struct page init can also take care
> > of the validation.
>
> That was my first thought too, and I tried it just to realize that it is
> not what we want. If we accepted pages on struct page init, it would mean
> the host has to allocate all memory assigned to the guest at boot, even if
> the guest actually uses only a small portion of it.

Yep, you are right.

> Also, deferred page init only allows scaling validation across multiple
> CPUs, but doesn't allow getting to userspace before we are done with it.
> See wait_for_completion(&pgdat_init_all_done_comp).

True.

-- 
Sincerely yours,
Mike.