Date: Wed, 21 Jul 2021 13:25:36 +0300
From: "Kirill A. Shutemov"
To: Joerg Roedel
Cc: David Rientjes, Borislav Petkov, Andy Lutomirski, Sean Christopherson, Andrew Morton, Vlastimil Babka, "Kirill A. Shutemov", Andi Kleen, Brijesh Singh, Tom Lendacky, Jon Grimm, Thomas Gleixner, Peter Zijlstra, Paolo Bonzini, Ingo Molnar, "Kaplan, David", Varad Gautam, Dario Faggioli, x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev
Subject: Re: Runtime Memory Validation in Intel-TDX and AMD-SNP
Message-ID: <20210721102536.aaaamqd5cdvbvce2@box>
References: <20210720173004.ucrliup5o7l3jfq3@box.shutemov.name>

On Wed, Jul 21, 2021 at 11:25:25AM +0200, Joerg Roedel wrote:
> Hi Kirill,
>
> On Tue, Jul 20, 2021 at 08:30:04PM +0300, Kirill A. Shutemov wrote:
> > On Mon, Jul 19, 2021 at 02:58:22PM +0200, Joerg Roedel wrote:
> > We use EFI unaccepted memory type to pass this information between
> > firmware and kernel. In my WIP patch I translate it to a new E820 memory
> > type: E820_TYPE_UNACCEPTED.
>
> Yeah, that is what I meant with a new E820 entry type.
>
> > E820 can also be used during early boot for tracking what memory got
> > accepted by kernel too.
>
> Won't this get very fragmented? How do you handle overlaps with other
> E820 regions?

I modify E820 as needed:

	e820__range_update(start, end, E820_TYPE_UNACCEPTED, E820_TYPE_RAM);

I also ask memblock for bottom-up allocation, as it helps with using
pre-accepted pages first and reduces fragmentation:

	memblock_set_bottom_up(true);

> > For now, I debug with 256MiB accepted by firmware. It allows to avoid
> > dealing with decompression code at this stage of the project. I plan to
> > lower the number later.
>
> Yes, this can be experimented with, the proposal allows a custom amount
> of memory to be pre-validated/accepted.
>
> > I would argue for per-range, not per-page, tracking of accepted/validated
> > memory for decompresser and early boot code, until page allocator is fully
> > functional. I have reasonable success with this approach so far.
>
> What do you mean by 'reasonable' success?

It appears to work fine with 256MiB of pre-accepted memory, but more
testing is required.

> Especially, how robust is that against unrelated changes to the boot
> code? As with SEV-SNP, I guess there will be no broad testing of
> unrelated kernel changes in a TDX environment, so some robustness is key
> to keep things working.

Hard to say. Let me get the prototype functional first. It's easier to
discuss with code in hand.

> > During early boot I treat unaccepted memory as a usable RAM. It only
> > requires special treatment on memblock_reserve(), which used for early
> > memory allocation: unaccepted usable RAM has to be accepted, before
> > reserving.
>
> What happens before memblock is active, say in the decompressor. Will
> unaccepted memory be considered for KASLR placement?

I tried to postpone thinking about the decompressor as long as possible :P

I guess we need to pass down information about memory accepted in the
decompressor to the main kernel so it can record it in E820.
I think it will be a single range.

> > For fine-grained accepting/validation tracking I use PageOffline() flags
> > (it's encoded into mapcount): before adding an unaccepted page to free
> > list I set the PageOffline() to indicate that the page has to be accepted
> > before returning from the page allocator. Currently, we never have
> > PageOffline() set for pages on free lists, so we won't have confusion with
> > ballooning or memory hotplug.
>
> Okay, I think that could also easily break with unrelated memory
> management changes, but should work for now in TDX.
>
> > I try to keep pages accepted in 2M or 4M chunks (pageblock_order or
> > MAX_ORDER). It is reasonable compromise on speed/latency.
>
> Makes sense, SEV-SNP will likely do something similar.
>
> > I'm not sure a bitmap is needed. I hope we can use E820 for early
> > tracking. But let's see if it works.
>
> We should find a solution which works for TDX and SNP, given that the
> required changes are intrusive and that it is much easier to just
> support one way to handle this.
>
> That said, the Validation Bitmap has a clear benefit for SEV-SNP in that
> it makes it trivial to support kexec/kdump scenarios. Further the
> bitmap makes it trivial to transport the information through the whole
> boot process. It also won't be big, SNP (and I think TDX too) would
> be okay with one bit per 4k page, so the bitmap would need 32kb of
> memory per GB of guest RAM.

Yes, the bitmap is small, but it is going to be a rather hot structure. It
has to be consulted on every page allocation, right?

How do you plan to make the bitmap scalable? What are the locking rules
around it?

> And keeping the information separate from struct page will make the code
> more robust against unrelated code changes.

-- 
 Kirill A. Shutemov
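
The bitmap sizing quoted in the thread (one bit per 4k page, 32kb per GB of
guest RAM) can be checked with a small userspace sketch. This is only an
illustration, not code from any patch: struct accept_bitmap and the
accept_bitmap_*() helpers are hypothetical names, and the model ignores the
locking and scalability questions raised above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT 12	/* 4 KiB pages, as assumed in the thread */

/* Hypothetical model: one bit per 4 KiB page, set once the page is accepted. */
struct accept_bitmap {
	uint64_t nr_pages;
	uint8_t *bits;
};

/* Bytes needed to track mem_bytes of guest RAM at one bit per page. */
static uint64_t accept_bitmap_bytes(uint64_t mem_bytes)
{
	uint64_t nr_pages = mem_bytes >> PAGE_SHIFT;

	return (nr_pages + 7) / 8;
}

/* Allocate a zeroed bitmap (nothing accepted yet). Returns 0 on success. */
static int accept_bitmap_init(struct accept_bitmap *bm, uint64_t mem_bytes)
{
	bm->nr_pages = mem_bytes >> PAGE_SHIFT;
	bm->bits = calloc(1, accept_bitmap_bytes(mem_bytes));
	return bm->bits ? 0 : -1;
}

/* Mark every page in [start, end) as accepted; end is exclusive. */
static void accept_bitmap_set_range(struct accept_bitmap *bm,
				    uint64_t start, uint64_t end)
{
	uint64_t last = end >> PAGE_SHIFT;

	assert(start <= end);
	for (uint64_t pfn = start >> PAGE_SHIFT;
	     pfn < last && pfn < bm->nr_pages; pfn++)
		bm->bits[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

/* Return 1 if the page containing addr is accepted, 0 otherwise. */
static int accept_bitmap_test(const struct accept_bitmap *bm, uint64_t addr)
{
	uint64_t pfn = addr >> PAGE_SHIFT;

	if (pfn >= bm->nr_pages)
		return 0;
	return (bm->bits[pfn / 8] >> (pfn % 8)) & 1;
}
```

For 1 GiB of guest RAM this gives 262144 pages, so 262144 bits or 32768
bytes, which matches the 32kb figure. Marking the first 256MiB as accepted
mirrors the pre-accepted range used for debugging in the thread.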