From mboxrd@z Thu Jan 1 00:00:00 1970 Return-path: Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by bombadil.infradead.org with esmtps (Exim 4.94 #2 (Red Hat Linux)) id 1lVtFN-0066Ix-Bw for kexec@lists.infradead.org; Mon, 12 Apr 2021 09:52:53 +0000 Date: Mon, 12 Apr 2021 17:52:31 +0800 From: Baoquan He Subject: Re: [PATCH] x86/efi: Do not release sub-1MB memory regions when the crashkernel option is specified Message-ID: <20210412095231.GC4282@MiWiFi-R3L-srv> References: <20210412011347.GA4282@MiWiFi-R3L-srv> <8FAA2A0E-0A09-4308-B936-CDD2C0568BAE@amacapital.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <8FAA2A0E-0A09-4308-B936-CDD2C0568BAE@amacapital.net> List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Sender: "kexec" Errors-To: kexec-bounces+dwmw2=infradead.org@lists.infradead.org To: Andy Lutomirski Cc: "H. Peter Anvin" , Lianbo Jiang , linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org, platform-driver-x86@vger.kernel.org, x86@kernel.org, ardb@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dvhart@infradead.org, andy@infradead.org, kexec@lists.infradead.org, dyoung@redhat.com T24gMDQvMTEvMjEgYXQgMDY6NDlwbSwgQW5keSBMdXRvbWlyc2tpIHdyb3RlOgo+IAo+IAo+ID4g T24gQXByIDExLCAyMDIxLCBhdCA2OjE0IFBNLCBCYW9xdWFuIEhlIDxiaGVAcmVkaGF0LmNvbT4g d3JvdGU6Cj4gPiAKPiA+IO+7v09uIDA0LzA5LzIxIGF0IDA3OjU5cG0sIEguIFBldGVyIEFudmlu IHdyb3RlOgo+ID4+IFdoeSBkb24ndCB3ZSBkbyB0aGlzIHVuY29uZGl0aW9uYWxseT8gQXQgdGhl IHZlcnkgYmVzdCB3ZSBnYWluIGhhbGYgYSBtZWdhYnl0ZSBvZiBtZW1vcnkgKGV4Y2VwdCB0aGUg dHJhbXBvbGluZSwgd2hpY2ggaGFzIHRvIGxpdmUgdGhlcmUsIGJ1dCBpdCBpcyBvbmx5IGEgZmV3 IGtpbG9ieXRlcy4pCj4gPiAKPiA+IFRoaXMgaXMgYSBncmVhdCBzdWdnZXN0aW9uLCB0aGFua3Mu IEkgdGhpbmsgd2UgY2FuIGZpeCBpdCBpbiB0aGlzIHdheSB0bwo+ID4gbWFrZSBjb2RlIHNpbXBs ZXIuIFRoZW4gdGhlIHNwZWNpZmljIGNhcmluZyBvZiByZWFsIG1vZGUgaW4KPiA+IGVmaV9mcmVl X2Jvb3Rfc2VydmljZXMoKSBjYW4gYmUgcmVtb3ZlZCB0b28uCj4gPiAKPiAKPiBUaGlzIHdob2xl IHNpdHVhdGlvbiBtYWtlcyBtZSB0aGluayB0aGF0IHRoZSBjb2RlIGlzIGJ1Z2d5IGJlZm9yZSBh bmQgYnVnZ3kgYWZ0ZXIuCj4gCj4gVGhlIGlzc3VlIGhlcmUgKEkgdGhpbmspIGlzIHRoYXQgdmFy aW91cyBwaWVjZXMgb2YgY29kZSB3YW50IHRvIHJlc2VydmUgc3BlY2lmaWMgcGllY2VzIG9mIG90 aGVyd2lzZS1hdmFpbGFibGUgbG93IG1lbW9yeSBmb3IgdGhlaXIgb3duIG5lZmFyaW91cyB1c2Vz LiBJIGRvbuKAmXQga25vdyAqd2h5KiBjcmFzaCBrZXJuZWwgbmVlZHMgdGhpcywgYnV0IHRoYXQg ZG9lc27igJl0IG1hdHRlciB0b28gbXVjaC4KCktkdW1wIGtlcm5lbCBhbHNvIG5lZWQgZ28gdGhy b3VnaCByZWFsIG1vZGUgY29kZSBwYXRoIGR1cmluZyBib290dXAuIEl0CmlzIG5vdCBkaWZmZXJl bnQgdGhhbiBub3JtYWwga2VybmVsIGV4Y2VwdCB0aGF0IGl0IHNraXBzIHRoZSBmaXJtd2FyZQpy ZXNldHRpbmcuIFNvIGtkdW1wIGtlcm5lbCBuZWVkcyBsb3cgMU0gYXMgc3lzdGVtIFJBTSBqdXN0 IGFzIG5vcm1hbAprZXJuZWwgZG9lcy4gSGVyZSB3ZSByZXNlcnZlIHRoZSB3aG9sZSBsb3cgMU0g d2l0aCBtZW1ibG9ja19yZXNlcnZlKCkKdG8gYXZvaWQgYW55IGxhdGVyIGtlcm5lbCBvciBkcml2 ZXIgZGF0YSByZXNpZGUgaW4gdGhpcyBhcmVhLiBPdGhlcndpc2UsCndlIG5lZWQgZHVtcCB0aGUg Y29udGVudCBvZiB0aGlzIGFyZWEgdG8gdm1jb3JlLiBBcyB3ZSBrbm93LCB3aGVuIGNyYXNoCmhh cHBlbmVkLCB0aGUgb2xkIG1lbW9yeSBvZiAxc3Qga2VybmVsIHNob3VsZCBiZSB1bnRvdWNoZWQg dW50aWwgdm1jb3JlCmR1bXBpbmcgcmVhZCBvdXQgaXRzIGNvbnRlbnQuIE1lYW53aGlsZSwga2R1 bXAga2VybmVsIG5lZWQgcmV1c2UgbG93IDFNLgpJbiB0aGUgcGFzdCwgd2UgdXNlZCBhIGJhY2sg dXAgcmVnaW9uIHRvIGNvcHkgb3V0IHRoZSBsb3cgMU0gYXJlYSwgYW5kCm1hcCB0aGUgYmFjayB1 cCByZWdpb24gaW50byB0aGUgbG93IDFNIGFyZWEgaW4gdm1jb3JlIGVsZiBmaWxlLiBJbgo2ZjU5 OWQ4NDIzMWZkMjcgKCJ4ODYva2R1bXA6IEFsd2F5cyByZXNlcnZlIHRoZSBsb3cgMU0gd2hlbiB0 aGUgY3Jhc2hrZXJuZWwKb3B0aW9uIGlzIHNwZWNpZmllZCIpLCB3ZSBjaGFuZ2VkIHRvIGxvY2sg dGhlIHdob2xlIGxvdyAxTSB0byBhdm9pZAp3cml0dGluZyBhbnkga2VybmVsIGRhdGEgaW50bywg bGlrZSB0aGlzIHdlIGNhbiBza2lwIHRoaXMgYXJlYSB3aGVuCmR1bXBpbmcgdm1jb3JlLgoKQWJv dmUgaXMgd2h5IHdlIHRyeSB0byBtZW1ibG9jayByZXNlcnZlIHRoZSB3aG9sZSBsb3cgMU0uIFdl IGRvbid0IHdhbnQKdG8gdXNlIGl0LCBqdXN0IGRvbid0IHdhbnQgYW55b25lIHRvIHVzZSBpdCBp biAxc3Qga2VybmVsLgoKPiAKPiBJIHByb3Bvc2UgdGhhdCB0aGUgcmlnaHQgc29sdXRpb24gaXMg dG8gZ2l2ZSBsb3ctbWVtb3J5LXJlc2VydmluZyBjb2RlIHBhdGhzIHR3byBjaGFuY2VzIHRvIGRv IHdoYXQgdGhleSBuZWVkOiBvbmNlIGF0IHRoZSB2ZXJ5IGJlZ2lubmluZyBhbmQgb25jZSBhZnRl ciBFRkkgYm9vdCBzZXJ2aWNlcyBhcmUgZnJlZWQuCj4gCj4gQWx0ZXJuYXRpdmVseSwganVzdCBy ZXNlcnZlICphbGwqIG90aGVyd2lzZSB1bnVzZWQgc3ViIDFNIG1lbW9yeSB1cCBmcm9udCwgdGhl biByZWxlYXNlIGl0IHJpZ2h0IGFmdGVyIHJlbGVhc2luZyBib290IHNlcnZpY2VzLCBhbmQgdGhl biBpbnZva2UgdGhlIHNwZWNpYWwgY2FzZXMgZXhhY3RseSBvbmNlLgoKSSBhbSBub3Qgc3VyZSBp ZiBJIGdvdCBib3RoIHN1Z2dlc3RlZCB3YXlzIGNsZWFybHkuIFRoZXkgbG9vayBhIGxpdHRsZQpj b21wbGljYXRlZCBpbiBvdXIgY2FzZS4gQXMgSSBleHBsYWluZWQgYXQgYWJvdmUsIHdlIHdhbnQg dGhlIHdob2xlIGxvdwoxTSBsb2NrZWQgdXAsIG5vdCBvbmUgcGllY2Ugb3Igc29tZSBwaWVjZXMg b2YgaXQuCgo+IAo+IEluIGVpdGhlciBjYXNlLCB0aGUgcmVzdWx0IGlzIHRoYXQgdGhlIGNyYXNo a2VybmVsIG1lc3MgZ2V0cyB1bmlmaWVkIHdpdGggdGhlIHRyYW1wb2xpbmUgbWVzcy4gIE9uZSB3 YXkgdGhlIHJlc3VsdCBpcyBjYWxsZWQgdHdpY2UgYW5kIG5lZWRzIHRvIGJlIG1vcmUgY2FyZWZ1 bCwgYW5kIHRoZSBvdGhlciB3YXkgaXTigJlzIGNhbGxlZCBvbmx5IG9uY2UuCj4gCj4gSnVzdCBz a2lwcGluZyBmcmVlaW5nIGJvb3Qgc2VydmljZXMgc2VlbXMgd3JvbmcuICBJdCBkb2VzbuKAmXQg dW5tYXAgYm9vdCBzZXJ2aWNlcywgYW5kIHNraXBwaW5nIHRoYXQgaXMgaW5jb3JyZWN0LCBJIHRo aW5rLiBBbmQgaXQgc2VlbXMgdG8gcmVzdWx0IGluIGEgYm9ndXMgbWVtb3J5IG1hcCBpbiB3aGlj aCB0aGUgc3lzdGVtIHRoaW5rcyB0aGF0IHNvbWUgY3Jhc2hrZXJuZWwgbWVtb3J5IGlzIEVGSSBt ZW1vcnkgaW5zdGVhZC4KCkkgbGlrZSBocGEncyB0aG91Z2h0IHRvIGxvY2sgdGhlIHdob2xlIGxv dyAxTSB1bmNvbmRpdGlvbmFsbHkgc2luY2Ugb25seQphIGZldyBLQiBleGNlcHQgb2YgdHJhbXBv bGluZSBhcmVhIGlzIHRoZXJlLiBSZXRoaW5raW5nIGFib3V0IGl0LCBkb2luZwppdCBpbiBjYW5f ZnJlZV9yZWdpb24oKSBtYXkgYmUgcmlza3kgYmVjYXVzZSBlZmkgbWVtb3J5IHJlZ2lvbiBjb3Vs ZApjcm9zcyB0aGUgMU0gYm91bmRhcnksIGUuZyBbNjQwSywgMTAwTV0gd2l0aCB0eXBlIG9mCkVG SV9CT09UX1NFUlZJQ0VTX0NPREV8RUZJX0JPT1RfU0VSVklDRVNfREFUQSwgaXQgY291bGQgY2F1 c2UgbG9zcyBvZiBtZW1vcnkuCkp1c3QgYSB3aWxkIGd1ZXNzLCBub3QgdmVyeSBzdXJlIGlmIHRo ZSAxTSBib3VuZGFyeSBjb3Jzc2luZyBjYW4gcmVhbGx5CmhhcHBlbi4gZWZpX3Jlc2VydmVfYm9v dF9zZXJ2aWNlcygpIHdvbid0IHNwbGl0IHJlZ2lvbnMuCgpJZiBtb3ZpbmcgZWZpX3Jlc2VydmVf Ym9vdF9zZXJ2aWNlcygpIGFmdGVyIHJlc2VydmVfcmVhbF9tb2RlKCkgaXMgbm90CmFjY2VwdGVk LCBtYXliZSB3ZSBjYW4gY2FsbCBlZmlfbWVtX3Jlc2VydmUoMCwgMU0pIGp1c3QgYXMKZWZpX2Vz cnRfaW5pdCgpIGhhcyBkb25lLgoKCl9fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fCmtleGVjIG1haWxpbmcgbGlzdAprZXhlY0BsaXN0cy5pbmZyYWRlYWQub3Jn Cmh0dHA6Ly9saXN0cy5pbmZyYWRlYWQub3JnL21haWxtYW4vbGlzdGluZm8va2V4ZWMK From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.3 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7184FC433B4 for ; Mon, 12 Apr 2021 09:57:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 325FA61245 for ; Mon, 12 Apr 2021 09:57:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238073AbhDLJ51 (ORCPT ); Mon, 12 Apr 2021 05:57:27 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:36562 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241890AbhDLJxG (ORCPT ); Mon, 12 Apr 2021 05:53:06 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1618221165; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=50xPjNRYwkB4O0C//G0YJK7rbJ/9YSCi+4Syr5FFbG8=; b=X/qELho/J0PNtlb+lpsfHVb13x5Nni2QE2DNC2YIJpUxfWZTejHTH7HxfEobHxSNS2Akhs CNmk94dDrMsvKZQrHCL7b6YETV2aB664S1joaST7CpaMUoQF4IDS2h33mCpBEBnjROiiNi 8LLwap/EDxZkegdIw0z8xJgnEVR3A2I= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-476-SU0GMsnZNS-H2JQ8A7xDTw-1; Mon, 12 Apr 2021 05:52:41 -0400 X-MC-Unique: SU0GMsnZNS-H2JQ8A7xDTw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7F7E8107ACC7; Mon, 12 Apr 2021 09:52:39 +0000 (UTC) Received: from localhost (ovpn-13-33.pek2.redhat.com [10.72.13.33]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 7FDBF100238C; Mon, 12 Apr 2021 09:52:34 +0000 (UTC) Date: Mon, 12 Apr 2021 17:52:31 +0800 From: Baoquan He To: Andy Lutomirski Cc: "H. Peter Anvin" , Lianbo Jiang , linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org, platform-driver-x86@vger.kernel.org, x86@kernel.org, ardb@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dvhart@infradead.org, andy@infradead.org, kexec@lists.infradead.org, dyoung@redhat.com Subject: Re: [PATCH] x86/efi: Do not release sub-1MB memory regions when the crashkernel option is specified Message-ID: <20210412095231.GC4282@MiWiFi-R3L-srv> References: <20210412011347.GA4282@MiWiFi-R3L-srv> <8FAA2A0E-0A09-4308-B936-CDD2C0568BAE@amacapital.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <8FAA2A0E-0A09-4308-B936-CDD2C0568BAE@amacapital.net> User-Agent: Mutt/1.10.1 (2018-07-13) X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org On 04/11/21 at 06:49pm, Andy Lutomirski wrote: > > > > On Apr 11, 2021, at 6:14 PM, Baoquan He wrote: > > > > On 04/09/21 at 07:59pm, H. Peter Anvin wrote: > >> Why don't we do this unconditionally? At the very best we gain half a megabyte of memory (except the trampoline, which has to live there, but it is only a few kilobytes.) > > > > This is a great suggestion, thanks. I think we can fix it in this way to > > make code simpler. Then the specific caring of real mode in > > efi_free_boot_services() can be removed too. > > > > This whole situation makes me think that the code is buggy before and buggy after. > > The issue here (I think) is that various pieces of code want to reserve specific pieces of otherwise-available low memory for their own nefarious uses. I don’t know *why* crash kernel needs this, but that doesn’t matter too much. Kdump kernel also need go through real mode code path during bootup. It is not different than normal kernel except that it skips the firmware resetting. So kdump kernel needs low 1M as system RAM just as normal kernel does. Here we reserve the whole low 1M with memblock_reserve() to avoid any later kernel or driver data reside in this area. Otherwise, we need dump the content of this area to vmcore. As we know, when crash happened, the old memory of 1st kernel should be untouched until vmcore dumping read out its content. Meanwhile, kdump kernel need reuse low 1M. In the past, we used a back up region to copy out the low 1M area, and map the back up region into the low 1M area in vmcore elf file. In 6f599d84231fd27 ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified"), we changed to lock the whole low 1M to avoid writting any kernel data into, like this we can skip this area when dumping vmcore. Above is why we try to memblock reserve the whole low 1M. We don't want to use it, just don't want anyone to use it in 1st kernel. > > I propose that the right solution is to give low-memory-reserving code paths two chances to do what they need: once at the very beginning and once after EFI boot services are freed. > > Alternatively, just reserve *all* otherwise unused sub 1M memory up front, then release it right after releasing boot services, and then invoke the special cases exactly once. I am not sure if I got both suggested ways clearly. They look a little complicated in our case. As I explained at above, we want the whole low 1M locked up, not one piece or some pieces of it. > > In either case, the result is that the crashkernel mess gets unified with the trampoline mess. One way the result is called twice and needs to be more careful, and the other way it’s called only once. > > Just skipping freeing boot services seems wrong. It doesn’t unmap boot services, and skipping that is incorrect, I think. And it seems to result in a bogus memory map in which the system thinks that some crashkernel memory is EFI memory instead. I like hpa's thought to lock the whole low 1M unconditionally since only a few KB except of trampoline area is there. Rethinking about it, doing it in can_free_region() may be risky because efi memory region could cross the 1M boundary, e.g [640K, 100M] with type of EFI_BOOT_SERVICES_CODE|EFI_BOOT_SERVICES_DATA, it could cause loss of memory. Just a wild guess, not very sure if the 1M boundary corssing can really happen. efi_reserve_boot_services() won't split regions. If moving efi_reserve_boot_services() after reserve_real_mode() is not accepted, maybe we can call efi_mem_reserve(0, 1M) just as efi_esrt_init() has done.