From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E51C9C636CC for ; Tue, 31 Jan 2023 23:32:45 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pN077-0001gf-6Z; Tue, 31 Jan 2023 18:32:34 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pN070-0001fU-15; Tue, 31 Jan 2023 18:32:26 -0500 Received: from dfw.source.kernel.org ([2604:1380:4641:c500::1]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pN06x-0004VF-Sa; Tue, 31 Jan 2023 18:32:25 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 9464661741; Tue, 31 Jan 2023 23:32:20 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 15389C433EF; Tue, 31 Jan 2023 23:32:17 +0000 (UTC) Authentication-Results: smtp.kernel.org; dkim=pass (1024-bit key) header.d=zx2c4.com header.i=@zx2c4.com header.b="UalxFW9U" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=zx2c4.com; s=20210105; t=1675207935; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=21s7evhZm93ChFPBeJ+wL6RNLmUmHt/IpCujBE7YM1Q=; b=UalxFW9UGtjtIhHRebYQETlESvtiAd2hUCEl70/OaG81q5Xd6c246JxUU3geNwUV79yyPv P8WLK2CnaAn8Q/XFhFELJeGJz6I+k/OM6YeeqKgjY39M6YPqmioIMHHJN5+ciRgSCg5XsR KO19kJW4Z9KHpEbu6bkKyTXWpfPf0Mg= Received: by mail.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id 7c14657c (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Tue, 31 Jan 2023 23:32:14 +0000 (UTC) Date: Wed, 1 Feb 2023 00:32:13 +0100 From: "Jason A. Donenfeld" To: "Michael S. Tsirkin" Cc: qemu-devel@nongnu.org, Peter Maydell , x86@kernel.org, Philippe =?utf-8?Q?Mathieu-Daud=C3=A9?= , "H . Peter Anvin" , Borislav Petkov , Eric Biggers , Eric Biggers , Mathias Krause , Sergio Lopez , Paolo Bonzini , Richard Henderson , Eduardo Habkost , Marcel Apfelbaum , Gerd Hoffmann , qemu-stable@nongnu.org Subject: Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data Message-ID: References: <20230130201810.11518-1-mst@redhat.com> <20230130201810.11518-11-mst@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20230130201810.11518-11-mst@redhat.com> Received-SPF: pass client-ip=2604:1380:4641:c500::1; envelope-from=SRS0=PxBQ=54=zx2c4.com=Jason@kernel.org; helo=dfw.source.kernel.org X-Spam_score_int: -39 X-Spam_score: -4.0 X-Spam_bar: ---- X-Spam_report: (-4.0 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org On Mon, Jan 30, 2023 at 03:19:59PM -0500, Michael S. Tsirkin wrote: > From: "Jason A. Donenfeld" > > The setup_data links are appended to the compressed kernel image. Since > the kernel image is typically loaded at 0x100000, setup_data lives at > `0x100000 + compressed_size`, which does not get relocated during the > kernel's boot process. > > The kernel typically decompresses the image starting at address > 0x1000000 (note: there's one more zero there than the compressed image > above). This usually is fine for most kernels. > > However, if the compressed image is actually quite large, then > setup_data will live at a `0x100000 + compressed_size` that extends into > the decompressed zone at 0x1000000. In other words, if compressed_size > is larger than `0x1000000 - 0x100000`, then the decompression step will > clobber setup_data, resulting in crashes. > > Visually, what happens now is that QEMU appends setup_data to the kernel > image: > > kernel image setup_data > |--------------------------||----------------| > 0x100000 0x100000+l1 0x100000+l1+l2 > > The problem is that this decompresses to 0x1000000 (one more zero). So > if l1 is > (0x1000000-0x100000), then this winds up looking like: > > kernel image setup_data > |--------------------------||----------------| > 0x100000 0x100000+l1 0x100000+l1+l2 > > d e c o m p r e s s e d k e r n e l > |-------------------------------------------------------------| > 0x1000000 0x1000000+l3 > > The decompressed kernel seemingly overwriting the compressed kernel > image isn't a problem, because that gets relocated to a higher address > early on in the boot process, at the end of startup_64. setup_data, > however, stays in the same place, since those links are self referential > and nothing fixes them up. So the decompressed kernel clobbers it. > > Fix this by appending setup_data to the cmdline blob rather than the > kernel image blob, which remains at a lower address that won't get > clobbered. > > This could have been done by overwriting the initrd blob instead, but > that poses big difficulties, such as no longer being able to use memory > mapped files for initrd, hurting performance, and, more importantly, the > initrd address calculation is hard coded in qboot, and it always grows > down rather than up, which means lots of brittle semantics would have to > be changed around, incurring more complexity. In contrast, using cmdline > is simple and doesn't interfere with anything. > > The microvm machine has a gross hack where it fiddles with fw_cfg data > after the fact. So this hack is updated to account for this appending, > by reserving some bytes. > > Fixup-by: Michael S. Tsirkin > Cc: x86@kernel.org > Cc: Philippe Mathieu-Daudé > Cc: H. Peter Anvin > Cc: Borislav Petkov > Cc: Eric Biggers > Signed-off-by: Jason A. Donenfeld > Message-Id: <20221230220725.618763-1-Jason@zx2c4.com> > Message-ID: <20230128061015-mutt-send-email-mst@kernel.org> > Reviewed-by: Michael S. Tsirkin > Signed-off-by: Michael S. Tsirkin > Tested-by: Eric Biggers > Tested-by: Mathias Krause > --- > include/hw/i386/microvm.h | 5 ++-- > include/hw/nvram/fw_cfg.h | 9 +++++++ > hw/i386/microvm.c | 15 +++++++---- > hw/i386/x86.c | 52 +++++++++++++++++++++------------------ > hw/nvram/fw_cfg.c | 9 +++++++ > 5 files changed, 59 insertions(+), 31 deletions(-) Cc: qemu-stable@nongnu.org Fixes: 67f7e426 ("hw/i386: pass RNG seed via setup_data entry")