Date: Tue, 31 Jan 2023 20:39:08 +0100
From: "Jason A. Donenfeld"
To: "Michael S. Tsirkin"
Cc: qemu-devel@nongnu.org, Peter Maydell, x86@kernel.org,
    Philippe Mathieu-Daudé, H. Peter Anvin, Borislav Petkov,
    Eric Biggers, Eric Biggers, Mathias Krause, Sergio Lopez,
    Paolo Bonzini, Richard Henderson, Eduardo Habkost,
    Marcel Apfelbaum, Gerd Hoffmann
Subject: Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data
References: <20230130201810.11518-1-mst@redhat.com>
 <20230130201810.11518-11-mst@redhat.com>
In-Reply-To: <20230130201810.11518-11-mst@redhat.com>

On Mon, Jan 30, 2023 at 03:19:59PM -0500, Michael S. Tsirkin wrote:
> From: "Jason A. Donenfeld"
>
> The setup_data links are appended to the compressed kernel image.
> Since the kernel image is typically loaded at 0x100000, setup_data lives
> at `0x100000 + compressed_size`, which does not get relocated during the
> kernel's boot process.
>
> The kernel typically decompresses the image starting at address
> 0x1000000 (note: there's one more zero there than the compressed image
> above). This is fine for most kernels.
>
> However, if the compressed image is actually quite large, then
> setup_data, at `0x100000 + compressed_size`, extends into the
> decompressed zone at 0x1000000. In other words, if compressed_size is
> larger than `0x1000000 - 0x100000` (0xF00000, i.e. 15 MiB), then the
> decompression step will clobber setup_data, resulting in crashes.
>
> Visually, what happens now is that QEMU appends setup_data to the kernel
> image:
>
>            kernel image            setup_data
>    |--------------------------||----------------|
> 0x100000                 0x100000+l1     0x100000+l1+l2
>
> The problem is that this decompresses to 0x1000000 (one more zero). So
> if l1 is > (0x1000000-0x100000), then this winds up looking like:
>
>            kernel image            setup_data
>    |--------------------------||----------------|
> 0x100000                 0x100000+l1     0x100000+l1+l2
>
>                            d e c o m p r e s s e d   k e r n e l
>                     |-------------------------------------------------|
>                 0x1000000                                       0x1000000+l3
>
> The decompressed kernel seemingly overwriting the compressed kernel
> image isn't a problem, because that gets relocated to a higher address
> early on in the boot process, at the end of startup_64. setup_data,
> however, stays in the same place, since those links are self-referential
> and nothing fixes them up. So the decompressed kernel clobbers it.
>
> Fix this by appending setup_data to the cmdline blob rather than to the
> kernel image blob. The cmdline blob remains at a lower address that
> won't get clobbered.
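To make the address arithmetic above concrete, here is a minimal Python sketch (not QEMU code) that models only the ranges described in the commit message. The helper names, the decompressed size l3, and the cmdline address 0x20000 are hypothetical values chosen for illustration; only 0x100000 and 0x1000000 come from the text.

```python
# Hypothetical model of the x86 boot layout described in the commit message.
# Only the two base addresses come from the text; all sizes and the cmdline
# address used below are illustrative assumptions.

KERNEL_LOAD_ADDR = 0x100000   # compressed image is typically loaded here
DECOMPRESS_ADDR = 0x1000000   # decompression typically starts here (one more zero)


def overlaps(a_start: int, a_len: int, b_start: int, b_len: int) -> bool:
    """Return True if [a_start, a_start + a_len) and [b_start, b_start + b_len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def setup_data_after_kernel(l1: int) -> int:
    """Old layout: setup_data is appended right after the l1-byte compressed image."""
    return KERNEL_LOAD_ADDR + l1


def setup_data_after_cmdline(cmdline_addr: int, cmdline_len: int) -> int:
    """New layout: setup_data is appended to the cmdline blob at a low address."""
    return cmdline_addr + cmdline_len


def clobbered(setup_addr: int, l2: int, l3: int) -> bool:
    """Would an l3-byte decompressed kernel at DECOMPRESS_ADDR overwrite l2 bytes of setup_data?"""
    return overlaps(setup_addr, l2, DECOMPRESS_ADDR, l3)


if __name__ == "__main__":
    # Threshold from the commit message: compressed images larger than this are at risk.
    print(hex(DECOMPRESS_ADDR - KERNEL_LOAD_ADDR))  # 0xf00000

    # Hypothetical sizes: 16 MiB compressed image, 256-byte setup_data, 48 MiB decompressed.
    old = setup_data_after_kernel(0x1000000)
    print(hex(old), clobbered(old, 0x100, 0x3000000))  # 0x1100000 True

    # Hypothetical cmdline at 0x20000 with 4 KiB of command line before setup_data.
    new = setup_data_after_cmdline(0x20000, 0x1000)
    print(hex(new), clobbered(new, 0x100, 0x3000000))  # 0x21000 False
```

Both regions are treated as half-open ranges. As long as the cmdline blob and its appended setup_data end below 0x1000000, no decompressed size l3 can reach them, which is why moving setup_data there avoids the clobbering.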
>
> This could have been done by overwriting the initrd blob instead, but
> that poses big difficulties: it would no longer be possible to use
> memory-mapped files for initrd, hurting performance, and, more
> importantly, the initrd address calculation is hard-coded in qboot and
> always grows down rather than up, so lots of brittle semantics would
> have to be changed around, incurring more complexity. In contrast, using
> cmdline is simple and doesn't interfere with anything.
>
> The microvm machine has a gross hack where it fiddles with fw_cfg data
> after the fact, so this hack is updated to account for this appending
> by reserving some bytes.
>
> Fixup-by: Michael S. Tsirkin
> Cc: x86@kernel.org
> Cc: Philippe Mathieu-Daudé
> Cc: H. Peter Anvin
> Cc: Borislav Petkov
> Cc: Eric Biggers
> Signed-off-by: Jason A. Donenfeld
> Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
> Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
> Reviewed-by: Michael S. Tsirkin
> Signed-off-by: Michael S. Tsirkin
> Tested-by: Eric Biggers
> Tested-by: Mathias Krause

This one should wind up in the stable point release too. Dunno what the
procedure for that is.

Jason