Re: [PATCH qemu v3] x86: don't let decompressed kernel image clobber setup_data


From: "Philippe Mathieu-Daudé" <philmd@linaro.org>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>,
	pbonzini@redhat.com, ebiggers@kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	ardb@kernel.org, kraxel@redhat.com, hpa@zytor.com, bp@alien8.de
Subject: Re: [PATCH qemu v3] x86: don't let decompressed kernel image clobber setup_data
Date: Mon, 23 Jan 2023 09:26:17 +0100
Message-ID: <329fc9bc-017d-412e-a03e-2c3f1f3bcede@linaro.org>
In-Reply-To: <20221230220725.618763-1-Jason@zx2c4.com>

On 30/12/22 23:07, Jason A. Donenfeld wrote:
> The setup_data links are appended to the compressed kernel image. Since
> the kernel image is typically loaded at 0x100000, setup_data lives at
> `0x100000 + compressed_size`, which does not get relocated during the
> kernel's boot process.
> 
> The kernel typically decompresses the image starting at address
> 0x1000000 (note: there's one more zero there than the compressed image
> above). This usually is fine for most kernels.
> 
> However, if the compressed image is actually quite large, then
> setup_data will live at a `0x100000 + compressed_size` that extends into
> the decompressed zone at 0x1000000. In other words, if compressed_size
> is larger than `0x1000000 - 0x100000`, then the decompression step will
> clobber setup_data, resulting in crashes.
> 
> Visually, what happens now is that QEMU appends setup_data to the kernel
> image:
> 
>            kernel image            setup_data
>     |--------------------------||----------------|
> 0x100000                  0x100000+l1     0x100000+l1+l2
> 
> The problem is that this decompresses to 0x1000000 (one more zero). So
> if l1 is > (0x1000000-0x100000), then this winds up looking like:
> 
>            kernel image            setup_data
>     |--------------------------||----------------|
> 0x100000                  0x100000+l1     0x100000+l1+l2
> 
>                                   d e c o m p r e s s e d   k e r n e l
>                       |-------------------------------------------------------------|
>                  0x1000000                                                     0x1000000+l3
> 
> The decompressed kernel seemingly overwriting the compressed kernel
> image isn't a problem, because that gets relocated to a higher address
> early on in the boot process, at the end of startup_64. setup_data,
> however, stays in the same place, since those links are self referential
> and nothing fixes them up.  So the decompressed kernel clobbers it.
> 
> Fix this by appending setup_data to the cmdline blob rather than the
> kernel image blob, which remains at a lower address that won't get
> clobbered.
> 
> This could have been done by overwriting the initrd blob instead, but
> that poses big difficulties, such as no longer being able to use memory
> mapped files for initrd, hurting performance, and, more importantly, the
> initrd address calculation is hard coded in qboot, and it always grows
> down rather than up, which means lots of brittle semantics would have to
> be changed around, incurring more complexity. In contrast, using cmdline
> is simple and doesn't interfere with anything.
> 
> The microvm machine has a gross hack where it fiddles with fw_cfg data
> after the fact. So this hack is updated to account for this appending,
> by reserving some bytes.
> 
> Cc: x86@kernel.org
> Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> Changes v2->v3:
> - Fix mistakes in string handling.
> Changes v1->v2:
> - Append setup_data to cmdline instead of kernel image.
> 
>   hw/i386/microvm.c         | 13 ++++++----
>   hw/i386/x86.c             | 50 +++++++++++++++++++--------------------
>   hw/nvram/fw_cfg.c         |  9 +++++++
>   include/hw/i386/microvm.h |  5 ++--
>   include/hw/nvram/fw_cfg.h |  9 +++++++
>   5 files changed, 54 insertions(+), 32 deletions(-)
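
For reference, the overlap condition from the commit message can be
checked with a small standalone sketch. The two addresses and the names
l1, l2 and l3 come from the description above; the helper name
setup_data_clobbered() and the 4 GiB sanity bound are made up for this
illustration and are not part of the patch or of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Typical addresses quoted in the commit message. */
#define COMPRESSED_LOAD_ADDR 0x100000ULL   /* compressed image load address */
#define DECOMPRESS_ADDR      0x1000000ULL  /* decompression target */

/*
 * Hypothetical helper: returns true if setup_data, appended right after
 * a compressed image of l1 bytes, overlaps the decompressed kernel.
 * l2 is the setup_data size, l3 the decompressed kernel size.
 */
static bool setup_data_clobbered(uint64_t l1, uint64_t l2, uint64_t l3)
{
    /* Arbitrary sanity bound so the additions below cannot overflow. */
    assert(l1 < (1ULL << 32) && l2 < (1ULL << 32) && l3 < (1ULL << 32));

    uint64_t sd_start = COMPRESSED_LOAD_ADDR + l1;
    uint64_t sd_end   = sd_start + l2;
    uint64_t k_start  = DECOMPRESS_ADDR;
    uint64_t k_end    = k_start + l3;

    /* Two half-open ranges overlap iff each starts before the other ends. */
    return sd_start < k_end && k_start < sd_end;
}
```

With l2 = 0x1000, setup_data stays clear of the decompression zone as
long as l1 + l2 <= 0x1000000 - 0x100000 = 0xf00000 (15 MiB); a 16 MiB
compressed image puts it squarely inside that zone.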

> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> index a00881bc64..432754eda4 100644
> --- a/hw/nvram/fw_cfg.c
> +++ b/hw/nvram/fw_cfg.c
> @@ -741,6 +741,15 @@ void fw_cfg_add_bytes(FWCfgState *s, uint16_t key, void *data, size_t len)
>       fw_cfg_add_bytes_callback(s, key, NULL, NULL, NULL, data, len, true);
>   }
>   
> +void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key)
> +{
> +    int arch = !!(key & FW_CFG_ARCH_LOCAL);
> +
> +    key &= FW_CFG_ENTRY_MASK;
> +    assert(key < fw_cfg_max_entry(s));
> +    return s->entries[arch][key].data;

Wouldn't it be safer to take a size argument, and return
NULL if s->entries[arch][key].len < size?

Maybe this API should also return a const pointer (cast from the
stored data), so the only way to update the key's data is via
fw_cfg_add_bytes().
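
Something along these lines is what I have in mind. This is only a
sketch: it is written against simplified stand-in types (DemoEntry,
DemoFWCfgState, the DEMO_* constants) so that it compiles on its own,
and demo_read_bytes_ptr() is a made-up name. The real FWCfgState,
FW_CFG_ARCH_LOCAL, FW_CFG_ENTRY_MASK and fw_cfg_max_entry() would take
their place in hw/nvram/fw_cfg.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for illustration only; NOT the real QEMU definitions. */
#define DEMO_ARCH_LOCAL 0x8000
#define DEMO_ENTRY_MASK 0x3fff
#define DEMO_MAX_ENTRY  0x20

typedef struct {
    uint32_t len;
    uint8_t *data;
} DemoEntry;

typedef struct {
    DemoEntry entries[2][DEMO_MAX_ENTRY];
} DemoFWCfgState;

/*
 * Return a read-only pointer to an existing entry's data, or NULL if
 * the entry is unset or holds fewer than @size bytes.
 */
static const void *demo_read_bytes_ptr(const DemoFWCfgState *s,
                                       uint16_t key, size_t size)
{
    int arch = !!(key & DEMO_ARCH_LOCAL);
    const DemoEntry *e;

    key &= DEMO_ENTRY_MASK;
    assert(key < DEMO_MAX_ENTRY);

    e = &s->entries[arch][key];
    if (e->data == NULL || e->len < size) {
        return NULL;
    }
    return e->data;
}
```

Callers then have to state how many bytes they expect to find, and they
cannot scribble over the blob through the returned pointer without an
explicit cast.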

> +}
> +

> diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
> index 2e503904dc..990dcdbb2e 100644
> --- a/include/hw/nvram/fw_cfg.h
> +++ b/include/hw/nvram/fw_cfg.h
> @@ -139,6 +139,15 @@ void fw_cfg_add_bytes_callback(FWCfgState *s, uint16_t key,
>                                  void *data, size_t len,
>                                  bool read_only);
>   
> +/**
> + * fw_cfg_read_bytes_ptr:
> + * @s: fw_cfg device being modified
> + * @key: selector key value for new fw_cfg item
> + *
> + * Reads an existing fw_cfg data pointer.
> + */
> +void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key);
> +
>   /**
>    * fw_cfg_add_string:
>    * @s: fw_cfg device being modified


Thread overview: 17+ messages
2022-12-30 18:38 [PATCH qemu v2] x86: don't let decompressed kernel image clobber setup_data Jason A. Donenfeld
2022-12-30 21:59 ` Jason A. Donenfeld
2022-12-30 22:07   ` [PATCH qemu v3] " Jason A. Donenfeld
2023-01-05  5:16     ` Eric Biggers
2023-01-10 12:10     ` Mathias Krause
2023-01-10 15:34     ` Jason A. Donenfeld
2023-01-10 17:50       ` Michael S. Tsirkin
2023-01-23  4:21         ` Eric Biggers
2023-01-23 12:12           ` Michael S. Tsirkin
2023-01-23 12:37             ` Jason A. Donenfeld
2023-01-28 11:15               ` Michael S. Tsirkin
2023-01-30  9:31                 ` Daniel P. Berrangé
2023-01-23  8:26     ` Philippe Mathieu-Daudé [this message]
2023-02-08 17:45     ` Nathan Chancellor
2023-02-08 17:54       ` Jason A. Donenfeld
2023-02-08 18:09         ` Jason A. Donenfeld
2023-02-08 18:10         ` Michael S. Tsirkin