From: Pratyush Anand <panand@redhat.com>
To: Russell King - ARM Linux <linux@armlinux.org.uk>
Cc: Simon Horman <horms@verge.net.au>,
kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
Baoquan He <bhe@redhat.com>, Kees Cook <keescook@google.com>
Subject: Re: kexec failures with DEBUG_RODATA
Date: Wed, 15 Jun 2016 13:25:08 +0530 [thread overview]
Message-ID: <20160615075508.GB21202@dhcppc6> (raw)
In-Reply-To: <20160614175920.GD1041@n2100.armlinux.org.uk>
Hi Russell,
On 14/06/2016:06:59:20 PM, Russell King - ARM Linux wrote:
> Guys,
>
> Having added Keystone2 support to kexec, and asking TI to validate
> linux-next with mainline kexec-tools, I received two reports from
> them.
>
> The first was a report of success, but was kexecing a 4.4 kernel
> from linux-next.
>
> The second was a failure report, kexecing current linux-next from
> linux-next on this platform. However, my local tests (using my
> 4.7-rc3 derived kernel) showed there to be no problem.
>
> Building my 4.7-rc3 derived kernel with TI's configuration they
> were using with linux-next similarly failed. So, it came down to
> a configuration difference.
>
> After trialling several configurations, it turns out that the
> failure is, in part, caused by CONFIG_DEBUG_RODATA being enabled
> on TI's kernel but not mine. Why should this make any difference?
>
> Well, CONFIG_DEBUG_RODATA has the side effect that the kernel
> contains a lot of additional padding - we pad out to section size
> (1MB) the ELF sections with differing attributes. This should not
> normally be a problem, except kexec contains this assumption:
>
> /* Otherwise, assume the maximum kernel compression ratio
> * is 4, and just to be safe, place ramdisk after that */
> initrd_base = base + _ALIGN(len * 4, getpagesize());
>
> Now, first things first. Don't get misled by the comment - it's
> totally false. That may be what's desired, but that is far from
> what actually happens in reality.
>
> "base" is _not_ the address of the start of the kernel image, but
> is the base address of the start of the region that the kernel is
> to be loaded into - remember that the kernel is normally loaded
> 32k higher than the start of memory. This 32k offset is _not_
> included in either "base" nor "len". So, even if we did want to
> assume that there was a maximum compression ratio of 4, the above
> always calculates 32k short of that value.
>
> The other invalid thing here is this whole "maximum kernel compression
> ratio" assumption. Consider this non-DEBUG_RODATA kernel image:
>
> text data bss dec hex filename
> 6583513 2273816 215344 9072673 8a7021 ../build/ks2/vmlinux
>
> This results in an image and zimage of:
> -rwxrwxr-x 1 rmk rmk 8871936 Jun 14 18:02 ../build/ks2/arch/arm/boot/Image
> -rwxrwxr-x 1 rmk rmk 4381592 Jun 14 18:02 ../build/ks2/arch/arm/boot/zImage
>
> which is a ratio of about a 49%. On entry to the decompressor, the
> compressed image will be relocated above the expected resulting
> kernel size. So, let's say that it's relocated to 9MB. This means
> the zImage will occupy around 9MB-14MB above the start of memory.
> Going by the 4x ratio, we place the other images at 16.7MB. This
> leaves around 2.7MB free. So that's probably fine... but think
> about this. We assumed a ratio of 4x, but really we're in a rather
> tight squeeze - we actually have only about 50% of the compressed
> image size spare.
>
> Now let's look at the DEBUG_RODATA case:
>
> text data bss dec hex filename
> 6585305 2273952 215344 9074601 8a77a9 ../build/ks2/vmlinux
>
> And the resulting sizes:
> -rwxrwxr-x 1 rmk rmk 15024128 Jun 14 18:49 ../build/ks2/arch/arm/boot/Image
> -rwxrwxr-x 1 rmk rmk 4399040 Jun 14 18:49 ../build/ks2/arch/arm/boot/zImage
>
> That's a compression ratio of about 29%. Still within the 4x limit,
> but going through the same calculation above shows that we end up
> totally overflowing the available space this time.
>
> That's exactly the same kernel configuration except for
> CONFIG_DEBUG_RODATA - enabling this has almost _doubled_ the
> decompressed image size without affecting the compressed size.
>
> We've known for some time that this ratio of 4x doesn't work - we
> used to use the same assumption in the decompressor when self-
> relocating, and we found that there are images which achieve a
> better compression ratio and make this invalid. Yet, the 4x thing
> has persisted in kexec code... and buggily too.
>
> Since the kernel now has CONFIG_DEBUG_RODATA by default, this means
> that these kinds of ratio-based assumptions are even more invalid
> than they have been.
>
> Right now, a zImage doesn't advertise the size of its uncompressed
> image, but I think with things like CONFIG_DEBUG_RODATA, we can no
> longer make assumptions like we have done in the past, and we need
> the zImage to provide this information so that the boot environment
> can be setup sanely by boot loaders/kexec rather than relying on
> broken heuristics like this.
>
> Thoughts?
Sure, having a header information would be handy to do it. Other alternative
could be that we define "HAVE_LIBZ" and then we can have something like
kexec-Image-arm.c which handles plane Image. We can also have something like
get_zlib_decompressed_length() which can give us exact length we need for kernel
and then we can place initrd accordingly in zImage_arm_load().
Other than that:
I see at least another issue clearly in ARM kernel code with CONFIG_DEBUG_RODATA
enabled. When CONFIG_DEBUG_RODATA is enabled, we can not write text area.
kexec_start_address has been defined in relocate_kernel.S as a text area.
machine_kexec() writes at kexec_start_address with image->start. Similarly there
would be issues for overwriting of kexec_indirection_page, kexec_mach_type and
kexec_boot_atags. If arm mmu mapping configures text pages as RO, then we should
see an abort as soon as we do these writes.
I think there could be two alternatives to fix it (i) We pass all of above
values as function argument to soft_restart() till relocate_new_kernel(). (ii)
Remove kexec_start_address and others from relocate_new_kernel(). Now in
machine_kexec(), copy kexec_start_address and friends directly to
(reboot_code_buffer+relocate_new_kernel_size)
~Pratyush
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
WARNING: multiple messages have this Message-ID (diff)
From: panand@redhat.com (Pratyush Anand)
To: linux-arm-kernel@lists.infradead.org
Subject: kexec failures with DEBUG_RODATA
Date: Wed, 15 Jun 2016 13:25:08 +0530 [thread overview]
Message-ID: <20160615075508.GB21202@dhcppc6> (raw)
In-Reply-To: <20160614175920.GD1041@n2100.armlinux.org.uk>
Hi Russell,
On 14/06/2016:06:59:20 PM, Russell King - ARM Linux wrote:
> Guys,
>
> Having added Keystone2 support to kexec, and asking TI to validate
> linux-next with mainline kexec-tools, I received two reports from
> them.
>
> The first was a report of success, but was kexecing a 4.4 kernel
> from linux-next.
>
> The second was a failure report, kexecing current linux-next from
> linux-next on this platform. However, my local tests (using my
> 4.7-rc3 derived kernel) showed there to be no problem.
>
> Building my 4.7-rc3 derived kernel with TI's configuration they
> were using with linux-next similarly failed. So, it came down to
> a configuration difference.
>
> After trialling several configurations, it turns out that the
> failure is, in part, caused by CONFIG_DEBUG_RODATA being enabled
> on TI's kernel but not mine. Why should this make any difference?
>
> Well, CONFIG_DEBUG_RODATA has the side effect that the kernel
> contains a lot of additional padding - we pad out to section size
> (1MB) the ELF sections with differing attributes. This should not
> normally be a problem, except kexec contains this assumption:
>
> /* Otherwise, assume the maximum kernel compression ratio
> * is 4, and just to be safe, place ramdisk after that */
> initrd_base = base + _ALIGN(len * 4, getpagesize());
>
> Now, first things first. Don't get misled by the comment - it's
> totally false. That may be what's desired, but that is far from
> what actually happens in reality.
>
> "base" is _not_ the address of the start of the kernel image, but
> is the base address of the start of the region that the kernel is
> to be loaded into - remember that the kernel is normally loaded
> 32k higher than the start of memory. This 32k offset is _not_
> included in either "base" nor "len". So, even if we did want to
> assume that there was a maximum compression ratio of 4, the above
> always calculates 32k short of that value.
>
> The other invalid thing here is this whole "maximum kernel compression
> ratio" assumption. Consider this non-DEBUG_RODATA kernel image:
>
> text data bss dec hex filename
> 6583513 2273816 215344 9072673 8a7021 ../build/ks2/vmlinux
>
> This results in an image and zimage of:
> -rwxrwxr-x 1 rmk rmk 8871936 Jun 14 18:02 ../build/ks2/arch/arm/boot/Image
> -rwxrwxr-x 1 rmk rmk 4381592 Jun 14 18:02 ../build/ks2/arch/arm/boot/zImage
>
> which is a ratio of about a 49%. On entry to the decompressor, the
> compressed image will be relocated above the expected resulting
> kernel size. So, let's say that it's relocated to 9MB. This means
> the zImage will occupy around 9MB-14MB above the start of memory.
> Going by the 4x ratio, we place the other images at 16.7MB. This
> leaves around 2.7MB free. So that's probably fine... but think
> about this. We assumed a ratio of 4x, but really we're in a rather
> tight squeeze - we actually have only about 50% of the compressed
> image size spare.
>
> Now let's look at the DEBUG_RODATA case:
>
> text data bss dec hex filename
> 6585305 2273952 215344 9074601 8a77a9 ../build/ks2/vmlinux
>
> And the resulting sizes:
> -rwxrwxr-x 1 rmk rmk 15024128 Jun 14 18:49 ../build/ks2/arch/arm/boot/Image
> -rwxrwxr-x 1 rmk rmk 4399040 Jun 14 18:49 ../build/ks2/arch/arm/boot/zImage
>
> That's a compression ratio of about 29%. Still within the 4x limit,
> but going through the same calculation above shows that we end up
> totally overflowing the available space this time.
>
> That's exactly the same kernel configuration except for
> CONFIG_DEBUG_RODATA - enabling this has almost _doubled_ the
> decompressed image size without affecting the compressed size.
>
> We've known for some time that this ratio of 4x doesn't work - we
> used to use the same assumption in the decompressor when self-
> relocating, and we found that there are images which achieve a
> better compression ratio and make this invalid. Yet, the 4x thing
> has persisted in kexec code... and buggily too.
>
> Since the kernel now has CONFIG_DEBUG_RODATA by default, this means
> that these kinds of ratio-based assumptions are even more invalid
> than they have been.
>
> Right now, a zImage doesn't advertise the size of its uncompressed
> image, but I think with things like CONFIG_DEBUG_RODATA, we can no
> longer make assumptions like we have done in the past, and we need
> the zImage to provide this information so that the boot environment
> can be setup sanely by boot loaders/kexec rather than relying on
> broken heuristics like this.
>
> Thoughts?
Sure, having a header information would be handy to do it. Other alternative
could be that we define "HAVE_LIBZ" and then we can have something like
kexec-Image-arm.c which handles plane Image. We can also have something like
get_zlib_decompressed_length() which can give us exact length we need for kernel
and then we can place initrd accordingly in zImage_arm_load().
Other than that:
I see at least another issue clearly in ARM kernel code with CONFIG_DEBUG_RODATA
enabled. When CONFIG_DEBUG_RODATA is enabled, we can not write text area.
kexec_start_address has been defined in relocate_kernel.S as a text area.
machine_kexec() writes at kexec_start_address with image->start. Similarly there
would be issues for overwriting of kexec_indirection_page, kexec_mach_type and
kexec_boot_atags. If arm mmu mapping configures text pages as RO, then we should
see an abort as soon as we do these writes.
I think there could be two alternatives to fix it (i) We pass all of above
values as function argument to soft_restart() till relocate_new_kernel(). (ii)
Remove kexec_start_address and others from relocate_new_kernel(). Now in
machine_kexec(), copy kexec_start_address and friends directly to
(reboot_code_buffer+relocate_new_kernel_size)
~Pratyush
next prev parent reply other threads:[~2016-06-15 7:55 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-14 17:59 kexec failures with DEBUG_RODATA Russell King - ARM Linux
2016-06-14 17:59 ` Russell King - ARM Linux
2016-06-14 18:05 ` Kees Cook
2016-06-14 18:05 ` Kees Cook
2016-06-15 2:43 ` Baoquan He
2016-06-15 2:43 ` Baoquan He
2016-06-15 21:13 ` Russell King - ARM Linux
2016-06-15 21:13 ` Russell King - ARM Linux
2016-06-15 22:20 ` Kees Cook
2016-06-15 22:20 ` Kees Cook
2016-06-15 22:42 ` Russell King - ARM Linux
2016-06-15 22:42 ` Russell King - ARM Linux
2016-06-15 22:54 ` Kees Cook
2016-06-15 22:54 ` Kees Cook
2016-06-15 23:13 ` Russell King - ARM Linux
2016-06-15 23:13 ` Russell King - ARM Linux
2016-06-21 11:48 ` Pratyush Anand
2016-06-21 11:48 ` Pratyush Anand
2016-06-21 15:37 ` Russell King - ARM Linux
2016-06-21 15:37 ` Russell King - ARM Linux
2016-07-07 10:20 ` Russell King - ARM Linux
2016-07-07 10:20 ` Russell King - ARM Linux
2016-07-07 14:01 ` [PATCH 1/2] arm: plug a zImage corner case Russell King
2016-07-07 14:01 ` Russell King
2016-07-15 4:13 ` Simon Horman
2016-07-15 4:13 ` Simon Horman
2016-08-02 23:09 ` libdrm-armada repository Joshua Clayton
2016-08-02 23:28 ` Russell King
2016-08-03 17:47 ` Joshua Clayton
2016-08-03 1:38 ` Fabio Estevam
2016-08-03 17:55 ` Joshua Clayton
2016-07-07 14:01 ` [PATCH 2/2] arm: use zImage size from header Russell King
2016-07-07 14:01 ` Russell King
2016-07-21 7:00 ` kexec failures with DEBUG_RODATA Tony Lindgren
2016-07-21 7:00 ` Tony Lindgren
2016-07-07 10:00 ` Russell King - ARM Linux
2016-07-07 10:00 ` Russell King - ARM Linux
2016-06-15 7:55 ` Pratyush Anand [this message]
2016-06-15 7:55 ` Pratyush Anand
2016-06-15 19:13 ` Russell King - ARM Linux
2016-06-15 19:13 ` Russell King - ARM Linux
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160615075508.GB21202@dhcppc6 \
--to=panand@redhat.com \
--cc=bhe@redhat.com \
--cc=horms@verge.net.au \
--cc=keescook@google.com \
--cc=kexec@lists.infradead.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux@armlinux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.