linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: takahiro.akashi@linaro.org (AKASHI Takahiro)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v30 05/11] arm64: kdump: protect crash dump kernel memory
Date: Mon, 30 Jan 2017 17:27:51 +0900	[thread overview]
Message-ID: <20170130082749.GH23406@linaro.org> (raw)
In-Reply-To: <20170127171513.GA3119@fireball>

James,

On Sat, Jan 28, 2017 at 02:15:16AM +0900, AKASHI Takahiro wrote:
> James,
> 
> On Fri, Jan 27, 2017 at 11:19:32AM +0000, James Morse wrote:
> > Hi Akashi,
> > 
> > On 26/01/17 11:28, AKASHI Takahiro wrote:
> > > On Wed, Jan 25, 2017 at 05:37:38PM +0000, James Morse wrote:
> > >> On 24/01/17 08:49, AKASHI Takahiro wrote:
> > >>> To protect the memory reserved for crash dump kernel once after loaded,
> > >>> arch_kexec_protect_crashres/unprotect_crashres() are meant to deal with
> > >>> permissions of the corresponding kernel mappings.
> > >>>
> > >>> We also have to
> > >>> - put the region in an isolated mapping, and
> > >>> - move copying kexec's control_code_page to machine_kexec_prepare()
> > >>> so that the region will be completely read-only after loading.
> > >>
> > >>
> > >>> Note that the region must reside in linear mapping and have corresponding
> > >>> page structures in order to be potentially freed by shrinking it through
> > >>> /sys/kernel/kexec_crash_size.
> > >>
> > >> Nasty! Presumably you have to build the crash region out of individual page
> > >> mappings,
> > > 
> > > This might be an alternative, but
> > > 
> > >> so that they can be returned to the slab-allocator one page at a time,
> > >> and still be able to set/clear the valid bits on the remaining chunk.
> > >> (I don't see how that happens in this patch)
> > > 
> > > As far as shrinking feature is concerned, I believe, crash_shrink_memory(),
> > > which eventually calls free_reserved_page(), will take care of all the things
> > > to do. I can see increased number of "MemFree" in /proc/meminfo.
> > 
> > Except for arch specific stuff like reformatting the page tables. Maybe I've
> > overlooked the way out this. What happens with this scenario:
> > 
> > We boot with crashkernel=1G on the commandline.
> > Memblock_reserve allocates a naturally aligned 1GB block of memory for the crash
> > region.
> > Your code in __map_memblock() calls __create_pgd_mapping() ->
> > alloc_init_pud() which decides use_1G_block() looks like a good idea.
> > 
> > Some time later, the user decides to free half of this region,
> > free_reserved_page() does its thing and half of those struct page's now belong
> > to the memory allocator.
> > 
> > Now we load a kdump kernel, which causes arch_kexec_protect_crashkres() to be
> > called for the 512MB region that was left.
> > 
> > create_mapping_late() needs to split the 1GB mapping it originally made into a
> > smaller table, with the first half using PAGE_KERNEL_INVALID, and the second
> > half using PAGE_KERNEL. It can't do break-before-make because these pages may be
> > in-use by another CPU because we gave them back to the memory allocator. (in the
> > worst-possible world, that second half contains our stack!)
> 
> Yeah, this is a horrible case.
> Now I understand why we should stick with page_mapping_only option.
> 
> > 
> > Making this behave more like debug_pagealloc where the region is only built of
> > page-size mappings should avoid this. The smallest change to what you have is to
> > always pass page_mappings_only for the kdump region.
> > 
> > Ideally we just disable this resize feature for ARM64 and support it with some
> > later kernel version, but I can't see a way of doing this without adding Kconfig
> > symbols to other architectures.
> > 
> > 
> > > (Please note that the region is memblock_reserve()'d at boot time.)
> > 
> > And free_reserved_page() does nothing to update memblock, so
> > memblock_is_reserved() says these pages are reserved, but in reality they
> > are in use by the memory allocator. This doesn't feel right.
> 
> Just FYI, no other architectures take care of this issue.
> 
> (and I don't know whether the memblock is reserved or not may have
> any impact after booting.)
> 
> > (Fortunately we can override crash_free_reserved_phys_range() so this can
> >  probably be fixed)
> > 
> > >> This secretly-unmapped is the sort of thing that breaks hibernate, it blindly
> > >> assumes pfn_valid() means it can access the page if it wants to. Setting
> > >> PG_Reserved is a quick way to trick it out of doing this, but that would leave
> > >> the crash kernel region un-initialised after resume, while kexec_crash_image
> > >> still has a value.
> > > 
> > > Ouch, I didn't notice this issue.
> > > 
> > >> I think the best fix for this is to forbid hibernate if kexec_crash_loaded()
> > >> arguing these are mutually-exclusive features, and the protect crash-dump
> > >> feature exists to prevent things like hibernate corrupting the crash region.
> > > 
> > > This restriction is really painful.
> > > Is there any hibernation hook that will be invoked before suspending and
> > > after resuming? If so, arch_kexec_unprotect_crashkres()/protect_crashkres()
> > > will be able to be called.
> > 
> > Those calls could go in swsusp_arch_suspend() in /arch/arm64/kernel/hibernate.c,
> 
> I will give it a try next week.

I took the following test scenario:
 - load the crash dump kernel
 - echo shutdown > /sys/power/disk
 - echo disk > /sys/power/state
 - power off and on the board
 - reboot with resume=...
 - after hibernate done, echo c > /proc/sysrq-trigger
 - after reboot, check /proc/vmcore

and everything looks to work fine.

If you think I'd better try more test cases, please let me know.

Thanks,
-Takahiro AKASHI
===8<===
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index fe301cbcb442..111a849333ee 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -16,6 +16,7 @@
  */
 #define pr_fmt(x) "hibernate: " x
 #include <linux/cpu.h>
+#include <linux/kexec.h>
 #include <linux/kvm_host.h>
 #include <linux/mm.h>
 #include <linux/pm.h>
@@ -289,6 +290,12 @@ int swsusp_arch_suspend(void)
 	local_dbg_save(flags);
 
 	if (__cpu_suspend_enter(&state)) {
+#ifdef CONFIG_KEXEC_CORE
+		/* make the crash dump kernel region mapped */
+		if (kexec_crash_image)
+			arch_kexec_unprotect_crashkres();
+#endif
+
 		sleep_cpu = smp_processor_id();
 		ret = swsusp_save();
 	} else {
@@ -300,6 +307,12 @@ int swsusp_arch_suspend(void)
 		if (el2_reset_needed())
 			dcache_clean_range(__hyp_idmap_text_start, __hyp_idmap_text_end);
 
+#ifdef CONFIG_KEXEC_CORE
+		/* make the crash dump kernel region unmapped */
+		if (kexec_crash_image)
+			arch_kexec_protect_crashkres();
+#endif
+
 		/*
 		 * Tell the hibernation core that we've just restored
 		 * the memory

  parent reply	other threads:[~2017-01-30  8:27 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-24  8:46 [PATCH v30 00/11] arm64: add kdump support AKASHI Takahiro
2017-01-24  8:49 ` [PATCH v30 01/11] memblock: add memblock_cap_memory_range() AKASHI Takahiro
2017-01-24  8:49 ` [PATCH v30 02/11] arm64: limit memory regions based on DT property, usable-memory-range AKASHI Takahiro
2017-01-24  8:49 ` [PATCH v30 03/11] arm64: kdump: reserve memory for crash dump kernel AKASHI Takahiro
2017-01-24  8:49 ` [PATCH v30 04/11] arm64: mm: allow for unmapping memory region from kernel mapping AKASHI Takahiro
2017-01-24 11:32   ` Pratyush Anand
2017-01-25  6:37     ` AKASHI Takahiro
2017-01-25 15:49   ` James Morse
2017-01-26  8:08     ` AKASHI Takahiro
2017-01-24  8:49 ` [PATCH v30 05/11] arm64: kdump: protect crash dump kernel memory AKASHI Takahiro
2017-01-25 17:37   ` James Morse
2017-01-26 11:28     ` AKASHI Takahiro
2017-01-27 11:19       ` James Morse
2017-01-27 17:15         ` AKASHI Takahiro
2017-01-27 18:56           ` Mark Rutland
2017-01-30  8:42             ` AKASHI Takahiro
2017-01-30  8:27           ` AKASHI Takahiro [this message]
2017-01-27 13:59   ` James Morse
2017-01-27 15:42     ` AKASHI Takahiro
2017-01-27 19:41       ` Mark Rutland
2017-01-24  8:50 ` [PATCH v30 06/11] arm64: kdump: implement machine_crash_shutdown() AKASHI Takahiro
2017-01-24  8:50 ` [PATCH v30 07/11] arm64: kdump: add VMCOREINFO's for user-space tools AKASHI Takahiro
2017-01-24  8:50 ` [PATCH v30 08/11] arm64: kdump: provide /proc/vmcore file AKASHI Takahiro
2017-01-24  8:50 ` [PATCH v30 09/11] arm64: kdump: enable kdump in defconfig AKASHI Takahiro
2017-01-24  8:50 ` [PATCH v30 10/11] Documentation: kdump: describe arm64 port AKASHI Takahiro
2017-01-24  8:53 ` [PATCH v30 11/11] Documentation: dt: chosen properties for arm64 kdump AKASHI Takahiro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170130082749.GH23406@linaro.org \
    --to=takahiro.akashi@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).