kexec.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Baoquan He <bhe@redhat.com>
To: Jiri Bohac <jbohac@suse.cz>
Cc: Toshi Kani <toshi.kani@hpe.com>, David Airlie <airlied@linux.ie>,
	yinghai@kernel.org, joro@8bytes.org, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dave Young <dyoung@redhat.com>, Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
Date: Tue, 9 Jan 2018 14:19:33 +0800	[thread overview]
Message-ID: <20180109061933.GC1935@localhost.localdomain> (raw)
In-Reply-To: <20180106010013.73suskgxm7lox7g6@dwarf.suse.cz>

On 01/06/18 at 02:00am, Jiri Bohac wrote:
> Hi Baoquan,
> 
> thank you very much for your review!
> 
> On Wed, Dec 27, 2017 at 03:44:49PM +0800, Baoquan He wrote:
> > > > 1) If 'iommu=off' is specified in 1st kernel but not in kdump kernel, it
> > > > will ignore the ram we need dump.
> > > 
> > > yes, instead of crashing the machine (because GART may be initialized in the
> > > 2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
> > > fake e820 map sees as unused).
> > > 
> > > I'd say this is an improvement.
> > 
> > I don't get what you said. If 'iommu=off' only specified in 1st kernel,
> > kdump kernel will think the memory which GART bar pointed as a hole.
> > This is incorrect. I don't see the improvement.
> 
> Without the patch, the kernel will crash/hang the machine, as it
> will (correctly) try to dump the first kernel's memory that is
> now mapped by the GART.
> 
> With the patch it will produce a dump with the data missing.
> But this is just a corner case, I don't see why someone would
> specify iommu=off in the first kernel and not the second.
> 
> So instead of crashing and producing no dump at all, we get a
> dump with some data possibly missing.
> 
> > I understand you could get a bug report from other people, and have to
> > fix it as an assignee. And this fix is located in aperture_64.c only,
> > I am fine it's done like this. Maybe you can try the way I suggested
> > that only removing the region from io resource, but not touching anything
> > else, if you have interest.
> > 
> > So if have to, could you add some code comments around your fix to notice
> > people why these code are introduced? Commit log can help to understand
> > added code, while sometime file moving may make this checking very hard.
> 
> Sure, the full patch with comments added is below. Do the
> comments make sense? Do you want me to re-post it as v3?:
> 
> ---
> On machines where the GART aperture is mapped over physical RAM
> /proc/vmcore contains the remapped range and reading it may
> cause hangs or reboots. 
> 
> In the past, the GART region was added into the resource map, implemented by
> commit 56dd669a138c ("[PATCH] Insert GART region into resource map")
> 
> However, inserting the iomem_resource from the early GART code caused
> resource conflicts with some AGP drivers (bko#72201), which got avoided by
> reverting the patch in commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
> region into resource map"). This revert introduced the /proc/vmcore bug.
> 
> The vmcore ELF header is either prepared by the kernel (when using the
> kexec_file_load syscall) or by the kexec userspace (when using the kexec_load
> syscall). Since we no longer have the GART iomem resource, the userspace
> kexec has no way of knowing which region to exclude from the ELF header.
> 
> Changes from v1 of this patch:
> Instead of excluding the aperture from the ELF header, this patch
> makes /proc/vmcore return zeroes in the second kernel when attempting to
> read the aperture region. This is done by reusing the
> gart_oldmem_pfn_is_ram infrastructure originally intended to exclude XEN
> balooned memory. This works for both, the kexec_file_load and kexec_load
> syscalls.
> 
> [Note that the GART region is the same in the first and second kernels:
> regardless whether the first kernel fixed up the northbridge/bios setting
> and mapped the aperture over physical memory, the second kernel finds the
> northbridge properly configured by the first kernel and the aperture
> never overlaps with e820 memory because the second kernel has a fake e820
> map created from the crashkernel memory regions. Thus, the second kernel
> keeps the aperture address/size as configured by the first kernel.]
> 
> register_oldmem_pfn_is_ram can only register one callback and returns an error
> if the callback has been registered already. Since XEN used to be the only user
> of this function, it never checks the return value. Now that we have more than
> one user, I added a WARN_ON just in case agp, XEN, or any other future user of
> register_oldmem_pfn_is_ram were to step on each other's toes.
> 
> 
> Signed-off-by: Jiri Bohac <jbohac@suse.cz>
> Fixes: 707d4eefbdb3 ("Revert [PATCH] Insert GART region into resource map")

I am fine. There's no git branch for kdump since changes related
usually scatter in all components. For this one, you can post a new one
and send to Joerg since he maintains iommu code. And Andrew, he usually
helps to pick kdump kernel changes to his akpm tree.

Thanks
Baoquan

> 
> diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
> index f5d92bc3b884..2c4d5ece7456 100644
> --- a/arch/x86/kernel/aperture_64.c
> +++ b/arch/x86/kernel/aperture_64.c
> @@ -30,6 +30,7 @@
>  #include <asm/dma.h>
>  #include <asm/amd_nb.h>
>  #include <asm/x86_init.h>
> +#include <linux/crash_dump.h>
>  
>  /*
>   * Using 512M as goal, in case kexec will load kernel_big
> @@ -56,6 +57,33 @@ int fallback_aper_force __initdata;
>  
>  int fix_aperture __initdata = 1;
>  
> +#ifdef CONFIG_PROC_VMCORE
> +/*
> + * If the first kernel maps the aperture over e820 RAM, the kdump kernel will
> + * use the same range because it will remain configured in the northbridge.
> + * Trying to dump this area via /proc/vmcore may crash the machine, so exclude
> + * it from vmcore.
> + */
> +static unsigned long aperture_pfn_start, aperture_page_count;
> +
> +static int gart_oldmem_pfn_is_ram(unsigned long pfn)
> +{
> +	return likely((pfn < aperture_pfn_start) ||
> +		      (pfn >= aperture_pfn_start + aperture_page_count));
> +}
> +
> +static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
> +{
> +	aperture_pfn_start = aper_base >> PAGE_SHIFT;
> +	aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT;
> +	WARN_ON(register_oldmem_pfn_is_ram(&gart_oldmem_pfn_is_ram));
> +}
> +#else
> +static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
> +{
> +}
> +#endif
> +
>  /* This code runs before the PCI subsystem is initialized, so just
>     access the northbridge directly. */
>  
> @@ -435,8 +463,16 @@ int __init gart_iommu_hole_init(void)
>  
>  out:
>  	if (!fix && !fallback_aper_force) {
> -		if (last_aper_base)
> +		if (last_aper_base) {
> +			/*
> +			 * If this is the kdump kernel, the first kernel
> +			 * may have allocated the range over its e820 RAM
> +			 * and fixed up the northbridge
> +			 */
> +			exclude_from_vmcore(last_aper_base, last_aper_order);
> +
>  			return 1;
> +		}
>  		return 0;
>  	}
>  
> @@ -473,6 +509,14 @@ int __init gart_iommu_hole_init(void)
>  		return 0;
>  	}
>  
> +	/*
> +	 * If this is the kdump kernel _and_ the first kernel did not
> +	 * configure the aperture in the northbridge, this range may
> +	 * overlap with the first kernel's memory. We can't access the
> +	 * range through vmcore even though it should be part of the dump.
> +	 */
> +	exclude_from_vmcore(aper_alloc, aper_order);
> +
>  	/* Fix up the north bridges */
>  	for (i = 0; i < amd_nb_bus_dev_ranges[i].dev_limit; i++) {
>  		int bus, dev_base, dev_limit;
> diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
> index 2cfcfe4f6b2a..dd2ad82eee80 100644
> --- a/arch/x86/xen/mmu_hvm.c
> +++ b/arch/x86/xen/mmu_hvm.c
> @@ -75,6 +75,6 @@ void __init xen_hvm_init_mmu_ops(void)
>  	if (is_pagetable_dying_supported())
>  		pv_mmu_ops.exit_mmap = xen_hvm_exit_mmap;
>  #ifdef CONFIG_PROC_VMCORE
> -	register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
> +	WARN_ON(register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram));
>  #endif
>  }
> 
> -- 
> Jiri Bohac <jbohac@suse.cz>
> SUSE Labs, Prague, Czechia
> 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

      reply	other threads:[~2018-01-09  6:19 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-16  0:15 [PATCH v2] x86/kexec: Exclude GART aperture from vmcore Jiri Bohac
2017-12-16  1:01 ` Baoquan He
2017-12-17 21:47   ` Borislav Petkov
2017-12-18 13:47     ` Baoquan He
     [not found]       ` <20171218143753.k7xyq6yiyjisnonh@pd.tnic>
2017-12-19  1:58         ` Baoquan He
2017-12-19 17:58           ` Jiri Bohac
2017-12-27  7:44             ` Baoquan He
     [not found]               ` <20171227122523.GC13909@nazgul.tnic>
2017-12-27 12:44                 ` Baoquan He
2018-01-06  1:00               ` Jiri Bohac
2018-01-09  6:19                 ` Baoquan He [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180109061933.GC1935@localhost.localdomain \
    --to=bhe@redhat.com \
    --cc=airlied@linux.ie \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=dyoung@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jbohac@suse.cz \
    --cc=joro@8bytes.org \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=toshi.kani@hpe.com \
    --cc=vgoyal@redhat.com \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).