devicetree.vger.kernel.org archive mirror
From: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>,
	robh+dt@kernel.org, hch@lst.de, ardb@kernel.org,
	linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
	lorenzo.pieralisi@arm.com, will@kernel.org,
	jeremy.linton@arm.com, iommu@lists.linux-foundation.org,
	linux-rpi-kernel@lists.infradead.org, guohanjun@huawei.com,
	robin.murphy@arm.com, linux-arm-kernel@lists.infradead.org,
	Chen Zhou <chenzhou10@huawei.com>
Subject: Re: [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()
Date: Thu, 12 Nov 2020 16:56:38 +0100
Message-ID: <b5336064145a30aadcfdb8920226a8c63f692695.camel@suse.de>
In-Reply-To: <X6rZRvWyigCJxAVW@trantor>

Hi Catalin,

On Tue, 2020-11-10 at 18:17 +0000, Catalin Marinas wrote:
> On Fri, Nov 06, 2020 at 07:46:29PM +0100, Nicolas Saenz Julienne wrote:
> > On Thu, 2020-11-05 at 16:11 +0000, James Morse wrote:
> > > On 03/11/2020 17:31, Nicolas Saenz Julienne wrote:
> > > > crashkernel might reserve memory located in ZONE_DMA. We plan to delay
> > > > ZONE_DMA's initialization until after unflattening the devicetree and
> > > > ACPI's boot table initialization, so move it later in the boot process:
> > > > specifically into mem_init(), the last place where crashkernel can
> > > > reserve memory before the page allocator kicks in. There isn't any
> > > > apparent reason for doing this earlier.
> > > 
> > > It's so that map_mem() can carve it out of the linear/direct map.
> > > This is so that stray writes from a crashing kernel can't accidentally corrupt the kdump
> > > kernel. We depend on this if we continue with kdump, but failed to offline all the other
> > > CPUs.
> > 
> > I presume you are referring to arch_kexec_protect_crashkres(). IIUC this
> > will only happen further down the line, after the kdump kernel image has
> > been loaded. But it also depends on the mappings being PAGE sized
> > (flags == NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS).
> 
> IIUC, arch_kexec_protect_crashkres() is only for the crashkernel image,
> not the whole reserved memory that the crashkernel will use. For the
> latter, we avoid the linear map by marking it as nomap in map_mem().

I'm not sure we're on the same page here, so sorry if this was already implied.

The crashkernel region is skipped while creating the linear mappings, but it is
then mapped right away, with page granularity and without MTE.
See paging_init()->map_mem():

	/*
	 * Use page-level mappings here so that we can shrink the region
	 * in page granularity and put back unused memory to buddy system
	 * through /sys/kernel/kexec_crash_size interface.
	 */
	if (crashk_res.end) {
		__map_memblock(pgdp, crashk_res.start, crashk_res.end + 1,
			       PAGE_KERNEL,
			       NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
		memblock_clear_nomap(crashk_res.start,
				     resource_size(&crashk_res));
	}

IIUC, the inconvenience here is that crashkernel needs special mapping options,
and updating those after the memory has already been mapped as regular memory
isn't easy, if it is possible at all.
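To make the granularity difference concrete, here is a toy model of the two
passes in the map_mem() excerpt above. It is not kernel code: it assumes a 4KB
granule with 2MB block mappings, ignores NO_CONT_MAPPINGS and every other detail
of the real page-table code, and all names (toy_map_range(), toy_map_mem(),
struct map_stats) are hypothetical. Pass 1 maps regular memory with blocks
where alignment allows and skips the nomap crashkernel range; pass 2 maps the
crashkernel range with pages only.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SZ_4K 0x1000ULL
#define TOY_SZ_2M 0x200000ULL

/* Toy model: count the translation entries a mapping would need. */
struct map_stats {
	uint64_t blocks;	/* 2MB block entries */
	uint64_t pages;		/* 4KB page entries */
};

/*
 * Map [start, end): use 2MB blocks where alignment allows, unless
 * no_block is set, in which case everything gets 4KB pages.
 */
static void toy_map_range(uint64_t start, uint64_t end, int no_block,
			  struct map_stats *st)
{
	uint64_t addr = start;

	assert(start % TOY_SZ_4K == 0 && end % TOY_SZ_4K == 0 && start <= end);
	while (addr < end) {
		if (!no_block && addr % TOY_SZ_2M == 0 && end - addr >= TOY_SZ_2M) {
			st->blocks++;
			addr += TOY_SZ_2M;
		} else {
			st->pages++;
			addr += TOY_SZ_4K;
		}
	}
}

/*
 * Two passes loosely modelled on map_mem(): regular memory first,
 * skipping the crashkernel range (nomap at that point), then the
 * crashkernel range with page granularity only.
 */
static struct map_stats toy_map_mem(uint64_t mem_start, uint64_t mem_end,
				    uint64_t crash_start, uint64_t crash_end)
{
	struct map_stats st = {0, 0};

	assert(mem_start <= crash_start && crash_start <= crash_end &&
	       crash_end <= mem_end);
	/* Pass 1: everything except the nomap crashkernel range. */
	toy_map_range(mem_start, crash_start, 0, &st);
	toy_map_range(crash_end, mem_end, 0, &st);
	/* Pass 2: crashkernel, NO_BLOCK_MAPPINGS-style. */
	toy_map_range(crash_start, crash_end, 1, &st);
	return st;
}
```

For 16MB of memory with a 4MB crashkernel starting at 4MB, this gives 6 block
entries and 1024 page entries. Had the crashkernel range been mapped in pass 1,
it would have used two blocks, and changing attributes on part of it later
would first require splitting them.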

> > > We also depend on this when skipping the checksum code in purgatory, which can be
> > > exceedingly slow.
> > 
> > This one I don't fully understand, so I'll lazily assume the prerequisite is
> > the same WRT how memory is mapped. :)
> > 
> > Ultimately there's also /sys/kernel/kexec_crash_size's handling. Same
> > prerequisite.
> > 
> > Keep in mind that acpi_table_upgrade() and unflatten_device_tree() depend on
> > having the linear mappings available.
> 
> So it looks like reserve_crashkernel() wants to reserve memory before
> setting up the linear map with the information about the DMA zones in
> place but that comes later when we can parse the firmware tables.
> 
> I wonder, instead of not mapping the crashkernel reservation, can we not
> do an arch_kexec_protect_crashkres() for the whole reservation after we
> created the linear map?
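The ordering problem can be written down as a small dependency graph. This is a
hypothetical sketch, not kernel code: the four steps and their edges are a
reading of this thread (the firmware tables need the linear map, the DMA limit
needs the firmware tables, reserve_crashkernel() needs the DMA limit, and
map_mem() needs the reservation). A depth-first search finds the cycle, and
dropping either of the last two edges, which is roughly what options (1) and
(3) further down do, breaks it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical boot steps, named after what this thread discusses. */
enum step { LINEAR_MAP, PARSE_FW, DMA_LIMIT, RESERVE_CRASH, NR_STEPS };

/* dep[a][b] == true means step a must run after step b. */
typedef bool dep_graph[NR_STEPS][NR_STEPS];

/* DFS colours: 0 = unvisited, 1 = on the current path, 2 = finished. */
static bool has_cycle_from(dep_graph g, int s, int colour[])
{
	colour[s] = 1;
	for (int t = 0; t < NR_STEPS; t++) {
		if (!g[s][t])
			continue;
		if (colour[t] == 1)
			return true;	/* back edge: cycle */
		if (colour[t] == 0 && has_cycle_from(g, t, colour))
			return true;
	}
	colour[s] = 2;
	return false;
}

/* A valid boot order exists iff the dependency graph is acyclic. */
static bool boot_order_exists(dep_graph g)
{
	int colour[NR_STEPS] = {0};

	for (int s = 0; s < NR_STEPS; s++)
		if (colour[s] == 0 && has_cycle_from(g, s, colour))
			return false;
	return true;
}

static void toy_deps(dep_graph g, bool map_mem_needs_crash,
		     bool crash_needs_dma_limit)
{
	for (int a = 0; a < NR_STEPS; a++)
		for (int b = 0; b < NR_STEPS; b++)
			g[a][b] = false;
	g[PARSE_FW][LINEAR_MAP] = true;	/* unflatten_device_tree() etc. */
	g[DMA_LIMIT][PARSE_FW] = true;	/* DMA ranges come from DT/IORT */
	g[RESERVE_CRASH][DMA_LIMIT] = crash_needs_dma_limit;
	g[LINEAR_MAP][RESERVE_CRASH] = map_mem_needs_crash;
}
```

With both edges present boot_order_exists() returns false; clearing either flag
makes it return true.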

arch_kexec_protect_crashkres() depends on __change_memory_common(), which
ultimately requires the memory to be mapped with PAGE_SIZE pages. As I
commented above, the trick would work as long as there is a way to update the
linear mappings with whatever crashkernel needs later in the boot process.
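A toy illustration of why protecting the whole reservation after the linear map
is created hinges on page granularity: invalidating part of the linear map is a
per-page operation, so it only works where the range is already covered by page
entries. This is a simplified model with hypothetical names; the real
__change_memory_common() walks the page tables via apply_to_page_range() rather
than checking a flag like this, and splitting a block into pages is
deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGES_PER_BLOCK 512	/* 2MB block / 4KB page, assumed granule */
#define NR_BLOCKS 8		/* toy linear map covering 16MB */

/* One 2MB slot: either a single block entry or 512 page entries. */
struct toy_block {
	bool is_block;
	bool valid;				/* used when is_block */
	bool page_valid[PAGES_PER_BLOCK];	/* used when !is_block */
};

static struct toy_block linear_map[NR_BLOCKS];

static void toy_map(int blk, bool page_granular)
{
	linear_map[blk].is_block = !page_granular;
	linear_map[blk].valid = true;
	for (int i = 0; i < PAGES_PER_BLOCK; i++)
		linear_map[blk].page_valid[i] = true;
}

static bool toy_page_valid(uint64_t p)
{
	const struct toy_block *b = &linear_map[p / PAGES_PER_BLOCK];

	return b->is_block ? b->valid : b->page_valid[p % PAGES_PER_BLOCK];
}

/*
 * Clear the valid bit of pages [first, first + n). Fails with -1 and
 * changes nothing if any page lies inside a block mapping.
 */
static int toy_invalidate_pages(uint64_t first, uint64_t n)
{
	assert(first + n <= (uint64_t)NR_BLOCKS * PAGES_PER_BLOCK);
	for (uint64_t p = first; p < first + n; p++)
		if (linear_map[p / PAGES_PER_BLOCK].is_block)
			return -1;
	for (uint64_t p = first; p < first + n; p++)
		linear_map[p / PAGES_PER_BLOCK].page_valid[p % PAGES_PER_BLOCK] = false;
	return 0;
}
```

With the crashkernel range block-mapped, toy_invalidate_pages() refuses and
leaves the map untouched; once the same 2MB slots are page-mapped, the same call
succeeds and only those pages become invalid.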

> > Let me stress that knowing the DMA constraints in the system before reserving
> > crashkernel's regions is necessary if we ever want it to work seamlessly on all
> > platforms. Be it small stuff like the Raspberry Pi or huge servers with TB of
> > memory.
> 
> Indeed. So we have 3 options (so far):
> 
> 1. Allow the crashkernel reservation to go into the linear map but set
>    it to invalid once allocated.
> 
> 2. Parse the flattened DT (not sure what we do with ACPI) before
>    creating the linear map. We may have to rely on some SoC ID here
>    instead of actual DMA ranges.
> 
> 3. Assume the smallest ZONE_DMA possible on arm64 (1GB) for crashkernel
>    reservations and not rely on arm64_dma_phys_limit in
>    reserve_crashkernel().
> 
> I think (2) we tried hard to avoid. Option (3) brings us back to the
> issues we had on large crashkernel reservations regressing on some
> platforms (though it's been a while since, they mostly went quiet ;)).
> However, with Chen's crashkernel patches we end up with two
> reservations, one in the low DMA zone and one higher, potentially above
> 4GB. Having a fixed 1GB limit wouldn't be any worse for crashkernel
> reservations than what we have now.
> 
> If (1) works, I'd go for it (James knows this part better than me),
> otherwise we can go for (3).
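For reference, a sketch of how a ZONE_DMA width, and from it a limit like
arm64_dma_phys_limit, might be derived once the DMA constraints are known. This
is a simplification under stated assumptions rather than the code in patches
5/7 and 6/7: it assumes the firmware tables yield the highest CPU physical
address that every DMA master can reach, caps the width at the 32-bit default,
and ignores where DRAM starts. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent x (same idea as the kernel's fls64()). */
static unsigned int toy_fls64(uint64_t x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/*
 * Hypothetical helper: ZONE_DMA width from the highest CPU physical
 * address all DMA-capable devices can reach, capped at 32 bits.
 */
static unsigned int toy_zone_dma_bits(uint64_t max_dma_cpu_addr)
{
	unsigned int bits = toy_fls64(max_dma_cpu_addr);

	return bits < 32 ? bits : 32;
}

/* Exclusive upper limit for allocations that must land in ZONE_DMA. */
static uint64_t toy_dma_phys_limit(unsigned int zone_dma_bits)
{
	assert(zone_dma_bits > 0 && zone_dma_bits < 64);
	return 1ULL << zone_dma_bits;
}
```

A device limited to the first 1GB (highest address 0x3fffffff) gives 30 bits
and a 0x40000000 limit, which matches the 1GB worst case assumed by option (3);
without such a restriction the result falls back to 32 bits.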

Overall, I'd prefer (1) as well, and I'd be happy to have a go at it. If that
doesn't work out, I'll add (3) to this series.
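If it comes to (3), the reservation side is simple: search top-down for a
suitably aligned range that ends below a fixed 1GB limit, regardless of what the
firmware tables later say about DMA. A minimal sketch over a sorted list of free
ranges, with hypothetical names; the kernel would go through memblock rather
than this hand-rolled search, and calling it again without the limit only
gestures at a separate, higher reservation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SZ_1G (1ULL << 30)

struct toy_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/* Round x down to a multiple of align (a power of two). */
static uint64_t toy_align_down(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/*
 * Top-down search for size bytes, aligned to align, ending at or below
 * limit. ranges must be sorted by ascending address. Returns 0 and
 * stores the base in *base_out on success, -1 if nothing fits.
 */
static int toy_reserve_below(const struct toy_range *ranges, size_t n,
			     uint64_t size, uint64_t align, uint64_t limit,
			     uint64_t *base_out)
{
	assert(align && (align & (align - 1)) == 0);
	for (size_t i = n; i-- > 0;) {
		uint64_t end = ranges[i].end < limit ? ranges[i].end : limit;
		uint64_t base;

		/* Skip ranges entirely above the limit or too small. */
		if (end <= ranges[i].start || end - ranges[i].start < size)
			continue;
		base = toy_align_down(end - size, align);
		if (base >= ranges[i].start) {
			*base_out = base;
			return 0;
		}
	}
	return -1;
}
```

With 256MB requested, 2MB alignment and free ranges [0x200000, 0x3c000000) and
[0x40000000, 0x100000000), the low reservation lands at 0x2c000000. A 1GB
request cannot fit below the 1GB limit, while without the limit it lands at
0xc0000000.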

Regards,
Nicolas

Thread overview: 19+ messages
2020-11-03 17:31 [PATCH v6 0/7] arm64: Default to 32-bit wide ZONE_DMA Nicolas Saenz Julienne
2020-11-03 17:31 ` [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init() Nicolas Saenz Julienne
2020-11-05 16:11   ` James Morse
2020-11-06 18:46     ` Nicolas Saenz Julienne
2020-11-10 18:17       ` Catalin Marinas
2020-11-12 15:56         ` Nicolas Saenz Julienne [this message]
2020-11-13 11:29           ` Catalin Marinas
2020-11-19 14:09             ` Nicolas Saenz Julienne
2020-11-19 17:10               ` Catalin Marinas
2020-11-19 17:25                 ` Catalin Marinas
2020-11-19 17:25                 ` Nicolas Saenz Julienne
2020-11-19 17:45                   ` Catalin Marinas
2020-11-19 18:18       ` James Morse
2020-11-03 17:31 ` [PATCH v6 2/7] arm64: mm: Move zone_dma_bits initialization into zone_sizes_init() Nicolas Saenz Julienne
2020-11-03 17:31 ` [PATCH v6 3/7] of/address: Introduce of_dma_get_max_cpu_address() Nicolas Saenz Julienne
2020-11-03 17:31 ` [PATCH v6 4/7] of: unittest: Add test for of_dma_get_max_cpu_address() Nicolas Saenz Julienne
2020-11-03 17:31 ` [PATCH v6 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges Nicolas Saenz Julienne
2020-11-03 17:31 ` [PATCH v6 6/7] arm64: mm: Set ZONE_DMA size based on early IORT scan Nicolas Saenz Julienne
2020-11-03 17:31 ` [PATCH v6 7/7] mm: Remove examples from enum zone_type comment Nicolas Saenz Julienne