Kexec Archive on lore.kernel.org
From: David Hildenbrand <david@redhat.com>
To: Baoquan He <bhe@redhat.com>, Donald Dutile <ddutile@redhat.com>,
	Jiri Bohac <jbohac@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>, Dave Young <dyoung@redhat.com>,
	kexec@lists.infradead.org, Philipp Rudo <prudo@redhat.com>,
	Pingfan Liu <piliu@redhat.com>, Tao Liu <ltao@redhat.com>,
	linux-kernel@vger.kernel.org,
	David Hildenbrand <dhildenb@redhat.com>,
	Michal Hocko <mhocko@suse.cz>
Subject: Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Date: Wed, 28 May 2025 23:01:04 +0200	[thread overview]
Message-ID: <e9c5c247-85fb-43f1-9aa8-47d62321f37b@redhat.com> (raw)
In-Reply-To: <Z8Z/gnbtiXT9QAZr@MiWiFi-R3L-srv>

On 04.03.25 05:20, Baoquan He wrote:
> On 03/03/25 at 09:17am, Donald Dutile wrote:
>>
>>
>> On 3/3/25 3:25 AM, David Hildenbrand wrote:
>>> On 20.02.25 17:48, Jiri Bohac wrote:
>>>> Hi,
>>>>
>>>> this series implements a way to reserve additional crash kernel
>>>> memory using CMA.
>>>>
>>>> Link to the v1 discussion:
>>>> https://lore.kernel.org/lkml/ZWD_fAPqEWkFlEkM@dwarf.suse.cz/
>>>> See below for the changes since v1 and how concerns from the
>>>> discussion have been addressed.
>>>>
>>>> Currently, none of the memory reserved for the crash kernel is
>>>> usable by the 1st (production) kernel. It is also unmapped so that
>>>> it can't be corrupted by the fault that will eventually trigger the
>>>> crash.
>>>> This makes sense for the memory actually used by the kexec-loaded
>>>> crash kernel image and initrd and the data prepared during the
>>>> load (vmcoreinfo, ...). However, the reserved space needs to be
>>>> much larger than that to provide enough run-time memory for the
>>>> crash kernel and the kdump userspace. Estimating the amount of
>>>> memory to reserve is difficult. Being too careful makes kdump
>>>> likely to end in OOM, being too generous takes even more memory
>>>> from the production system. Also, the reservation only allows
>>>> reserving a single contiguous block (or two with the "low"
>>>> suffix). I've seen systems where this fails because the physical
>>>> memory is fragmented.
>>>>
>>>> By reserving additional crashkernel memory from CMA, the main
>>>> crashkernel reservation can be just large enough to fit the
>>>> kernel and initrd image, minimizing the memory taken away from
>>>> the production system. Most of the run-time memory for the crash
>>>> kernel will be memory previously available to userspace in the
>>>> production system. As this memory is no longer wasted, the
>>>> reservation can be done with a generous margin, making kdump more
>>>> reliable. Kernel memory that we need to preserve for dumping is
>>>> never allocated from CMA. User data is typically not dumped by
>>>> makedumpfile. When dumping of user data is intended, this new CMA
>>>> reservation cannot be used.
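
[Editor's note: the ",cma" suffix introduced by this series would presumably be combined with a small conventional reservation; a hypothetical command line, with sizes that are illustrative and not taken from the patches:

```
crashkernel=128M crashkernel=512M,cma
```

Here 128M would be the conventional, unmapped reservation holding the crash kernel and initrd, while the 512M CMA area would remain usable for movable allocations in the production kernel until a crash occurs.]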
>>>
>>>
>>> Hi,
>>>
>>> I'll note that your comment about "user space" is currently accurate, but will likely not hold in the long run. The assumption you are making is that only user-space memory will be allocated from MIGRATE_CMA pageblocks, which is not necessarily the case: any movable allocation can end up there.
>>>
>>> Besides LRU folios (user space memory and the pagecache), we already support migration of some kernel allocations using the non-lru migration framework. Such allocations (which use __GFP_MOVABLE, see __SetPageMovable()) currently only include
>>> * memory balloon: pages we never want to dump either way
>>> * zsmalloc (->zpool): only used by zswap (-> compressed LRU pages)
>>> * z3fold (->zpool): only used by zswap (-> compressed LRU pages)
>>>
>>> Just imagine if we support migration of other kernel allocations, such as user page tables. The dump would be missing important information.
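
[Editor's note: the concern above rests on the non-LRU migration framework, which lets any kernel subsystem mark its pages movable so they may legally sit in (and migrate within) CMA areas. A rough sketch of the registration side, based on the movable_operations API of recent kernels; callback names and bodies are hypothetical, and details vary across kernel versions:

```c
/* Sketch only: how a kernel subsystem opts its pages into non-LRU
 * migration (see mm/migrate.c; not code from this patch series). */
static bool my_isolate(struct page *page, isolate_mode_t mode)
{
	/* Pin/lock the object backing this page so it can't change. */
	return true;
}

static int my_migrate(struct page *dst, struct page *src,
		      enum migrate_mode mode)
{
	/* Copy contents and repoint internal references from src to dst. */
	return 0;
}

static void my_putback(struct page *page)
{
	/* Undo isolation if migration was aborted. */
}

static const struct movable_operations my_mops = {
	.isolate_page = my_isolate,
	.migrate_page = my_migrate,
	.putback_page = my_putback,
};

/* Pages allocated with __GFP_MOVABLE are then tagged, e.g.:
 *	__SetPageMovable(page, &my_mops);
 */
```

If user page tables ever registered such operations, they could land in the CMA-backed crashkernel area, be overwritten when the crash kernel boots, and thus be missing from the dump.]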
>>>
>> IOMMUFD is a near-term candidate for user page tables, with multi-stage IOMMU support currently going through upstream review.
>> Just saying that David's case will become the norm in high-end VMs with performance-enhanced, guest-driven IOMMU support (for GPUs).
> 
> Thanks to both of you for the valuable input, David and Don. One could
> argue that not every system has a memory balloon or enables swap today,
> but future extensions of migration to other kernel allocations could
> become an obstacle we cannot work around.
> 
> If we already know this feature is likely to turn into a problem, we may
> need to stop it in advance.

Sorry for the late reply.

I think we just have to be careful to document it properly -- especially 
the shortcomings and that this feature might become a problem in the 
future. Movable user-space page tables getting placed on CMA memory 
would probably not be a problem if we don't care about ... user-space 
data either way.

The whole "direct I/O takes at most 1s" part is a bit shaky. Maybe how 
long to wait could be made configurable? 10s is certainly "safer".

But maybe direct I/O will not be that common in the target use case: VMs.
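
[Editor's note: one way the fixed timeout could become configurable is a kernel parameter gating the delay; a hypothetical fragment, with a parameter name invented for illustration and not part of the posted series:

```c
/* Hypothetical: replace a fixed 1s DMA settle delay with a tunable.
 * "crash_cma_dma_wait_ms" is an invented name, not from the patches. */
static unsigned int crash_cma_dma_wait_ms = 1000;
core_param(crash_cma_dma_wait_ms, crash_cma_dma_wait_ms, uint, 0444);

static void crash_cma_wait_for_dma(void)
{
	/* Give in-flight direct I/O targeting CMA pages time to
	 * complete before the crash kernel reuses that memory. */
	mdelay(crash_cma_dma_wait_ms);
}
```
]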

-- 
Cheers,

David / dhildenb




Thread overview: 30+ messages
2025-02-20 16:48 [PATCH v2 0/5] kdump: crashkernel reservation from CMA Jiri Bohac
2025-02-20 16:51 ` [PATCH v2 1/5] Add a new optional ",cma" suffix to the crashkernel= command line option Jiri Bohac
2025-03-03  1:51   ` Baoquan He
2025-02-20 16:52 ` [PATCH v2 2/5] kdump: implement reserve_crashkernel_cma Jiri Bohac
2025-02-20 16:54 ` [PATCH v2 3/5] kdump, documentation: describe craskernel CMA reservation Jiri Bohac
2025-03-03  1:54   ` Baoquan He
2025-02-20 16:55 ` [PATCH v2 4/5] kdump: wait for DMA to finish when using CMA Jiri Bohac
2025-03-03  2:02   ` Baoquan He
2025-03-11 12:00     ` Jiri Bohac
2025-02-20 16:57 ` [PATCH v2 5/5] x86: implement crashkernel cma reservation Jiri Bohac
2025-03-03  2:08 ` [PATCH v2 0/5] kdump: crashkernel reservation from CMA Baoquan He
2025-03-03  8:25 ` David Hildenbrand
2025-03-03 14:17   ` Donald Dutile
2025-03-04  4:20     ` Baoquan He
2025-05-28 21:01       ` David Hildenbrand [this message]
2025-05-29  7:46         ` Michal Hocko
2025-05-29  9:19           ` Michal Hocko
2025-05-30  8:06           ` David Hildenbrand
2025-05-30  8:28             ` Michal Hocko
2025-05-30  8:39               ` David Hildenbrand
2025-05-30  9:07                 ` Michal Hocko
2025-05-30  9:11                   ` David Hildenbrand
2025-05-30  9:26                     ` Michal Hocko
2025-05-30  9:28                       ` David Hildenbrand
2025-05-30  9:34                     ` Jiri Bohac
2025-05-30  9:47                       ` David Hildenbrand
2025-05-30  9:54                         ` Michal Hocko
2025-05-30 10:06                         ` Jiri Bohac
2025-05-29 16:22         ` Jiri Bohac
2025-03-12 15:36   ` Jiri Bohac
