From: David Hildenbrand <david@redhat.com>
To: Baoquan He <bhe@redhat.com>, Donald Dutile <ddutile@redhat.com>,
Jiri Bohac <jbohac@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>, Dave Young <dyoung@redhat.com>,
kexec@lists.infradead.org, Philipp Rudo <prudo@redhat.com>,
Pingfan Liu <piliu@redhat.com>, Tao Liu <ltao@redhat.com>,
linux-kernel@vger.kernel.org,
David Hildenbrand <dhildenb@redhat.com>,
Michal Hocko <mhocko@suse.cz>
Subject: Re: [PATCH v2 0/5] kdump: crashkernel reservation from CMA
Date: Wed, 28 May 2025 23:01:04 +0200 [thread overview]
Message-ID: <e9c5c247-85fb-43f1-9aa8-47d62321f37b@redhat.com> (raw)
In-Reply-To: <Z8Z/gnbtiXT9QAZr@MiWiFi-R3L-srv>

On 04.03.25 05:20, Baoquan He wrote:
> On 03/03/25 at 09:17am, Donald Dutile wrote:
>>
>>
>> On 3/3/25 3:25 AM, David Hildenbrand wrote:
>>> On 20.02.25 17:48, Jiri Bohac wrote:
>>>> Hi,
>>>>
>>>> this series implements a way to reserve additional crash kernel
>>>> memory using CMA.
>>>>
>>>> Link to the v1 discussion:
>>>> https://lore.kernel.org/lkml/ZWD_fAPqEWkFlEkM@dwarf.suse.cz/
>>>> See below for the changes since v1 and how concerns from the
>>>> discussion have been addressed.
>>>>
>>>> Currently, all the memory for the crash kernel is not usable by
>>>> the 1st (production) kernel. It is also unmapped so that it can't
>>>> be corrupted by the fault that will eventually trigger the crash.
>>>> This makes sense for the memory actually used by the kexec-loaded
>>>> crash kernel image and initrd and the data prepared during the
>>>> load (vmcoreinfo, ...). However, the reserved space needs to be
>>>> much larger than that to provide enough run-time memory for the
>>>> crash kernel and the kdump userspace. Estimating the amount of
>>>> memory to reserve is difficult. Being too careful makes kdump
>>>> likely to end in OOM, being too generous takes even more memory
>>>> from the production system. Also, the reservation only allows
>>>> reserving a single contiguous block (or two with the "low"
>>>> suffix). I've seen systems where this fails because the physical
>>>> memory is fragmented.
>>>>
>>>> By reserving additional crashkernel memory from CMA, the main
>>>> crashkernel reservation can be just large enough to fit the
>>>> kernel and initrd image, minimizing the memory taken away from
>>>> the production system. Most of the run-time memory for the crash
>>>> kernel will be memory previously available to userspace in the
>>>> production system. As this memory is no longer wasted, the
>>>> reservation can be done with a generous margin, making kdump more
>>>> reliable. Kernel memory that we need to preserve for dumping is
>>>> never allocated from CMA. User data is typically not dumped by
>>>> makedumpfile. When dumping of user data is intended, this new CMA
>>>> reservation cannot be used.
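
The split reservation described above can be illustrated with a concrete command line (the sizes here are made up for the example; the ",cma" suffix is the one introduced by patch 1/5 of this series):

```
crashkernel=128M crashkernel=512M,cma
```

The first, conventional reservation only needs to hold the crash kernel image, initrd and load-time data, while the CMA reservation supplies the bulk of the run-time memory and remains usable by movable allocations in the production kernel.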
>>>
>>>
>>> Hi,
>>>
>>> I'll note that your comment about "user space" is currently the case, but will likely not hold in the long run. The assumption you are making is that only user-space memory will be allocated from MIGRATE_CMA, which is not necessarily the case. Any movable allocation will end up in there.
>>>
>>> Besides LRU folios (user space memory and the pagecache), we already support migration of some kernel allocations using the non-lru migration framework. Such allocations (which use __GFP_MOVABLE, see __SetPageMovable()) currently only include
>>> * memory balloon: pages we never want to dump either way
>>> * zsmalloc (->zpool): only used by zswap (-> compressed LRU pages)
>>> * z3fold (->zpool): only used by zswap (-> compressed LRU pages)
>>>
>>> Just imagine if we support migration of other kernel allocations, such as user page tables. The dump would be missing important information.
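
The non-lru migration framework mentioned above can be sketched roughly as follows. This is pseudocode in C syntax, not the exact kernel API; the callback names mirror the kernel's movable_operations interface, but treat the details as an assumption:

```c
/* Pseudocode: the contract a driver opts into to make its pages movable.
 * The migration core can then move such pages, e.g. out of a CMA range. */
static const struct movable_operations my_mops = {
	.isolate_page = my_isolate,  /* detach the page so it can be migrated */
	.migrate_page = my_migrate,  /* copy src -> dst and fix up references */
	.putback_page = my_putback,  /* migration failed: give the page back */
};

page = alloc_page(GFP_HIGHUSER_MOVABLE);  /* allocation has __GFP_MOVABLE */
__SetPageMovable(page, &my_mops);         /* now eligible for non-lru migration */
```

Any kernel allocation that opts in this way can end up in a CMA area, which is exactly why the list of such users may grow beyond balloon and zpool pages.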
>>>
>> IOMMUFD is a near-term candidate for user page tables, with
>> multi-stage IOMMU support going through upstream review at the moment.
>> Just saying that David's case will become the norm in high-end VMs
>> with performance-enhanced, guest-driven IOMMU support (for GPUs).
>
> Thanks to you both for the valuable input, David and Don. I agree: one
> may argue that not every system has a balloon or enables swap today,
> but future extension of migration to other kernel allocations could
> become an obstacle we cannot work around.
>
> If we knew for sure this feature would become a problem, we might need
> to stop it in advance.

Sorry for the late reply.

I think we just have to be careful to document it properly -- especially
the shortcomings, and that this feature might become a problem in the
future. Movable user-space page tables getting placed on CMA memory
would probably not be a problem if we don't care about ... user-space
data either way.

The whole "direct I/O takes max 1s" part is a bit shaky. Maybe how long
to wait could be made configurable? 10s is certainly "safer".

But maybe, in the target use case (VMs), direct I/O will not be that
common.

--
Cheers,
David / dhildenb
Thread overview: 30+ messages
2025-02-20 16:48 [PATCH v2 0/5] kdump: crashkernel reservation from CMA Jiri Bohac
2025-02-20 16:51 ` [PATCH v2 1/5] Add a new optional ",cma" suffix to the crashkernel= command line option Jiri Bohac
2025-03-03 1:51 ` Baoquan He
2025-02-20 16:52 ` [PATCH v2 2/5] kdump: implement reserve_crashkernel_cma Jiri Bohac
2025-02-20 16:54 ` [PATCH v2 3/5] kdump, documentation: describe craskernel CMA reservation Jiri Bohac
2025-03-03 1:54 ` Baoquan He
2025-02-20 16:55 ` [PATCH v2 4/5] kdump: wait for DMA to finish when using CMA Jiri Bohac
2025-03-03 2:02 ` Baoquan He
2025-03-11 12:00 ` Jiri Bohac
2025-02-20 16:57 ` [PATCH v2 5/5] x86: implement crashkernel cma reservation Jiri Bohac
2025-03-03 2:08 ` [PATCH v2 0/5] kdump: crashkernel reservation from CMA Baoquan He
2025-03-03 8:25 ` David Hildenbrand
2025-03-03 14:17 ` Donald Dutile
2025-03-04 4:20 ` Baoquan He
2025-05-28 21:01 ` David Hildenbrand [this message]
2025-05-29 7:46 ` Michal Hocko
2025-05-29 9:19 ` Michal Hocko
2025-05-30 8:06 ` David Hildenbrand
2025-05-30 8:28 ` Michal Hocko
2025-05-30 8:39 ` David Hildenbrand
2025-05-30 9:07 ` Michal Hocko
2025-05-30 9:11 ` David Hildenbrand
2025-05-30 9:26 ` Michal Hocko
2025-05-30 9:28 ` David Hildenbrand
2025-05-30 9:34 ` Jiri Bohac
2025-05-30 9:47 ` David Hildenbrand
2025-05-30 9:54 ` Michal Hocko
2025-05-30 10:06 ` Jiri Bohac
2025-05-29 16:22 ` Jiri Bohac
2025-03-12 15:36 ` Jiri Bohac