From: Jiri Bohac <jbohac@suse.cz>
To: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>, Pingfan Liu <piliu@redhat.com>,
Tao Liu <ltao@redhat.com>, Vivek Goyal <vgoyal@redhat.com>,
Dave Young <dyoung@redhat.com>,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] kdump: crashkernel reservation from CMA
Date: Fri, 1 Dec 2023 13:35:41 +0100
Message-ID: <ZWnTHWXNWg06pLMg@dwarf.suse.cz>
In-Reply-To: <ZWgJIAYwfVvA+r8h@MiWiFi-R3L-srv>
On Thu, Nov 30, 2023 at 12:01:36PM +0800, Baoquan He wrote:
> On 11/29/23 at 11:51am, Jiri Bohac wrote:
> > We get a lot of problems reported by partners testing kdump on
> > their setups prior to release. But even if we tune the reserved
> > size up, OOM is still the most common reason for kdump to fail
> > when the product starts getting used in real life. It's been
> > pretty frustrating for a long time.
>
> I remember SUSE engineers once told me that you boot the kernel, estimate
> the kdump kernel's memory usage, and then set crashkernel according to
> that estimate. Is OOM still triggered even with that approach? Just
> curious; I'm not questioning the benefit of using ,cma to save memory.
Yes, we do that during the kdump package build. We use this to
find a baseline for the memory requirements of the kdump kernel
and tools on that specific product. Using these numbers we
estimate the requirements on the system where kdump is
configured by adding extra memory for the size of RAM, number of
SCSI devices, etc. But apparently we get this wrong in too many cases,
because the actual hardware differs too much from the virtual
environment which we used to get the baseline numbers. We've been
adding silly constants to the calculations and we still get OOMs on
one hand and people hesitant to sacrifice the calculated amount
of memory on the other.
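To make the shape of that calculation concrete, here is a rough sketch of the approach described above: a baseline measured at package build time, plus extra memory scaled by RAM size and SCSI device count, plus a safety constant. It is only an illustration; the function name `estimate_crashkernel_mib` and every constant in it are hypothetical placeholders, not the values or code SUSE actually uses.

```python
import math

# Hypothetical increments; real values would come from measuring the
# specific product, not from this sketch.
PER_RAM_GIB_MIB = 1       # extra MiB per GiB of system RAM (assumed)
PER_SCSI_DEVICE_MIB = 2   # extra MiB per SCSI device (assumed)
SAFETY_MARGIN_MIB = 32    # fixed "fudge" constant (assumed)
ALIGN_MIB = 16            # round the result up to this granularity (assumed)


def estimate_crashkernel_mib(baseline_mib: int, ram_gib: int, scsi_devices: int) -> int:
    """Return a hypothetical crashkernel size in MiB.

    baseline_mib is the kdump kernel + tools usage measured in the
    build environment (typically a small VM).
    """
    estimate = (baseline_mib
                + ram_gib * PER_RAM_GIB_MIB
                + scsi_devices * PER_SCSI_DEVICE_MIB
                + SAFETY_MARGIN_MIB)
    # Round up so the value maps cleanly onto crashkernel=<N>M.
    return math.ceil(estimate / ALIGN_MIB) * ALIGN_MIB


if __name__ == "__main__":
    size = estimate_crashkernel_mib(baseline_mib=128, ram_gib=64, scsi_devices=10)
    print(f"crashkernel={size}M")
```

Every term except the baseline is a guess about how real hardware scales, so when the target machine differs from the build VM, the error shows up directly as OOM or as an over-sized reservation.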
The result is that kdump basically cannot be trusted unless the
user verifies that the sacrificed memory is still enough after
every major upgrade.
This is the main motivation behind the CMA idea: to safely give
kdump enough memory, including a safe margin, without sacrificing
too much memory.
> > I feel the exact opposite about VMs. Reserving hundreds of MB for
> > crash kernel on _every_ VM on a busy VM host wastes the most
> > memory. VMs are often tuned to a well-defined task and can be set
> > up with very little memory, so the ~256 MB can be a huge part of
> > that. And while it's theoretically better to dump from the
> > hypervisor, users still often prefer kdump because the hypervisor
> > may not be under their control. Also, in a VM it should be much
> > easier to be sure the machine is safe WRT the potential DMA
> > corruption as it has fewer HW drivers. So I actually thought the
> > CMA reservation could be most useful on VMs.
>
> Hmm, we discussed this upstream with David Hildenbrand, who works on the
> virt team. The VM problem is much easier to solve if they complain that
> the default crashkernel value is wasteful: the shrinking interface is for
> them. The crashkernel value can't be enlarged, but shrinking existing
> crashkernel memory works smoothly. They can adjust it from a
> script in a very simple way.
The shrinking does not solve this problem at all. It solves a
different problem: the virtual hardware configuration can easily
vary between boots and so will the crashkernel size requirements.
And since crashkernel needs to be passed on the command line, once
the system is booted it's impossible to change it without a
reboot. Here the shrinking mechanism comes in handy:
we reserve enough for all configurations on the command line and
during boot the requirements for the currently booted
configuration can be determined and the reservation shrunk to
the determined value. But determining this value is the same
unsolved problem as above and CMA could help in exactly the same
way.
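The shrinking interface referred to here is exposed as /sys/kernel/kexec_crash_size: reading it returns the current reservation in bytes, and writing a smaller value releases the rest. Below is a minimal sketch of the boot-time step, assuming the required size has already been determined (which is exactly the unsolved part). The helper name `shrink_crashkernel` is hypothetical, and the path is a parameter so the sketch can be tried against an ordinary file.

```python
DEFAULT_PATH = "/sys/kernel/kexec_crash_size"


def shrink_crashkernel(new_size: int, path: str = DEFAULT_PATH) -> int:
    """Shrink the crashkernel reservation to new_size bytes.

    Against the real sysfs file this needs root and should run before
    the kdump kernel is loaded. Returns the size read back afterwards.
    """
    with open(path) as f:
        current = int(f.read().strip())
    # The reservation cannot be enlarged after boot, only shrunk.
    if new_size > current:
        raise ValueError(f"cannot grow reservation from {current} to {new_size} bytes")
    if new_size < current:
        with open(path, "w") as f:
            f.write(f"{new_size}\n")
    # The kernel may round the requested size, so report what it
    # actually kept instead of assuming new_size.
    with open(path) as f:
        return int(f.read().strip())
```

For example, a boot-time script could call `shrink_crashkernel(256 * 1024 * 1024)` once it has decided that 256 MiB is enough for the currently booted configuration.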
> Anyway, let's discuss and figure out any risks of ,cma. If all worries
> and concerns are eventually proved unnecessary, then let's have a great
> new feature. But we can't afford the risk if the ,cma area could be
> entangled with the 1st kernel's ongoing activity. As we know, unlike a
> kexec reboot, we only shut down CPUs and interrupts; most devices are
> still alive. And many of them may not be reset and reinitialized in the
> kdump kernel if the relevant driver is not included.
Well, since my patchset makes the use of ,cma completely optional
and has _absolutely_ _no_ _effect_ on users that don't opt to use
it, I think you're not taking any risk at all. We will never know
how much DMA is a problem in practice unless we give users or
distros a way to try and come up with good ways of determining if
it's safe on whichever specific system based on the hardware,
drivers, etc.
I've successfully tested the patches on a few systems, physical
and virtual. Of course this is not proof that the DMA problem
does not exist but shows that it may be a solution that mostly
works. If nothing else, for systems where sacrificing ~400 MB of
memory is something that prevents the user from having any dump
at all, having a dump that mostly works with a sacrifice of ~100
MB may be useful.
Thanks,
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec