Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

From: Michal Hocko <mhocko@suse.com>
To: Philipp Rudo <prudo@redhat.com>
Cc: Baoquan He <bhe@redhat.com>, Donald Dutile <ddutile@redhat.com>,
	Jiri Bohac <jbohac@suse.cz>, Pingfan Liu <piliu@redhat.com>,
	Tao Liu <ltao@redhat.com>, Vivek Goyal <vgoyal@redhat.com>,
	Dave Young <dyoung@redhat.com>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	David Hildenbrand <dhildenb@redhat.com>
Subject: Re: [PATCH 0/4] kdump: crashkernel reservation from CMA
Date: Wed, 6 Dec 2023 16:19:51 +0100
Message-ID: <ZXCRF-bvm8ijXxr4@tiehlicka>
In-Reply-To: <ZXB7_rbC0GAkIp7p@tiehlicka>

On Wed 06-12-23 14:49:51, Michal Hocko wrote:
> On Wed 06-12-23 12:08:05, Philipp Rudo wrote:
[...]
> > If I understand Documentation/core-api/pin_user_pages.rst correctly you
> > missed case 1 Direct IO. In that case "short term" DMA is allowed for
> > pages without FOLL_LONGTERM. Meaning that there is a way you can
> > corrupt the CMA and with that the crash kernel after the production
> > kernel has panicked.
> 
> Could you expand on this? How exactly does a direct IO request survive
> into the kdump kernel? I do understand the RDMA case because the IO is
> async and out of control of the receiving end.

OK, I guess I get what you mean. You are worried about the following
sequence:
  - a DIO request is issued
  - the DMA controller is programmed to read into CMA memory
  - <panic>
  - boot into the crash kernel backed by CMA
  - the DMA transfer completes

DIO doesn't migrate the pinned memory because it is considered a very
quick operation which doesn't block movability for too long. That is
why I have considered it a non-problem. RDMA, on the other hand, might
pin memory for a transfer for much longer, but that case is handled by
migrating the memory away.

Now I agree that there is a chance of corruption from DIO. The
question I am not entirely clear about right now is how big of a real
problem that is. DMA transfers should be a very swift operation. Would
it help to wait for a grace period before jumping into the kdump kernel?
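
To make the grace period idea concrete, here is a minimal userspace sketch,
not kernel code. It assumes a purely hypothetical model in which the number
of in-flight transfers targeting CMA pages is known and drains at a fixed
rate per millisecond. The names dma_model, model_tick_1ms and
wait_grace_period are invented for illustration. A real panic path has no
such counter in general, so in practice the wait would probably have to be
a fixed delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of outstanding short-term DMA into CMA pages. */
struct dma_model {
	uint32_t inflight;        /* transfers still targeting CMA memory */
	uint32_t complete_per_ms; /* transfers finishing per simulated ms */
};

/* Advance the model by one simulated millisecond. */
static void model_tick_1ms(struct dma_model *m)
{
	if (m->inflight > m->complete_per_ms)
		m->inflight -= m->complete_per_ms;
	else
		m->inflight = 0;
}

/*
 * Wait for at most grace_ms simulated milliseconds before "jumping"
 * into the crash kernel. Returns true if all transfers drained within
 * the grace period, false if some are still pending when it expires.
 * The time actually waited is stored in *waited_ms when it is non-NULL.
 */
static bool wait_grace_period(struct dma_model *m, uint32_t grace_ms,
			      uint32_t *waited_ms)
{
	uint32_t elapsed = 0;

	assert(m != NULL);

	while (m->inflight > 0 && elapsed < grace_ms) {
		model_tick_1ms(m);
		elapsed++;
	}

	if (waited_ms)
		*waited_ms = elapsed;

	return m->inflight == 0;
}
```

In this model a bounded wait helps only if every transfer completes within
the bound. A stalled transfer (complete_per_ms == 0) is still pending when
the grace period expires, so the wait shrinks the corruption window without
closing it.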

> Also if direct IO is a problem how come this is not a problem for kexec
> in general. The new kernel usually shares all the memory with the 1st
> kernel.

This is also clearer now. Pure kexec shuts down all the devices, which
should terminate the in-flight DMA transfers.

-- 
Michal Hocko
SUSE Labs
