Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Samiullah Khawaja <skhawaja@google.com>
To: Pranjal Shrivastava <praan@google.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>,
	 Will Deacon <will@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>,
	 Pasha Tatashin <pasha.tatashin@soleen.com>,
	Mike Rapoport <rppt@kernel.org>,
	 Pratyush Yadav <pratyush@kernel.org>,
	Alexander Graf <graf@amazon.com>,
	 Robin Murphy <robin.murphy@arm.com>,
	Kevin Tian <kevin.tian@intel.com>,
	iommu@lists.linux.dev,  kexec@lists.infradead.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 David Matlack <dmatlack@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Vipin Sharma <vipinsh@google.com>
Subject: Re: [RFC PATCH 3/4] dma-direct: Add API to preserve/restore allocations
Date: Wed, 24 Jun 2026 19:00:15 +0000	[thread overview]
Message-ID: <ajwmXYOW080zVRLw@google.com> (raw)
In-Reply-To: <aiceKYmc323G7tBs@google.com>

On Mon, Jun 08, 2026 at 07:55:21PM +0000, Pranjal Shrivastava wrote:
>On Tue, May 05, 2026 at 12:27:36AM +0000, Samiullah Khawaja wrote:
>> Add an API to preserve/restore the DMA direct allocation for liveupdate.
>> The underlying memory is preserved/restored using KHO. During restore
>> the memory is setup based on the device configuration, gfp flags and
>> allocation attributes. Once restored, the driver can use the usual
>> dma_free* API to deallocate the restored DMA allocation.
>>
>> This API will be used to add support in dma_alloc* APIs to
>> preseve/restore the DMA allocations.
>>
>> Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
>> ---
>>  include/linux/dma-direct.h |  29 +++++++
>>  kernel/dma/Kconfig         |   3 +
>>  kernel/dma/direct.c        | 163 +++++++++++++++++++++++++++++++++++++
>>  3 files changed, 195 insertions(+)
>>
>
>[...]
>
>> diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
>> index bfef21b4a9ae..d92852942c6c 100644
>> --- a/kernel/dma/Kconfig
>> +++ b/kernel/dma/Kconfig
>> @@ -265,3 +265,6 @@ config DMA_MAP_BENCHMARK
>>  	  performance of dma_(un)map_page.
>>
>>  	  See tools/testing/selftests/dma/dma_map_benchmark.c
>> +
>> +config DMA_LIVEUPDATE
>> +	bool "Enable preservation of DMA direct allocations"
>
>Nit: depends on LIVEUPDATE?

Agreed I will add this.
>
>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>> index ec887f443741..c2b98f91900a 100644
>> --- a/kernel/dma/direct.c
>> +++ b/kernel/dma/direct.c
>> @@ -6,6 +6,8 @@
>>   */
>>  #include <linux/memblock.h> /* for max_pfn */
>>  #include <linux/export.h>
>> +#include <linux/kexec_handover.h>
>> +#include <linux/kho/abi/dma_alloc.h>
>>  #include <linux/mm.h>
>>  #include <linux/dma-map-ops.h>
>>  #include <linux/scatterlist.h>
>> @@ -307,6 +309,167 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>>  	return NULL;
>>  }
>>
>> +#ifdef CONFIG_DMA_LIVEUPDATE
>> +int dma_direct_preserve_allocation(struct device *dev, void *cpu_addr,
>> +				   size_t size, dma_addr_t dma_handle,
>> +				   unsigned long attrs, u64 *state)
>> +{
>> +	struct dma_alloc_ser *ser;
>> +	int ret;
>> +
>> +	if (!kho_is_enabled())
>> +		return -EOPNOTSUPP;
>> +
>> +	if (IS_ENABLED(CONFIG_DMA_CMA))
>> +		return -EOPNOTSUPP;
>> +
>> +	if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
>> +	    !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev))
>> +		return -EOPNOTSUPP;
>> +
>> +	if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_ALLOC) &&
>> +	    !dev_is_dma_coherent(dev) &&
>> +	    !is_swiotlb_for_alloc(dev))
>> +		return -EOPNOTSUPP;
>> +
>> +	if (IS_ENABLED(CONFIG_DMA_GLOBAL_POOL) &&
>> +	    !dev_is_dma_coherent(dev))
>> +		return -EOPNOTSUPP;
>> +
>> +	if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
>> +	    dma_is_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
>> +		return -EOPNOTSUPP;
>> +
>> +	ser = kho_alloc_preserve(sizeof(*ser));
>> +	if (IS_ERR(ser))
>> +		return PTR_ERR(ser);
>> +
>> +	ser->page_phys = dma_to_phys(dev, dma_handle);
>> +	ser->force_decrypted = force_dma_unencrypted(dev);
>> +	ser->size = size;
>> +
>> +	ret = kho_preserve_pages(phys_to_page(ser->page_phys),
>> +				 size >> PAGE_SHIFT);
>
>Should this be `PAGE_ALIGN(size) >> PAGE_SHIFT` OR
>`DIV_ROUND_UP(size, PAGE_SIZE)`?
>
>Otherwise, if size is small, say, size == 64-bytes, we preserve 0 pages?
>
>Also, IIRC, even with PAGE_ALIGN, preserving just the requested pgcount
>is not enough because buddy allocator allocates in order-N.
>
>For e.g. if a driver requests 20KB (5 pages), the buddy allocator
>fulfills it with an order-3 block (8 pages).
>
>Now, if we only tell KHO to preserve 5 pages, the remaining 3 pages are
>free in the new kernel. When the driver eventually tears down and calls
>dma_free_coherent(), dma_direct_free() will call
>__free_pages(page, get_order(size)), which will attempt to free all 8
>pages, causing a double-free panic on the 3 unpreserved pages?
>
>Should we be preserving exactly 1 << get_order(size) pages as per buddy?
>Same applies to unpreserve, and restore.

Agreed. I will update this and also make sure it is covered in the kunit
tests.
>
>> +	if (ret) {
>> +		kho_unpreserve_free(ser);
>> +		return ret;
>> +	}
>> +
>> +	*state = virt_to_phys(ser);
>> +	return 0;
>> +}
>> +
>> +void dma_direct_unpreserve_allocation(struct device *dev, u64 state)
>> +{
>> +	struct dma_alloc_ser *ser;
>> +
>> +	if (!kho_is_enabled())
>> +		return;
>> +
>> +	ser = phys_to_virt(state);
>> +	kho_unpreserve_pages(phys_to_page(ser->page_phys),
>> +			     ser->size >> PAGE_SHIFT);
>> +	kho_unpreserve_free(ser);
>> +}
>> +
>> +void *dma_direct_restore_allocation(struct device *dev, size_t size,
>> +				    dma_addr_t *dma_handle, gfp_t gfp,
>> +				    unsigned long attrs, u64 state)
>
>Are we relying on the caller to pass same attrs? So, a buffer with
>non-coherent attrs can be mapped with coherent attrs in the new kernel.
>Could this cause side-effects? Should we check for such driver bugs with
>a WARN here while comparing older attrs with the newer ones too?
>
>Coherency breaking due to subtle driver bugs is very painful to debug :/

Hmm... this is interesting. The dma_alloc API relies on the caller to
have consistent attrs accross allocation and free. But when updating
kernel where driver could have been updated, we have to be careful.

Agreed.. I will handle this properly by making sure that the new attr is
compatible with the preserved attr.
>
>> +{
>> +	bool remap = false, set_uncached = false;
>> +	struct dma_alloc_ser *ser = NULL;
>> +	struct page *page;
>> +	void *cpu_addr;
>> +
>> +	if (!kho_is_enabled())
>> +		return NULL;
>> +
>> +	ser = phys_to_virt(state);
>> +	page = phys_to_page(ser->page_phys);
>
>[...]
>
>> +
>> +	/*
>> +	 * Remapping will be blocking so return error. The preserved memory
>> +	 * might be already decrypted in the previous kernel, but the decryption
>> +	 * call is not guaranteed to be non-blocking so return error always if
>> +	 * decryption is required.
>> +	 */
>> +	if ((remap || force_dma_unencrypted(dev)) &&
>> +	    dma_direct_use_pool(dev, gfp))
>> +		return NULL;
>> +
>> +	/*
>> +	 * Encryption scheme changed between two kernels and this might cause
>> +	 * issues if device/driver is not handling it properly.
>> +	 */
>> +	WARN_ON_ONCE(ser->force_decrypted != force_dma_unencrypted(dev));
>> +
>> +	/*
>> +	 * arch_dma_prep_coherent() should make sure that any cache lines from
>> +	 * the previous kernel, if the device was coherent previously or cached
>> +	 * mapping in this kernel during init are not problamatic for
>> +	 * non-coherent allocations.
>> +	 */
>> +	if (remap) {
>> +		pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs);
>> +
>> +		if (force_dma_unencrypted(dev))
>> +			prot = pgprot_decrypted(prot);
>> +
>> +		arch_dma_prep_coherent(page, size);
>> +
>> +		cpu_addr = dma_common_contiguous_remap(page, size, prot,
>> +						       __builtin_return_address(0));
>> +		if (!cpu_addr)
>> +			return NULL;
>
>Should we be kho_restore_free-ing on all these error paths?
>We only seem to be kho_restore_free-ing on the success path.
>Same for kho_restore_pages.. if we return an error here, we don't
>restore the preserved pages? Are we leaking those too?

This is purposefully leaking the memory here. This is because during
liveupdate, this device could be using this memory and freeing it means
that this might cause a memory corruption that would pretty difficult to
debug.
>
>> +	} else {
>> +		cpu_addr = page_address(page);
>> +		if (dma_set_decrypted(dev, cpu_addr, size))
>> +			return NULL;
>> +	}
>> +
>> +	if (set_uncached) {
>> +		arch_dma_prep_coherent(page, size);
>> +		cpu_addr = arch_dma_set_uncached(cpu_addr, size);
>> +		if (IS_ERR(cpu_addr))
>> +			return NULL;
>> +	}
>> +
>> +	*dma_handle = phys_to_dma_direct(dev, ser->page_phys);
>> +
>> +	/*
>> +	 * Cannot free the restored pages on error here as these might be in use
>> +	 * by a device with direct allocation in the previous kernel.
>> +	 */


Check this comment that explains the logic behind not freeing. I think I
will move it up.
>> +	WARN_ON(!kho_restore_pages(ser->page_phys,
>> +				   ser->size >> PAGE_SHIFT));
>> +	kho_restore_free(ser);
>> +	return cpu_addr;
>> +}
>> +#endif
>> +
>>  void dma_direct_free(struct device *dev, size_t size,
>>  		void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs)
>>  {
>
>Thanks,
>Praan

Thanks,
Sami


  reply	other threads:[~2026-06-24 19:00 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-05  0:27 [RFC PATCH 0/4] dma-mapping: Add preservation of direct allocations Samiullah Khawaja
2026-05-05  0:27 ` [RFC PATCH 1/4] dma: Add DMA allocation preservation KHO ABI Samiullah Khawaja
2026-06-08 17:53   ` Pranjal Shrivastava
2026-06-24 19:46     ` Samiullah Khawaja
2026-05-05  0:27 ` [RFC PATCH 2/4] dma/pool: Add an API to check if DMA allocation is from pool Samiullah Khawaja
2026-06-08 18:10   ` Pranjal Shrivastava
2026-06-24 19:10     ` Samiullah Khawaja
2026-05-05  0:27 ` [RFC PATCH 3/4] dma-direct: Add API to preserve/restore allocations Samiullah Khawaja
2026-06-01 12:35   ` Will Deacon
2026-06-02 20:14     ` Samiullah Khawaja
2026-06-08 19:55   ` Pranjal Shrivastava
2026-06-24 19:00     ` Samiullah Khawaja [this message]
2026-05-05  0:27 ` [RFC PATCH 4/4] dma-mapping: Add API to preserve/restore DMA allocation Samiullah Khawaja

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ajwmXYOW080zVRLw@google.com \
    --to=skhawaja@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=dmatlack@google.com \
    --cc=graf@amazon.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=kevin.tian@intel.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=m.szyprowski@samsung.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=praan@google.com \
    --cc=pratyush@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=rppt@kernel.org \
    --cc=vipinsh@google.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox