Re: [RFC PATCH 3/4] dma-direct: Add API to preserve/restore allocations

Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Pranjal Shrivastava <praan@google.com>
To: Samiullah Khawaja <skhawaja@google.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>,
	Will Deacon <will@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Mike Rapoport <rppt@kernel.org>,
	Pratyush Yadav <pratyush@kernel.org>,
	Alexander Graf <graf@amazon.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Kevin Tian <kevin.tian@intel.com>,
	iommu@lists.linux.dev, kexec@lists.infradead.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	David Matlack <dmatlack@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vipin Sharma <vipinsh@google.com>
Subject: Re: [RFC PATCH 3/4] dma-direct: Add API to preserve/restore allocations
Date: Mon, 8 Jun 2026 19:55:21 +0000	[thread overview]
Message-ID: <aiceKYmc323G7tBs@google.com> (raw)
In-Reply-To: <20260505002737.2213734-4-skhawaja@google.com>

On Tue, May 05, 2026 at 12:27:36AM +0000, Samiullah Khawaja wrote:
> Add an API to preserve/restore the DMA direct allocation for liveupdate.
> The underlying memory is preserved/restored using KHO. During restore
> the memory is setup based on the device configuration, gfp flags and
> allocation attributes. Once restored, the driver can use the usual
> dma_free* API to deallocate the restored DMA allocation.
> 
> This API will be used to add support in dma_alloc* APIs to
> preseve/restore the DMA allocations.
> 
> Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> ---
>  include/linux/dma-direct.h |  29 +++++++
>  kernel/dma/Kconfig         |   3 +
>  kernel/dma/direct.c        | 163 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 195 insertions(+)
> 

[...]

> diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> index bfef21b4a9ae..d92852942c6c 100644
> --- a/kernel/dma/Kconfig
> +++ b/kernel/dma/Kconfig
> @@ -265,3 +265,6 @@ config DMA_MAP_BENCHMARK
>  	  performance of dma_(un)map_page.
>  
>  	  See tools/testing/selftests/dma/dma_map_benchmark.c
> +
> +config DMA_LIVEUPDATE
> +	bool "Enable preservation of DMA direct allocations"

Nit: depends on LIVEUPDATE?

> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index ec887f443741..c2b98f91900a 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -6,6 +6,8 @@
>   */
>  #include <linux/memblock.h> /* for max_pfn */
>  #include <linux/export.h>
> +#include <linux/kexec_handover.h>
> +#include <linux/kho/abi/dma_alloc.h>
>  #include <linux/mm.h>
>  #include <linux/dma-map-ops.h>
>  #include <linux/scatterlist.h>
> @@ -307,6 +309,167 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>  	return NULL;
>  }
>  
> +#ifdef CONFIG_DMA_LIVEUPDATE
> +int dma_direct_preserve_allocation(struct device *dev, void *cpu_addr,
> +				   size_t size, dma_addr_t dma_handle,
> +				   unsigned long attrs, u64 *state)
> +{
> +	struct dma_alloc_ser *ser;
> +	int ret;
> +
> +	if (!kho_is_enabled())
> +		return -EOPNOTSUPP;
> +
> +	if (IS_ENABLED(CONFIG_DMA_CMA))
> +		return -EOPNOTSUPP;
> +
> +	if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
> +	    !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev))
> +		return -EOPNOTSUPP;
> +
> +	if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_ALLOC) &&
> +	    !dev_is_dma_coherent(dev) &&
> +	    !is_swiotlb_for_alloc(dev))
> +		return -EOPNOTSUPP;
> +
> +	if (IS_ENABLED(CONFIG_DMA_GLOBAL_POOL) &&
> +	    !dev_is_dma_coherent(dev))
> +		return -EOPNOTSUPP;
> +
> +	if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
> +	    dma_is_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
> +		return -EOPNOTSUPP;
> +
> +	ser = kho_alloc_preserve(sizeof(*ser));
> +	if (IS_ERR(ser))
> +		return PTR_ERR(ser);
> +
> +	ser->page_phys = dma_to_phys(dev, dma_handle);
> +	ser->force_decrypted = force_dma_unencrypted(dev);
> +	ser->size = size;
> +
> +	ret = kho_preserve_pages(phys_to_page(ser->page_phys),
> +				 size >> PAGE_SHIFT);

Should this be `PAGE_ALIGN(size) >> PAGE_SHIFT` OR 
`DIV_ROUND_UP(size, PAGE_SIZE)`?

Otherwise, if size is small, say, size == 64-bytes, we preserve 0 pages?

Also, IIRC, even with PAGE_ALIGN, preserving just the requested pgcount
is not enough because buddy allocator allocates in order-N.

For e.g. if a driver requests 20KB (5 pages), the buddy allocator 
fulfills it with an order-3 block (8 pages).

Now, if we only tell KHO to preserve 5 pages, the remaining 3 pages are 
free in the new kernel. When the driver eventually tears down and calls
dma_free_coherent(), dma_direct_free() will call 
__free_pages(page, get_order(size)), which will attempt to free all 8 
pages, causing a double-free panic on the 3 unpreserved pages?

Should we be preserving exactly 1 << get_order(size) pages as per buddy?
Same applies to unpreserve, and restore.

> +	if (ret) {
> +		kho_unpreserve_free(ser);
> +		return ret;
> +	}
> +
> +	*state = virt_to_phys(ser);
> +	return 0;
> +}
> +
> +void dma_direct_unpreserve_allocation(struct device *dev, u64 state)
> +{
> +	struct dma_alloc_ser *ser;
> +
> +	if (!kho_is_enabled())
> +		return;
> +
> +	ser = phys_to_virt(state);
> +	kho_unpreserve_pages(phys_to_page(ser->page_phys),
> +			     ser->size >> PAGE_SHIFT);
> +	kho_unpreserve_free(ser);
> +}
> +
> +void *dma_direct_restore_allocation(struct device *dev, size_t size,
> +				    dma_addr_t *dma_handle, gfp_t gfp,
> +				    unsigned long attrs, u64 state)

Are we relying on the caller to pass same attrs? So, a buffer with 
non-coherent attrs can be mapped with coherent attrs in the new kernel.
Could this cause side-effects? Should we check for such driver bugs with
a WARN here while comparing older attrs with the newer ones too?

Coherency breaking due to subtle driver bugs is very painful to debug :/

> +{
> +	bool remap = false, set_uncached = false;
> +	struct dma_alloc_ser *ser = NULL;
> +	struct page *page;
> +	void *cpu_addr;
> +
> +	if (!kho_is_enabled())
> +		return NULL;
> +
> +	ser = phys_to_virt(state);
> +	page = phys_to_page(ser->page_phys);

[...]

> +
> +	/*
> +	 * Remapping will be blocking so return error. The preserved memory
> +	 * might be already decrypted in the previous kernel, but the decryption
> +	 * call is not guaranteed to be non-blocking so return error always if
> +	 * decryption is required.
> +	 */
> +	if ((remap || force_dma_unencrypted(dev)) &&
> +	    dma_direct_use_pool(dev, gfp))
> +		return NULL; 
> +
> +	/*
> +	 * Encryption scheme changed between two kernels and this might cause
> +	 * issues if device/driver is not handling it properly.
> +	 */
> +	WARN_ON_ONCE(ser->force_decrypted != force_dma_unencrypted(dev));
> +
> +	/*
> +	 * arch_dma_prep_coherent() should make sure that any cache lines from
> +	 * the previous kernel, if the device was coherent previously or cached
> +	 * mapping in this kernel during init are not problamatic for
> +	 * non-coherent allocations.
> +	 */
> +	if (remap) {
> +		pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs);
> +
> +		if (force_dma_unencrypted(dev))
> +			prot = pgprot_decrypted(prot);
> +
> +		arch_dma_prep_coherent(page, size);
> +
> +		cpu_addr = dma_common_contiguous_remap(page, size, prot,
> +						       __builtin_return_address(0));
> +		if (!cpu_addr)
> +			return NULL;

Should we be kho_restore_free-ing on all these error paths?
We only seem to be kho_restore_free-ing on the success path.
Same for kho_restore_pages.. if we return an error here, we don't
restore the preserved pages? Are we leaking those too?

> +	} else {
> +		cpu_addr = page_address(page);
> +		if (dma_set_decrypted(dev, cpu_addr, size))
> +			return NULL;
> +	}
> +
> +	if (set_uncached) {
> +		arch_dma_prep_coherent(page, size);
> +		cpu_addr = arch_dma_set_uncached(cpu_addr, size);
> +		if (IS_ERR(cpu_addr))
> +			return NULL;
> +	}
> +
> +	*dma_handle = phys_to_dma_direct(dev, ser->page_phys);
> +
> +	/*
> +	 * Cannot free the restored pages on error here as these might be in use
> +	 * by a device with direct allocation in the previous kernel.
> +	 */
> +	WARN_ON(!kho_restore_pages(ser->page_phys,
> +				   ser->size >> PAGE_SHIFT));
> +	kho_restore_free(ser);
> +	return cpu_addr;
> +}
> +#endif
> +
>  void dma_direct_free(struct device *dev, size_t size,
>  		void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs)
>  {

Thanks,
Praan

next prev parent reply	other threads:[~2026-06-08 19:55 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-05  0:27 [RFC PATCH 0/4] dma-mapping: Add preservation of direct allocations Samiullah Khawaja
2026-05-05  0:27 ` [RFC PATCH 1/4] dma: Add DMA allocation preservation KHO ABI Samiullah Khawaja
2026-06-08 17:53   ` Pranjal Shrivastava
2026-05-05  0:27 ` [RFC PATCH 2/4] dma/pool: Add an API to check if DMA allocation is from pool Samiullah Khawaja
2026-06-08 18:10   ` Pranjal Shrivastava
2026-05-05  0:27 ` [RFC PATCH 3/4] dma-direct: Add API to preserve/restore allocations Samiullah Khawaja
2026-06-01 12:35   ` Will Deacon
2026-06-02 20:14     ` Samiullah Khawaja
2026-06-08 19:55   ` Pranjal Shrivastava [this message]
2026-05-05  0:27 ` [RFC PATCH 4/4] dma-mapping: Add API to preserve/restore DMA allocation Samiullah Khawaja

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aiceKYmc323G7tBs@google.com \
    --to=praan@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=dmatlack@google.com \
    --cc=graf@amazon.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=kevin.tian@intel.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=m.szyprowski@samsung.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=pratyush@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=rppt@kernel.org \
    --cc=skhawaja@google.com \
    --cc=vipinsh@google.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox