From: Pranjal Shrivastava <praan@google.com>
To: Samiullah Khawaja <skhawaja@google.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>,
Will Deacon <will@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Mike Rapoport <rppt@kernel.org>,
Pratyush Yadav <pratyush@kernel.org>,
Alexander Graf <graf@amazon.com>,
Robin Murphy <robin.murphy@arm.com>,
Kevin Tian <kevin.tian@intel.com>,
iommu@lists.linux.dev, kexec@lists.infradead.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
David Matlack <dmatlack@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Vipin Sharma <vipinsh@google.com>
Subject: Re: [RFC PATCH 3/4] dma-direct: Add API to preserve/restore allocations
Date: Mon, 8 Jun 2026 19:55:21 +0000 [thread overview]
Message-ID: <aiceKYmc323G7tBs@google.com> (raw)
In-Reply-To: <20260505002737.2213734-4-skhawaja@google.com>
On Tue, May 05, 2026 at 12:27:36AM +0000, Samiullah Khawaja wrote:
> Add an API to preserve/restore the DMA direct allocation for liveupdate.
> The underlying memory is preserved/restored using KHO. During restore
> the memory is setup based on the device configuration, gfp flags and
> allocation attributes. Once restored, the driver can use the usual
> dma_free* API to deallocate the restored DMA allocation.
>
> This API will be used to add support in dma_alloc* APIs to
> preseve/restore the DMA allocations.
>
> Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> ---
> include/linux/dma-direct.h | 29 +++++++
> kernel/dma/Kconfig | 3 +
> kernel/dma/direct.c | 163 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 195 insertions(+)
>
[...]
> diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> index bfef21b4a9ae..d92852942c6c 100644
> --- a/kernel/dma/Kconfig
> +++ b/kernel/dma/Kconfig
> @@ -265,3 +265,6 @@ config DMA_MAP_BENCHMARK
> performance of dma_(un)map_page.
>
> See tools/testing/selftests/dma/dma_map_benchmark.c
> +
> +config DMA_LIVEUPDATE
> + bool "Enable preservation of DMA direct allocations"
Nit: depends on LIVEUPDATE?
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index ec887f443741..c2b98f91900a 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -6,6 +6,8 @@
> */
> #include <linux/memblock.h> /* for max_pfn */
> #include <linux/export.h>
> +#include <linux/kexec_handover.h>
> +#include <linux/kho/abi/dma_alloc.h>
> #include <linux/mm.h>
> #include <linux/dma-map-ops.h>
> #include <linux/scatterlist.h>
> @@ -307,6 +309,167 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> return NULL;
> }
>
> +#ifdef CONFIG_DMA_LIVEUPDATE
> +int dma_direct_preserve_allocation(struct device *dev, void *cpu_addr,
> + size_t size, dma_addr_t dma_handle,
> + unsigned long attrs, u64 *state)
> +{
> + struct dma_alloc_ser *ser;
> + int ret;
> +
> + if (!kho_is_enabled())
> + return -EOPNOTSUPP;
> +
> + if (IS_ENABLED(CONFIG_DMA_CMA))
> + return -EOPNOTSUPP;
> +
> + if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
> + !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev))
> + return -EOPNOTSUPP;
> +
> + if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_ALLOC) &&
> + !dev_is_dma_coherent(dev) &&
> + !is_swiotlb_for_alloc(dev))
> + return -EOPNOTSUPP;
> +
> + if (IS_ENABLED(CONFIG_DMA_GLOBAL_POOL) &&
> + !dev_is_dma_coherent(dev))
> + return -EOPNOTSUPP;
> +
> + if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
> + dma_is_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
> + return -EOPNOTSUPP;
> +
> + ser = kho_alloc_preserve(sizeof(*ser));
> + if (IS_ERR(ser))
> + return PTR_ERR(ser);
> +
> + ser->page_phys = dma_to_phys(dev, dma_handle);
> + ser->force_decrypted = force_dma_unencrypted(dev);
> + ser->size = size;
> +
> + ret = kho_preserve_pages(phys_to_page(ser->page_phys),
> + size >> PAGE_SHIFT);
Should this be `PAGE_ALIGN(size) >> PAGE_SHIFT` OR
`DIV_ROUND_UP(size, PAGE_SIZE)`?
Otherwise, if size is small, say, size == 64-bytes, we preserve 0 pages?
Also, IIRC, even with PAGE_ALIGN, preserving just the requested pgcount
is not enough because buddy allocator allocates in order-N.
For e.g. if a driver requests 20KB (5 pages), the buddy allocator
fulfills it with an order-3 block (8 pages).
Now, if we only tell KHO to preserve 5 pages, the remaining 3 pages are
free in the new kernel. When the driver eventually tears down and calls
dma_free_coherent(), dma_direct_free() will call
__free_pages(page, get_order(size)), which will attempt to free all 8
pages, causing a double-free panic on the 3 unpreserved pages?
Should we be preserving exactly 1 << get_order(size) pages as per buddy?
Same applies to unpreserve, and restore.
> + if (ret) {
> + kho_unpreserve_free(ser);
> + return ret;
> + }
> +
> + *state = virt_to_phys(ser);
> + return 0;
> +}
> +
> +void dma_direct_unpreserve_allocation(struct device *dev, u64 state)
> +{
> + struct dma_alloc_ser *ser;
> +
> + if (!kho_is_enabled())
> + return;
> +
> + ser = phys_to_virt(state);
> + kho_unpreserve_pages(phys_to_page(ser->page_phys),
> + ser->size >> PAGE_SHIFT);
> + kho_unpreserve_free(ser);
> +}
> +
> +void *dma_direct_restore_allocation(struct device *dev, size_t size,
> + dma_addr_t *dma_handle, gfp_t gfp,
> + unsigned long attrs, u64 state)
Are we relying on the caller to pass same attrs? So, a buffer with
non-coherent attrs can be mapped with coherent attrs in the new kernel.
Could this cause side-effects? Should we check for such driver bugs with
a WARN here while comparing older attrs with the newer ones too?
Coherency breaking due to subtle driver bugs is very painful to debug :/
> +{
> + bool remap = false, set_uncached = false;
> + struct dma_alloc_ser *ser = NULL;
> + struct page *page;
> + void *cpu_addr;
> +
> + if (!kho_is_enabled())
> + return NULL;
> +
> + ser = phys_to_virt(state);
> + page = phys_to_page(ser->page_phys);
[...]
> +
> + /*
> + * Remapping will be blocking so return error. The preserved memory
> + * might be already decrypted in the previous kernel, but the decryption
> + * call is not guaranteed to be non-blocking so return error always if
> + * decryption is required.
> + */
> + if ((remap || force_dma_unencrypted(dev)) &&
> + dma_direct_use_pool(dev, gfp))
> + return NULL;
> +
> + /*
> + * Encryption scheme changed between two kernels and this might cause
> + * issues if device/driver is not handling it properly.
> + */
> + WARN_ON_ONCE(ser->force_decrypted != force_dma_unencrypted(dev));
> +
> + /*
> + * arch_dma_prep_coherent() should make sure that any cache lines from
> + * the previous kernel, if the device was coherent previously or cached
> + * mapping in this kernel during init are not problamatic for
> + * non-coherent allocations.
> + */
> + if (remap) {
> + pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs);
> +
> + if (force_dma_unencrypted(dev))
> + prot = pgprot_decrypted(prot);
> +
> + arch_dma_prep_coherent(page, size);
> +
> + cpu_addr = dma_common_contiguous_remap(page, size, prot,
> + __builtin_return_address(0));
> + if (!cpu_addr)
> + return NULL;
Should we be kho_restore_free-ing on all these error paths?
We only seem to be kho_restore_free-ing on the success path.
Same for kho_restore_pages.. if we return an error here, we don't
restore the preserved pages? Are we leaking those too?
> + } else {
> + cpu_addr = page_address(page);
> + if (dma_set_decrypted(dev, cpu_addr, size))
> + return NULL;
> + }
> +
> + if (set_uncached) {
> + arch_dma_prep_coherent(page, size);
> + cpu_addr = arch_dma_set_uncached(cpu_addr, size);
> + if (IS_ERR(cpu_addr))
> + return NULL;
> + }
> +
> + *dma_handle = phys_to_dma_direct(dev, ser->page_phys);
> +
> + /*
> + * Cannot free the restored pages on error here as these might be in use
> + * by a device with direct allocation in the previous kernel.
> + */
> + WARN_ON(!kho_restore_pages(ser->page_phys,
> + ser->size >> PAGE_SHIFT));
> + kho_restore_free(ser);
> + return cpu_addr;
> +}
> +#endif
> +
> void dma_direct_free(struct device *dev, size_t size,
> void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs)
> {
Thanks,
Praan
next prev parent reply other threads:[~2026-06-08 19:55 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-05 0:27 [RFC PATCH 0/4] dma-mapping: Add preservation of direct allocations Samiullah Khawaja
2026-05-05 0:27 ` [RFC PATCH 1/4] dma: Add DMA allocation preservation KHO ABI Samiullah Khawaja
2026-06-08 17:53 ` Pranjal Shrivastava
2026-05-05 0:27 ` [RFC PATCH 2/4] dma/pool: Add an API to check if DMA allocation is from pool Samiullah Khawaja
2026-06-08 18:10 ` Pranjal Shrivastava
2026-05-05 0:27 ` [RFC PATCH 3/4] dma-direct: Add API to preserve/restore allocations Samiullah Khawaja
2026-06-01 12:35 ` Will Deacon
2026-06-02 20:14 ` Samiullah Khawaja
2026-06-08 19:55 ` Pranjal Shrivastava [this message]
2026-05-05 0:27 ` [RFC PATCH 4/4] dma-mapping: Add API to preserve/restore DMA allocation Samiullah Khawaja
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aiceKYmc323G7tBs@google.com \
--to=praan@google.com \
--cc=akpm@linux-foundation.org \
--cc=dmatlack@google.com \
--cc=graf@amazon.com \
--cc=iommu@lists.linux.dev \
--cc=jgg@ziepe.ca \
--cc=kevin.tian@intel.com \
--cc=kexec@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=m.szyprowski@samsung.com \
--cc=pasha.tatashin@soleen.com \
--cc=pratyush@kernel.org \
--cc=robin.murphy@arm.com \
--cc=rppt@kernel.org \
--cc=skhawaja@google.com \
--cc=vipinsh@google.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.