linux-parisc.vger.kernel.org archive mirror
* [RFC] add a struct page* parameter to dma_map_ops.unmap_page
@ 2014-11-17 14:11 Stefano Stabellini
  2014-11-17 14:43 ` [Xen-devel] " David Vrabel
  2014-11-21 11:48 ` Stefano Stabellini
  0 siblings, 2 replies; 4+ messages in thread
From: Stefano Stabellini @ 2014-11-17 14:11 UTC (permalink / raw)
  To: gregkh
  Cc: David Vrabel, Ian Campbell, konrad.wilk, linux-kernel, xen-devel,
	torvalds, vinod.koul, dmaengine, Stefano Stabellini, bhelgaas,
	jejb, deller, linux-parisc, iommu, airlied, dri-devel,
	alexander.deucher, christian.koenig, linux, linux-mips, ralf,
	linux-arm-kernel, dwmw2

Hi all,
I am writing this email to ask for your advice.

On architectures where dma addresses are different from physical
addresses, it can be difficult to retrieve the physical address of a
page from its dma address.

Specifically this is the case for Xen on arm and arm64 but I think that
other architectures might have the same issue.

Knowing the physical address is necessary to be able to issue any
required cache maintenance operations when unmap_page,
sync_single_for_cpu and sync_single_for_device are called.

Adding a struct page* parameter to unmap_page, sync_single_for_cpu and
sync_single_for_device would make Linux dma handling on Xen on arm and
arm64 much easier and quicker.

I think that other drivers have similar problems, such as the Intel
IOMMU driver having to call find_iova and walking down an rbtree to get
the physical address in its implementation of unmap_page.
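To make the difference concrete, here is a minimal userspace sketch, not kernel code. Every name in it (toy_map_page, toy_phys_by_lookup, toy_phys_with_page, the toy table) is hypothetical, and the types are stand-ins for the kernel's. It assumes a backend whose DMA addresses are unrelated to physical addresses, so the unmap side either searches for the page or is handed it by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins; these are NOT the kernel's definitions. */
typedef uint64_t dma_addr_t;
typedef uint64_t phys_addr_t;

struct page { phys_addr_t phys; };   /* toy page descriptor */

struct toy_mapping { dma_addr_t dma; struct page *page; };

#define TOY_MAX_MAPPINGS 8
static struct toy_mapping toy_table[TOY_MAX_MAPPINGS];
static size_t toy_count;

/* Hypothetical map step: the DMA address is deliberately unrelated to
 * the physical address, as on the architectures described above. */
static dma_addr_t toy_map_page(struct page *page)
{
    if (toy_count == TOY_MAX_MAPPINGS)
        return 0;   /* table full; 0 is this sketch's error value */
    toy_table[toy_count].dma = 0x80000000u + (dma_addr_t)toy_count * 0x1000u;
    toy_table[toy_count].page = page;
    return toy_table[toy_count++].dma;
}

/* Current shape: only the DMA address is available at unmap time, so the
 * backend has to search its records to find the physical address. */
static phys_addr_t toy_phys_by_lookup(dma_addr_t dma_handle)
{
    for (size_t i = 0; i < toy_count; i++)
        if (toy_table[i].dma == dma_handle)
            return toy_table[i].page->phys;
    return 0;   /* unknown handle */
}

/* Proposed shape: the caller passes the page it already holds, so the
 * physical address needed for cache maintenance is available directly. */
static phys_addr_t toy_phys_with_page(struct page *page, dma_addr_t dma_handle)
{
    (void)dma_handle;   /* still passed along, but not needed to find phys */
    return page->phys;
}
```

The linear search only stands in for whatever reverse lookup a real backend performs; the point is that the second variant needs none.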

Callers have the struct page* in their hands already from the previous
map_page call so it shouldn't be an issue for them.  A problem does
exist however: there are about 280 callers of dma_unmap_page and
pci_unmap_page. We have even more callers of the dma_sync_single_for_*
functions.



Is such a change even conceivable? How would one go about it?

I think that Xen would not be the only one to gain from it, but I would
like to have a confirmation from others: given the magnitude of the
changes involved I would actually prefer to avoid them unless multiple
drivers/archs/subsystems could really benefit from them.

Cheers,

Stefano


diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index d5d3881..158a765 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -31,8 +31,9 @@ struct dma_map_ops {
 			       unsigned long offset, size_t size,
 			       enum dma_data_direction dir,
 			       struct dma_attrs *attrs);
-	void (*unmap_page)(struct device *dev, dma_addr_t dma_handle,
-			   size_t size, enum dma_data_direction dir,
+	void (*unmap_page)(struct device *dev, struct page *page,
+			   dma_addr_t dma_handle, size_t size,
+			   enum dma_data_direction dir,
 			   struct dma_attrs *attrs);
 	int (*map_sg)(struct device *dev, struct scatterlist *sg,
 		      int nents, enum dma_data_direction dir,
@@ -41,10 +42,10 @@ struct dma_map_ops {
 			 struct scatterlist *sg, int nents,
 			 enum dma_data_direction dir,
 			 struct dma_attrs *attrs);
-	void (*sync_single_for_cpu)(struct device *dev,
+	void (*sync_single_for_cpu)(struct device *dev, struct page *page,
 				    dma_addr_t dma_handle, size_t size,
 				    enum dma_data_direction dir);
-	void (*sync_single_for_device)(struct device *dev,
+	void (*sync_single_for_device)(struct device *dev, struct page *page,
 				       dma_addr_t dma_handle, size_t size,
 				       enum dma_data_direction dir);
 	void (*sync_sg_for_cpu)(struct device *dev,


* Re: [Xen-devel] [RFC] add a struct page* parameter to dma_map_ops.unmap_page
  2014-11-17 14:11 [RFC] add a struct page* parameter to dma_map_ops.unmap_page Stefano Stabellini
@ 2014-11-17 14:43 ` David Vrabel
  2014-11-21 11:48 ` Stefano Stabellini
  1 sibling, 0 replies; 4+ messages in thread
From: David Vrabel @ 2014-11-17 14:43 UTC (permalink / raw)
  To: Stefano Stabellini, gregkh
  Cc: linux-mips, airlied, dri-devel, xen-devel, linux, vinod.koul,
	deller, jejb, Ian Campbell, alexander.deucher, bhelgaas,
	linux-arm-kernel, linux-parisc, dwmw2, linux-kernel, ralf, iommu,
	David Vrabel, dmaengine, torvalds, christian.koenig

On 17/11/14 14:11, Stefano Stabellini wrote:
> Hi all,
> I am writing this email to ask for your advice.
> 
> On architectures where dma addresses are different from physical
> addresses, it can be difficult to retrieve the physical address of a
> page from its dma address.
> 
> Specifically this is the case for Xen on arm and arm64 but I think that
> other architectures might have the same issue.
> 
> Knowing the physical address is necessary to be able to issue any
> required cache maintenance operations when unmap_page,
> sync_single_for_cpu and sync_single_for_device are called.
> 
> Adding a struct page* parameter to unmap_page, sync_single_for_cpu and
> sync_single_for_device would make Linux dma handling on Xen on arm and
> arm64 much easier and quicker.

Using an opaque handle instead of struct page * would be more beneficial
for the Intel IOMMU driver.  e.g.,

typedef dma_addr_t dma_handle_t;

dma_handle_t dma_map_single(struct device *dev,
                            void *va, size_t size,
                            enum dma_data_direction dir);
void dma_unmap_single(struct device *dev,
                      dma_handle_t handle, size_t size,
                      enum dma_data_direction dir);

etc.

Drivers would then use:

dma_addr_t dma_addr(dma_handle_t handle);

to obtain the bus address from the handle.
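One way to picture the opaque handle is the userspace sketch below. It is an assumption layered on the suggestion above: the typedef in this mail makes dma_handle_t a plain dma_addr_t, whereas the sketch makes it a small struct with a private cookie so that the unmap side can recover its own state without a lookup. The toy_ functions and the cookie field are hypothetical; only dma_addr() and dma_handle_t are names taken from the mail.

```c
#include <stdint.h>

typedef uint64_t dma_addr_t;   /* userspace stand-in, not the kernel type */

/* Assumed layout: the bus address plus implementation-private state
 * (for example a struct page pointer or an IOVA node). */
typedef struct {
    dma_addr_t bus;    /* address the device is programmed with */
    void *cookie;      /* private to the DMA backend, opaque to drivers */
} dma_handle_t;

/* Accessor named in the mail: drivers go through this to get the bus address. */
static inline dma_addr_t dma_addr(dma_handle_t handle)
{
    return handle.bus;
}

/* Hypothetical map step: records the CPU address in the cookie. */
static dma_handle_t toy_dma_map_single(void *va)
{
    dma_handle_t h;

    h.bus = 0x80000000u + ((uintptr_t)va & 0xfffu);   /* made-up bus address */
    h.cookie = va;
    return h;
}

/* Hypothetical unmap step: the backend gets its state back from the
 * handle and returns it here so the effect is observable. */
static void *toy_dma_unmap_single(dma_handle_t handle)
{
    return handle.cookie;
}
```

Because drivers only ever call dma_addr(), the backend would remain free to change what the handle carries.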

> I think that other drivers have similar problems, such as the Intel
> IOMMU driver having to call find_iova and walking down an rbtree to get
> the physical address in its implementation of unmap_page.
> 
> Callers have the struct page* in their hands already from the previous
> map_page call so it shouldn't be an issue for them.  A problem does
> exist however: there are about 280 callers of dma_unmap_page and
> pci_unmap_page. We have even more callers of the dma_sync_single_for_*
> functions.

You will also need to fix dma_unmap_single() and pci_unmap_single()
(another 1000+ callers).

You may need to consider a parallel set of map/unmap API calls that
return/accept a handle, and then converting drivers one-by-one as
required, instead of trying to convert every single driver at once.
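A rough userspace sketch of how two parallel entry points could coexist during such a conversion follows. All names (par_map_page, par_unmap_page_legacy, par_unmap_page_with_page) are hypothetical, and the one-entry reverse map is only a stand-in for a real lookup structure. The legacy call keeps its signature and pays for the lookup; converted drivers call the new variant directly.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;        /* userspace stand-in */
struct page { uint64_t phys; };     /* toy page descriptor */

static struct page *par_last_page;  /* one-entry "reverse map" for the sketch */
static dma_addr_t par_last_dma;
static unsigned par_reverse_lookups; /* counts how often the slow path ran */

/* Hypothetical map step, shared by old and new drivers. */
static dma_addr_t par_map_page(struct page *page)
{
    par_last_page = page;
    par_last_dma = 0x80000000u;     /* made-up DMA address */
    return par_last_dma;
}

/* New-style entry point: the caller supplies the page. Returns the
 * physical address that cache maintenance would be issued on. */
static uint64_t par_unmap_page_with_page(struct page *page, dma_addr_t dma)
{
    (void)dma;
    return page->phys;
}

/* Legacy entry point: unchanged signature, so unconverted drivers keep
 * working; it does the reverse lookup and forwards to the new variant. */
static uint64_t par_unmap_page_legacy(dma_addr_t dma)
{
    par_reverse_lookups++;
    if (par_last_page == NULL || dma != par_last_dma)
        return 0;                   /* unknown handle */
    return par_unmap_page_with_page(par_last_page, dma);
}
```

In this arrangement the backend has a single implementation, and the cost of the lookup is confined to callers that have not been converted yet.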

David


* Re: [RFC] add a struct page* parameter to dma_map_ops.unmap_page
  2014-11-17 14:11 [RFC] add a struct page* parameter to dma_map_ops.unmap_page Stefano Stabellini
  2014-11-17 14:43 ` [Xen-devel] " David Vrabel
@ 2014-11-21 11:48 ` Stefano Stabellini
       [not found]   ` <alpine.DEB.2.02.1411211147450.12596-7Z66fg9igcxYtxbxJUhB2Dgeux46jI+i@public.gmane.org>
  1 sibling, 1 reply; 4+ messages in thread
From: Stefano Stabellini @ 2014-11-21 11:48 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: gregkh, David Vrabel, Ian Campbell, konrad.wilk, linux-kernel,
	xen-devel, torvalds, vinod.koul, dmaengine, bhelgaas, jejb,
	deller, linux-parisc, iommu, airlied, dri-devel,
	alexander.deucher, christian.koenig, linux, linux-mips, ralf,
	linux-arm-kernel, dwmw2

On Mon, 17 Nov 2014, Stefano Stabellini wrote:
> Hi all,
> I am writing this email to ask for your advice.
> 
> On architectures where dma addresses are different from physical
> addresses, it can be difficult to retrieve the physical address of a
> page from its dma address.
> 
> Specifically this is the case for Xen on arm and arm64 but I think that
> other architectures might have the same issue.
> 
> Knowing the physical address is necessary to be able to issue any
> required cache maintenance operations when unmap_page,
> sync_single_for_cpu and sync_single_for_device are called.
> 
> Adding a struct page* parameter to unmap_page, sync_single_for_cpu and
> sync_single_for_device would make Linux dma handling on Xen on arm and
> arm64 much easier and quicker.
> 
> I think that other drivers have similar problems, such as the Intel
> IOMMU driver having to call find_iova and walking down an rbtree to get
> the physical address in its implementation of unmap_page.
> 
> Callers have the struct page* in their hands already from the previous
> map_page call so it shouldn't be an issue for them.  A problem does
> exist however: there are about 280 callers of dma_unmap_page and
> pci_unmap_page. We have even more callers of the dma_sync_single_for_*
> functions.
> 
> 
> 
> Is such a change even conceivable? How would one go about it?
> 
> I think that Xen would not be the only one to gain from it, but I would
> like to have a confirmation from others: given the magnitude of the
> changes involved I would actually prefer to avoid them unless multiple
> drivers/archs/subsystems could really benefit from them.

Given the lack of interest from the community, I am going to drop this
idea.




> Cheers,
> 
> Stefano
> 
> 
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index d5d3881..158a765 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -31,8 +31,9 @@ struct dma_map_ops {
>  			       unsigned long offset, size_t size,
>  			       enum dma_data_direction dir,
>  			       struct dma_attrs *attrs);
> -	void (*unmap_page)(struct device *dev, dma_addr_t dma_handle,
> -			   size_t size, enum dma_data_direction dir,
> +	void (*unmap_page)(struct device *dev, struct page *page,
> +			   dma_addr_t dma_handle, size_t size,
> +			   enum dma_data_direction dir,
>  			   struct dma_attrs *attrs);
>  	int (*map_sg)(struct device *dev, struct scatterlist *sg,
>  		      int nents, enum dma_data_direction dir,
> @@ -41,10 +42,10 @@ struct dma_map_ops {
>  			 struct scatterlist *sg, int nents,
>  			 enum dma_data_direction dir,
>  			 struct dma_attrs *attrs);
> -	void (*sync_single_for_cpu)(struct device *dev,
> +	void (*sync_single_for_cpu)(struct device *dev, struct page *page,
>  				    dma_addr_t dma_handle, size_t size,
>  				    enum dma_data_direction dir);
> -	void (*sync_single_for_device)(struct device *dev,
> +	void (*sync_single_for_device)(struct device *dev, struct page *page,
>  				       dma_addr_t dma_handle, size_t size,
>  				       enum dma_data_direction dir);
>  	void (*sync_sg_for_cpu)(struct device *dev,
> 


* Re: [RFC] add a struct page* parameter to dma_map_ops.unmap_page
       [not found]   ` <alpine.DEB.2.02.1411211147450.12596-7Z66fg9igcxYtxbxJUhB2Dgeux46jI+i@public.gmane.org>
@ 2014-11-21 20:18     ` Mitchel Humpherys
  0 siblings, 0 replies; 4+ messages in thread
From: Mitchel Humpherys @ 2014-11-21 20:18 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: linux-mips, airlied, dri-devel, xen-devel, linux, vinod.koul,
	deller, jejb, dwmw2, Ian Campbell, dmaengine, bhelgaas,
	linux-arm-kernel, linux-parisc, gregkh, linux-kernel, ralf, iommu,
	David Vrabel, alexander.deucher, torvalds, christian.koenig

On Fri, Nov 21 2014 at 03:48:33 AM, Stefano Stabellini wrote:
> On Mon, 17 Nov 2014, Stefano Stabellini wrote:
>> Hi all,
>> I am writing this email to ask for your advice.
>> 
>> On architectures where dma addresses are different from physical
>> addresses, it can be difficult to retrieve the physical address of a
>> page from its dma address.
>> 
>> Specifically this is the case for Xen on arm and arm64 but I think that
>> other architectures might have the same issue.
>> 
>> Knowing the physical address is necessary to be able to issue any
>> required cache maintenance operations when unmap_page,
>> sync_single_for_cpu and sync_single_for_device are called.
>> 
>> Adding a struct page* parameter to unmap_page, sync_single_for_cpu and
>> sync_single_for_device would make Linux dma handling on Xen on arm and
>> arm64 much easier and quicker.
>> 
>> I think that other drivers have similar problems, such as the Intel
>> IOMMU driver having to call find_iova and walking down an rbtree to get
>> the physical address in its implementation of unmap_page.
>> 
>> Callers have the struct page* in their hands already from the previous
>> map_page call so it shouldn't be an issue for them.  A problem does
>> exist however: there are about 280 callers of dma_unmap_page and
>> pci_unmap_page. We have even more callers of the dma_sync_single_for_*
>> functions.
>> 
>> 
>> 
>> Is such a change even conceivable? How would one go about it?
>> 
>> I think that Xen would not be the only one to gain from it, but I would
>> like to have a confirmation from others: given the magnitude of the
>> changes involved I would actually prefer to avoid them unless multiple
>> drivers/archs/subsystems could really benefit from them.
>
> Given the lack of interest from the community, I am going to drop this
> idea.

Actually, it sounds like the right API design to me.  As a bonus, it
should help performance a bit as well.  For example, the current
implementations of dma_sync_single_for_{cpu,device} and dma_unmap_page
on ARM while using the IOMMU mapper
(arm_iommu_sync_single_for_{cpu,device}, arm_iommu_unmap_page) all call
iommu_iova_to_phys, which generally results in a page table walk or a
hardware register write/poll/read.
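For a sense of what a software page table walk involves, here is a toy two-level walk in userspace C. It is not the kernel's iommu_iova_to_phys and does not model any real IOMMU format; the table layout, the 4-bit level width and all toy_ names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12                      /* 4 KiB pages */
#define TOY_LEVEL_BITS 4                       /* 16 entries per level */
#define TOY_ENTRIES    (1u << TOY_LEVEL_BITS)

/* Second level: physical page addresses, 0 meaning "not mapped". */
struct toy_l2 { uint64_t phys_page[TOY_ENTRIES]; };

/* First level: pointers to second-level tables, NULL meaning "absent". */
struct toy_l1 { struct toy_l2 *next[TOY_ENTRIES]; };

/* Translate an I/O virtual address by walking both levels.
 * Returns 0 if the address is not mapped. */
static uint64_t toy_iova_to_phys(const struct toy_l1 *root, uint64_t iova)
{
    size_t i1 = (iova >> (TOY_PAGE_SHIFT + TOY_LEVEL_BITS)) & (TOY_ENTRIES - 1);
    size_t i2 = (iova >> TOY_PAGE_SHIFT) & (TOY_ENTRIES - 1);
    const struct toy_l2 *l2 = root->next[i1];  /* first dependent load */

    if (l2 == NULL || l2->phys_page[i2] == 0)  /* second dependent load */
        return 0;

    /* Combine the physical page with the offset inside the page. */
    return l2->phys_page[i2] | (iova & ((1u << TOY_PAGE_SHIFT) - 1));
}
```

Each level adds a dependent memory access on every sync or unmap call, which is the work that would be skipped if the caller supplied the page.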

The problem, as you mentioned, is that there are a ton of callers of the
existing APIs.  I think David Vrabel had a good suggestion for dealing
with this:

On Mon, Nov 17 2014 at 06:43:46 AM, David Vrabel wrote:
> You may need to consider a parallel set of map/unmap API calls that
> return/accept a handle, and then converting drivers one-by-one as
> required, instead of trying to convert every single driver at once.

However, I'm not sure whether the costs of having a parallel set of APIs
outweigh the benefits of a cleaner API and a slight performance boost...
But I hope the idea isn't completely abandoned without some profiling or
other evidence of its benefits (e.g. patches showing how drivers could
be simplified with the new APIs).


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

