* [PATCH 0/2] scatterlist cleanups
From: Dan Williams @ 2015-06-09 16:27 UTC
To: linux-arm-kernel

Hi Jens,
While working through a conversion of the scatterlist data structure
I noticed that some users were open coding scatterlist manipulations.
Christoph recommended sending these for 4.1-rc inclusion.  These are
based on 4.1-rc7.
---
Dan Williams (2):
      scatterlist: use sg_phys()
      scatterlist: cleanup sg_chain() and sg_unmark_end()

 arch/arm/mm/dma-mapping.c                    |    2 +-
 arch/microblaze/kernel/dma.c                 |    2 +-
 block/blk-merge.c                            |    2 +-
 drivers/crypto/omap-sham.c                   |    2 +-
 drivers/dma/imx-dma.c                        |    8 ++------
 drivers/dma/ste_dma40.c                      |    5 +----
 drivers/iommu/intel-iommu.c                  |    4 ++--
 drivers/iommu/iommu.c                        |    2 +-
 drivers/mmc/card/queue.c                     |    4 ++--
 drivers/staging/android/ion/ion_chunk_heap.c |    4 ++--
 include/crypto/scatterwalk.h                 |    9 ++-------
 11 files changed, 16 insertions(+), 28 deletions(-)

* [PATCH 1/2] scatterlist: use sg_phys()
From: Dan Williams @ 2015-06-09 16:27 UTC
To: linux-arm-kernel

Coccinelle cleanup to replace open-coded sg-to-physical-address
translations.  This is in preparation for introducing scatterlists
that reference __pfn_t.

// sg_phys.cocci: convert usage page_to_phys(sg_page(sg)) to sg_phys(sg)
// usage: make coccicheck COCCI=sg_phys.cocci MODE=patch

virtual patch
virtual report
virtual org

@@
struct scatterlist *sg;
@@

- page_to_phys(sg_page(sg)) + sg->offset
+ sg_phys(sg)

@@
struct scatterlist *sg;
@@

- page_to_phys(sg_page(sg))
+ sg_phys(sg) - sg->offset

Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/arm/mm/dma-mapping.c                    |    2 +-
 arch/microblaze/kernel/dma.c                 |    2 +-
 drivers/iommu/intel-iommu.c                  |    4 ++--
 drivers/iommu/iommu.c                        |    2 +-
 drivers/staging/android/ion/ion_chunk_heap.c |    4 ++--
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 7e7583ddd607..9f6ff6671f01 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
 		return -ENOMEM;
 
 	for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
-		phys_addr_t phys = page_to_phys(sg_page(s));
+		phys_addr_t phys = sg_phys(s) - s->offset;
 		unsigned int len = PAGE_ALIGN(s->offset + s->length);
 
 		if (!is_coherent &&
diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index ed7ba8a11822..dcb3c594d626 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -61,7 +61,7 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
 	/* FIXME this part of code is untested */
 	for_each_sg(sgl, sg, nents, i) {
 		sg->dma_address = sg_phys(sg);
-		__dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
+		__dma_sync(sg_phys(sg),
 							sg->length, direction);
 	}
 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 68d43beccb7e..9b9ada71e0d3 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1998,7 +1998,7 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 			sg_res = aligned_nrpages(sg->offset, sg->length);
 			sg->dma_address = ((dma_addr_t)iov_pfn << VTD_PAGE_SHIFT) + sg->offset;
 			sg->dma_length = sg->length;
-			pteval = page_to_phys(sg_page(sg)) | prot;
+			pteval = (sg_phys(sg) - sg->offset) | prot;
 			phys_pfn = pteval >> VTD_PAGE_SHIFT;
 		}
 
@@ -3302,7 +3302,7 @@ static int intel_nontranslate_map_sg(struct device *hddev,
 
 	for_each_sg(sglist, sg, nelems, i) {
 		BUG_ON(!sg_page(sg));
-		sg->dma_address = page_to_phys(sg_page(sg)) + sg->offset;
+		sg->dma_address = sg_phys(sg);
 		sg->dma_length = sg->length;
 	}
 	return nelems;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d4f527e56679..59808fc9110d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1147,7 +1147,7 @@ size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 	min_pagesz = 1 << __ffs(domain->ops->pgsize_bitmap);
 
 	for_each_sg(sg, s, nents, i) {
-		phys_addr_t phys = page_to_phys(sg_page(s)) + s->offset;
+		phys_addr_t phys = sg_phys(s);
 
 		/*
 		 * We are mapping on IOMMU page boundaries, so offset within
diff --git a/drivers/staging/android/ion/ion_chunk_heap.c b/drivers/staging/android/ion/ion_chunk_heap.c
index 3e6ec2ee6802..b7da5d142aa9 100644
--- a/drivers/staging/android/ion/ion_chunk_heap.c
+++ b/drivers/staging/android/ion/ion_chunk_heap.c
@@ -81,7 +81,7 @@ static int ion_chunk_heap_allocate(struct ion_heap *heap,
 err:
 	sg = table->sgl;
 	for (i -= 1; i >= 0; i--) {
-		gen_pool_free(chunk_heap->pool, page_to_phys(sg_page(sg)),
+		gen_pool_free(chunk_heap->pool, sg_phys(sg) - sg->offset,
 			      sg->length);
 		sg = sg_next(sg);
 	}
@@ -109,7 +109,7 @@ static void ion_chunk_heap_free(struct ion_buffer *buffer)
 							DMA_BIDIRECTIONAL);
 
 	for_each_sg(table->sgl, sg, table->nents, i) {
-		gen_pool_free(chunk_heap->pool, page_to_phys(sg_page(sg)),
+		gen_pool_free(chunk_heap->pool, sg_phys(sg) - sg->offset,
 			      sg->length);
 	}
 	chunk_heap->allocated -= allocated_size;
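
(For context: sg_phys() is a one-line helper in include/linux/scatterlist.h;
in the v4.1 timeframe it reads approximately as below, which is why the two
Coccinelle rules above are exact rewrites of the open-coded forms.)

static inline dma_addr_t sg_phys(struct scatterlist *sg)
{
	/* physical address of the data this entry describes */
	return page_to_phys(sg_page(sg)) + sg->offset;
}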

* [PATCH 1/2] scatterlist: use sg_phys()
From: Elliott, Robert (Server Storage) @ 2015-06-10  0:34 UTC
To: linux-arm-kernel

> -----Original Message-----
> From: linux-kernel-owner at vger.kernel.org [mailto:linux-kernel-
> owner at vger.kernel.org] On Behalf Of Dan Williams
> Sent: Tuesday, June 09, 2015 10:27 AM
> Subject: [PATCH 1/2] scatterlist: use sg_phys()
> ...
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index ed7ba8a11822..dcb3c594d626 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -61,7 +61,7 @@ static int dma_direct_map_sg(struct device *dev, struct
> scatterlist *sgl,
>  	/* FIXME this part of code is untested */
>  	for_each_sg(sgl, sg, nents, i) {
>  		sg->dma_address = sg_phys(sg);
> -		__dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
> +		__dma_sync(sg_phys(sg),
>  							sg->length, direction);
>  	}

That one ends up with weird indentation.

* [PATCH 1/2] scatterlist: use sg_phys()
From: Joerg Roedel @ 2015-06-10  9:32 UTC
To: linux-arm-kernel

On Tue, Jun 09, 2015 at 12:27:10PM -0400, Dan Williams wrote:
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 7e7583ddd607..9f6ff6671f01 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
>  		return -ENOMEM;
>
>  	for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
> -		phys_addr_t phys = page_to_phys(sg_page(s));
> +		phys_addr_t phys = sg_phys(s) - s->offset;

So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
which turns the statement above into:

	page_to_phys(sg_page(s)) + s->offset - s->offset;

The compiler will probably optimize that away, but it still doesn't
look like an improvement.

>  		unsigned int len = PAGE_ALIGN(s->offset + s->length);
>
>  		if (!is_coherent &&
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index ed7ba8a11822..dcb3c594d626 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -61,7 +61,7 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
>  	/* FIXME this part of code is untested */
>  	for_each_sg(sgl, sg, nents, i) {
>  		sg->dma_address = sg_phys(sg);
> -		__dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
> +		__dma_sync(sg_phys(sg),
>  							sg->length, direction);

Here the replacement makes sense, but the indentation is weird.  Could
all be moved to one line, I guess.

>  	}
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 68d43beccb7e..9b9ada71e0d3 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1998,7 +1998,7 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
>  			sg_res = aligned_nrpages(sg->offset, sg->length);
>  			sg->dma_address = ((dma_addr_t)iov_pfn << VTD_PAGE_SHIFT) + sg->offset;
>  			sg->dma_length = sg->length;
> -			pteval = page_to_phys(sg_page(sg)) | prot;
> +			pteval = (sg_phys(sg) - sg->offset) | prot;

Here it doesn't make sense either.  In general, please remove the cases
where you have to subtract sg->offset after the conversion.


	Joerg

* [PATCH 1/2] scatterlist: use sg_phys()
From: Dan Williams @ 2015-06-10 16:00 UTC
To: linux-arm-kernel

On Wed, Jun 10, 2015 at 2:32 AM, Joerg Roedel <joro@8bytes.org> wrote:
> On Tue, Jun 09, 2015 at 12:27:10PM -0400, Dan Williams wrote:
>> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
>> index 7e7583ddd607..9f6ff6671f01 100644
>> --- a/arch/arm/mm/dma-mapping.c
>> +++ b/arch/arm/mm/dma-mapping.c
>> @@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
>>  		return -ENOMEM;
>>
>>  	for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
>> -		phys_addr_t phys = page_to_phys(sg_page(s));
>> +		phys_addr_t phys = sg_phys(s) - s->offset;
>
> So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
> which turns the statement above into:
>
> 	page_to_phys(sg_page(s)) + s->offset - s->offset;
>
> The compiler will probably optimize that away, but it still doesn't
> look like an improvement.

The goal is to eventually stop leaking struct page deep into the i/o
stack.  Anything that relies on being able to retrieve a struct page
out of an sg entry needs to be converted.  I think we need a new
helper for this case, "sg_phys_aligned()"?
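
(For illustration, the helper Dan is floating here is hypothetical; it was
never merged under this name.  One possible shape would simply hide the
subtraction the converted call sites do by hand:)

static inline phys_addr_t sg_phys_aligned(struct scatterlist *sg)
{
	/* page-aligned physical address, i.e. page_to_phys(sg_page(sg)) */
	return sg_phys(sg) - sg->offset;
}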

* [PATCH 1/2] scatterlist: use sg_phys()
From: Russell King - ARM Linux @ 2015-06-10 16:31 UTC
To: linux-arm-kernel

On Wed, Jun 10, 2015 at 09:00:31AM -0700, Dan Williams wrote:
> On Wed, Jun 10, 2015 at 2:32 AM, Joerg Roedel <joro@8bytes.org> wrote:
> > On Tue, Jun 09, 2015 at 12:27:10PM -0400, Dan Williams wrote:
> >> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> >> index 7e7583ddd607..9f6ff6671f01 100644
> >> --- a/arch/arm/mm/dma-mapping.c
> >> +++ b/arch/arm/mm/dma-mapping.c
> >> @@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
> >>  		return -ENOMEM;
> >>
> >>  	for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
> >> -		phys_addr_t phys = page_to_phys(sg_page(s));
> >> +		phys_addr_t phys = sg_phys(s) - s->offset;
> >
> > So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
> > which turns the statement above into:
> >
> > 	page_to_phys(sg_page(s)) + s->offset - s->offset;
> >
> > The compiler will probably optimize that away, but it still doesn't
> > look like an improvement.
>
> The goal is to eventually stop leaking struct page deep into the i/o
> stack.  Anything that relies on being able to retrieve a struct page
> out of an sg entry needs to be converted.  I think we need a new
> helper for this case, "sg_phys_aligned()"?

Why?  The aim of the code is not to detect whether the address is
aligned to a page (if it were, it'd be testing for a zero s->offset,
or it would be testing for an s->offset being a multiple of the page
size).

Let's first understand the code that's being modified (which seems to
be something which isn't done very much today - people seem to just
like changing code for the hell of it.)

	for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
		phys_addr_t phys = page_to_phys(sg_page(s));
		unsigned int len = PAGE_ALIGN(s->offset + s->length);

		if (!is_coherent &&
		    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
			__dma_page_cpu_to_dev(sg_page(s), s->offset,
					      s->length, dir);

		prot = __dma_direction_to_prot(dir);

		ret = iommu_map(mapping->domain, iova, phys, len, prot);
		if (ret < 0)
			goto fail;
		count += len >> PAGE_SHIFT;
		iova += len;
	}

What it's doing is trying to map each scatterlist entry via an IOMMU.
Unsurprisingly, IOMMUs are page based - you can't map a partial IOMMU
page.

However, what says that the IOMMU page size is the same as the host CPU
page size - it may not be... so the above code is wrong for a
completely different reason.

What we _should_ be doing is finding out what the IOMMU minimum page
size is, and using that in conjunction with the sg_phys() of the
scatterlist entry to determine the start and length of the mapping
such that the IOMMU mapping covers the range described by the
scatterlist entry.  (iommu_map() takes arguments which must be aligned
to the IOMMU minimum page size.)

However, what we can also see from the above is that we have other code
here using sg_page() - which is a necessity to be able to perform the
required DMA cache maintenance to ensure that the data is visible to
the DMA device.  We need to kmap_atomic() these in order to flush them,
and there's no other way other than struct page to represent a highmem
page.

So, I think your intent to want to remove struct page from the I/O
subsystem is a false hope, unless you want to end up rewriting lots of
architecture code, and you can come up with an alternative method to
represent highmem pages.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
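
(A rough sketch of what Russell is suggesting for the ARM loop above; the
min_pagesz idiom is the one default_iommu_map_sg() already uses in patch 1,
the other identifiers mirror __map_sg_chunk(), and the arithmetic here is
illustrative, not tested:)

	/* honour the IOMMU's minimum page size rather than assuming
	 * it equals the CPU's PAGE_SIZE */
	unsigned long min_pagesz =
		1UL << __ffs(mapping->domain->ops->pgsize_bitmap);
	phys_addr_t phys = sg_phys(s);
	phys_addr_t start = round_down(phys, min_pagesz);
	size_t len = round_up(phys + s->length, min_pagesz) - start;

	ret = iommu_map(mapping->domain, iova, start, len, prot);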

* [PATCH 1/2] scatterlist: use sg_phys()
From: Dan Williams @ 2015-06-10 16:57 UTC
To: linux-arm-kernel

On Wed, Jun 10, 2015 at 9:31 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> Why?  The aim of the code is not to detect whether the address is
> aligned to a page (if it were, it'd be testing for a zero s->offset,
> or it would be testing for an s->offset being a multiple of the page
> size).
>
> Let's first understand the code that's being modified (which seems to
> be something which isn't done very much today - people seem to just
> like changing code for the hell of it.)
>
> 	for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
> 		phys_addr_t phys = page_to_phys(sg_page(s));
> 		unsigned int len = PAGE_ALIGN(s->offset + s->length);
>
> 		if (!is_coherent &&
> 		    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
> 			__dma_page_cpu_to_dev(sg_page(s), s->offset,
> 					      s->length, dir);
>
> 		prot = __dma_direction_to_prot(dir);
>
> 		ret = iommu_map(mapping->domain, iova, phys, len, prot);
> 		if (ret < 0)
> 			goto fail;
> 		count += len >> PAGE_SHIFT;
> 		iova += len;
> 	}
>
> What it's doing is trying to map each scatterlist entry via an IOMMU.
> Unsurprisingly, IOMMUs are page based - you can't map a partial IOMMU
> page.
>
> However, what says that the IOMMU page size is the same as the host
> CPU page size - it may not be... so the above code is wrong for a
> completely different reason.
>
> What we _should_ be doing is finding out what the IOMMU minimum page
> size is, and using that in conjunction with the sg_phys() of the
> scatterlist entry to determine the start and length of the mapping
> such that the IOMMU mapping covers the range described by the
> scatterlist entry.  (iommu_map() takes arguments which must be aligned
> to the IOMMU minimum page size.)
>
> However, what we can also see from the above is that we have other
> code here using sg_page() - which is a necessity to be able to perform
> the required DMA cache maintenance to ensure that the data is visible
> to the DMA device.  We need to kmap_atomic() these in order to flush
> them, and there's no other way other than struct page to represent a
> highmem page.
>
> So, I think your intent to want to remove struct page from the I/O
> subsystem is a false hope, unless you want to end up rewriting lots of
> architecture code, and you can come up with an alternative method to
> represent highmem pages.

I think there will always be cases that need to do pfn_to_page() for
buffer management.  Those configurations will be blocked from seeing
cases where a pfn is not struct page backed.  So, you can have highmem
or dma to pmem, but not both.  Christoph, this is why I have Kconfig
symbols (DEV_PFN etc) to gate whether an arch/config supports pfn-only
i/o.

* [PATCH 1/2] scatterlist: use sg_phys()
From: Russell King - ARM Linux @ 2015-06-10 17:13 UTC
To: linux-arm-kernel

On Wed, Jun 10, 2015 at 09:57:06AM -0700, Dan Williams wrote:
> On Wed, Jun 10, 2015 at 9:31 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > Why?  The aim of the code is not to detect whether the address is
> > aligned to a page (if it were, it'd be testing for a zero s->offset,
> > or it would be testing for an s->offset being a multiple of the page
> > size).
> >
> > Let's first understand the code that's being modified (which seems
> > to be something which isn't done very much today - people seem to
> > just like changing code for the hell of it.)
> >
> > 	for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
> > 		phys_addr_t phys = page_to_phys(sg_page(s));
> > 		unsigned int len = PAGE_ALIGN(s->offset + s->length);
> >
> > 		if (!is_coherent &&
> > 		    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
> > 			__dma_page_cpu_to_dev(sg_page(s), s->offset,
> > 					      s->length, dir);
> >
> > 		prot = __dma_direction_to_prot(dir);
> >
> > 		ret = iommu_map(mapping->domain, iova, phys, len, prot);
> > 		if (ret < 0)
> > 			goto fail;
> > 		count += len >> PAGE_SHIFT;
> > 		iova += len;
> > 	}
> >
> > What it's doing is trying to map each scatterlist entry via an
> > IOMMU.  Unsurprisingly, IOMMUs are page based - you can't map a
> > partial IOMMU page.
> >
> > However, what says that the IOMMU page size is the same as the host
> > CPU page size - it may not be... so the above code is wrong for a
> > completely different reason.
> >
> > What we _should_ be doing is finding out what the IOMMU minimum page
> > size is, and using that in conjunction with the sg_phys() of the
> > scatterlist entry to determine the start and length of the mapping
> > such that the IOMMU mapping covers the range described by the
> > scatterlist entry.  (iommu_map() takes arguments which must be
> > aligned to the IOMMU minimum page size.)
> >
> > However, what we can also see from the above is that we have other
> > code here using sg_page() - which is a necessity to be able to
> > perform the required DMA cache maintenance to ensure that the data
> > is visible to the DMA device.  We need to kmap_atomic() these in
> > order to flush them, and there's no other way other than struct page
> > to represent a highmem page.
> >
> > So, I think your intent to want to remove struct page from the I/O
> > subsystem is a false hope, unless you want to end up rewriting lots
> > of architecture code, and you can come up with an alternative method
> > to represent highmem pages.
>
> I think there will always be cases that need to do pfn_to_page() for
> buffer management.  Those configurations will be blocked from seeing
> cases where a pfn is not struct page backed.  So, you can have highmem
> or dma to pmem, but not both.  Christoph, this is why I have Kconfig
> symbols (DEV_PFN etc) to gate whether an arch/config supports pfn-only
> i/o.

Hmm, pmem... yea, in the SolidRun community, we've basically decided to
stick with my updated Marvell BMM layer rather than switch to pmem.  I
forget the reasons why, but the decision was made after looking at what
pmem was doing...

In any case, let's not get bogged down in a peripheral issue.

What I'm objecting to is that the patches I've seen seem to be just
churn without any net benefit.

You can't simply make sg_page() return NULL after this change, because
you've done nothing with the remaining sg_page() callers to allow them
to gracefully handle that case.

What I'd like to see is a much fuller series of patches which show the
whole progression towards your end goal rather than a piecemeal
approach.  Right now, it's not clear that there is any benefit to this
round of changes.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

* [PATCH 1/2] scatterlist: use sg_phys()
From: Dan Williams @ 2015-06-10 17:25 UTC
To: linux-arm-kernel

On Wed, Jun 10, 2015 at 10:13 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Jun 10, 2015 at 09:57:06AM -0700, Dan Williams wrote:
>> On Wed, Jun 10, 2015 at 9:31 AM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > Why?  The aim of the code is not to detect whether the address is
>> > aligned to a page (if it were, it'd be testing for a zero s->offset,
>> > or it would be testing for an s->offset being a multiple of the
>> > page size).
>> >
>> > Let's first understand the code that's being modified (which seems
>> > to be something which isn't done very much today - people seem to
>> > just like changing code for the hell of it.)
>> >
>> > 	for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
>> > 		phys_addr_t phys = page_to_phys(sg_page(s));
>> > 		unsigned int len = PAGE_ALIGN(s->offset + s->length);
>> >
>> > 		if (!is_coherent &&
>> > 		    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
>> > 			__dma_page_cpu_to_dev(sg_page(s), s->offset,
>> > 					      s->length, dir);
>> >
>> > 		prot = __dma_direction_to_prot(dir);
>> >
>> > 		ret = iommu_map(mapping->domain, iova, phys, len, prot);
>> > 		if (ret < 0)
>> > 			goto fail;
>> > 		count += len >> PAGE_SHIFT;
>> > 		iova += len;
>> > 	}
>> >
>> > What it's doing is trying to map each scatterlist entry via an
>> > IOMMU.  Unsurprisingly, IOMMUs are page based - you can't map a
>> > partial IOMMU page.
>> >
>> > However, what says that the IOMMU page size is the same as the host
>> > CPU page size - it may not be... so the above code is wrong for a
>> > completely different reason.
>> >
>> > What we _should_ be doing is finding out what the IOMMU minimum
>> > page size is, and using that in conjunction with the sg_phys() of
>> > the scatterlist entry to determine the start and length of the
>> > mapping such that the IOMMU mapping covers the range described by
>> > the scatterlist entry.  (iommu_map() takes arguments which must be
>> > aligned to the IOMMU minimum page size.)
>> >
>> > However, what we can also see from the above is that we have other
>> > code here using sg_page() - which is a necessity to be able to
>> > perform the required DMA cache maintenance to ensure that the data
>> > is visible to the DMA device.  We need to kmap_atomic() these in
>> > order to flush them, and there's no other way other than struct
>> > page to represent a highmem page.
>> >
>> > So, I think your intent to want to remove struct page from the I/O
>> > subsystem is a false hope, unless you want to end up rewriting lots
>> > of architecture code, and you can come up with an alternative
>> > method to represent highmem pages.
>>
>> I think there will always be cases that need to do pfn_to_page() for
>> buffer management.  Those configurations will be blocked from seeing
>> cases where a pfn is not struct page backed.  So, you can have
>> highmem or dma to pmem, but not both.  Christoph, this is why I have
>> Kconfig symbols (DEV_PFN etc) to gate whether an arch/config supports
>> pfn-only i/o.
>
> Hmm, pmem... yea, in the SolidRun community, we've basically decided
> to stick with my updated Marvell BMM layer rather than switch to pmem.
> I forget the reasons why, but the decision was made after looking at
> what pmem was doing...

I'd of course be open to exploring if drivers/nvdimm/ could be made
more generally useful.

> In any case, let's not get bogged down in a peripheral issue.
>
> What I'm objecting to is that the patches I've seen seem to be just
> churn without any net benefit.
>
> You can't simply make sg_page() return NULL after this change, because
> you've done nothing with the remaining sg_page() callers to allow them
> to gracefully handle that case.
>
> What I'd like to see is a much fuller series of patches which show the
> whole progression towards your end goal rather than a piecemeal
> approach.  Right now, it's not clear that there is any benefit to this
> round of changes.

Fair enough.  I had them as part of a larger series [1].  Christoph
suggested that I break out the pure cleanups separately.  I'll add you
to the next rev of that series.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-June/001094.html

* [PATCH 1/2] scatterlist: use sg_phys()
From: Joerg Roedel @ 2015-06-11  6:50 UTC
To: linux-arm-kernel

On Wed, Jun 10, 2015 at 09:00:31AM -0700, Dan Williams wrote:
> > So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
> > which turns the statement above into:
> >
> > 	page_to_phys(sg_page(s)) + s->offset - s->offset;
> >
> > The compiler will probably optimize that away, but it still doesn't
> > look like an improvement.
>
> The goal is to eventually stop leaking struct page deep into the i/o
> stack.  Anything that relies on being able to retrieve a struct page
> out of an sg entry needs to be converted.  I think we need a new
> helper for this case, "sg_phys_aligned()"?

You still have a reference to a struct page, because sg_phys() calls
sg_page() too.  If you want to get rid of sg_page(), something like
sg_pfn() might be a more workable solution than sg_phys_(page_)aligned().

But maybe I am just missing the bigger scope of this, so I agree with
Russell that it is better to see a patch series which shows the
direction you want to go with this.


	Joerg
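
(For concreteness, the sg_pfn() idea might look like the sketch below;
the helper is hypothetical and did not exist at the time.  Note that, as
Joerg says, this formulation still reaches struct page via sg_phys(); a
truly struct-page-free scatterlist would need to carry the pfn directly.)

static inline unsigned long sg_pfn(struct scatterlist *sg)
{
	/* pfn backing this entry; today still derived via sg_page() */
	return sg_phys(sg) >> PAGE_SHIFT;
}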

* [PATCH 2/2] scatterlist: cleanup sg_chain() and sg_unmark_end()
From: Dan Williams @ 2015-06-09 16:27 UTC
To: linux-arm-kernel

Replace open coded sg_chain() and sg_unmark_end() instances with the
aforementioned helpers.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/blk-merge.c            |    2 +-
 drivers/crypto/omap-sham.c   |    2 +-
 drivers/dma/imx-dma.c        |    8 ++------
 drivers/dma/ste_dma40.c      |    5 +----
 drivers/mmc/card/queue.c     |    4 ++--
 include/crypto/scatterwalk.h |    9 ++-------
 6 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index fd3fee81c23c..7bb4f260ca4b 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -266,7 +266,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
 		if (rq->cmd_flags & REQ_WRITE)
 			memset(q->dma_drain_buffer, 0, q->dma_drain_size);
 
-		sg->page_link &= ~0x02;
+		sg_unmark_end(sg);
 		sg = sg_next(sg);
 		sg_set_page(sg, virt_to_page(q->dma_drain_buffer),
 			    q->dma_drain_size,
diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index 4d63e0d4da9a..df8b23e19b90 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -582,7 +582,7 @@ static int omap_sham_xmit_dma(struct omap_sham_dev *dd, dma_addr_t dma_addr,
 	 * the dmaengine may try to DMA the incorrect amount of data.
 	 */
 	sg_init_table(&ctx->sgl, 1);
-	ctx->sgl.page_link = ctx->sg->page_link;
+	sg_assign_page(&ctx->sgl, sg_page(ctx->sg));
 	ctx->sgl.offset = ctx->sg->offset;
 	sg_dma_len(&ctx->sgl) = len32;
 	sg_dma_address(&ctx->sgl) = sg_dma_address(ctx->sg);
diff --git a/drivers/dma/imx-dma.c b/drivers/dma/imx-dma.c
index eed405976ea9..081fbfc87f6b 100644
--- a/drivers/dma/imx-dma.c
+++ b/drivers/dma/imx-dma.c
@@ -886,18 +886,14 @@ static struct dma_async_tx_descriptor *imxdma_prep_dma_cyclic(
 	sg_init_table(imxdmac->sg_list, periods);
 
 	for (i = 0; i < periods; i++) {
-		imxdmac->sg_list[i].page_link = 0;
-		imxdmac->sg_list[i].offset = 0;
+		sg_set_page(&imxdmac->sg_list[i], NULL, period_len, 0);
 		imxdmac->sg_list[i].dma_address = dma_addr;
 		sg_dma_len(&imxdmac->sg_list[i]) = period_len;
 		dma_addr += period_len;
 	}
 
 	/* close the loop */
-	imxdmac->sg_list[periods].offset = 0;
-	sg_dma_len(&imxdmac->sg_list[periods]) = 0;
-	imxdmac->sg_list[periods].page_link =
-		((unsigned long)imxdmac->sg_list | 0x01) & ~0x02;
+	sg_chain(imxdmac->sg_list, periods + 1, imxdmac->sg_list);
 
 	desc->type = IMXDMA_DESC_CYCLIC;
 	desc->sg = imxdmac->sg_list;
diff --git a/drivers/dma/ste_dma40.c b/drivers/dma/ste_dma40.c
index 3c10f034d4b9..e8c00642cacb 100644
--- a/drivers/dma/ste_dma40.c
+++ b/drivers/dma/ste_dma40.c
@@ -2562,10 +2562,7 @@ dma40_prep_dma_cyclic(struct dma_chan *chan, dma_addr_t dma_addr,
 		dma_addr += period_len;
 	}
 
-	sg[periods].offset = 0;
-	sg_dma_len(&sg[periods]) = 0;
-	sg[periods].page_link =
-		((unsigned long)sg | 0x01) & ~0x02;
+	sg_chain(sg, periods + 1, sg);
 
 	txd = d40_prep_sg(chan, sg, sg, periods, direction,
 			  DMA_PREP_INTERRUPT);
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index 8efa3684aef8..fa525ee20470 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -469,7 +469,7 @@ static unsigned int mmc_queue_packed_map_sg(struct mmc_queue *mq,
 			sg_set_buf(__sg, buf + offset, len);
 			offset += len;
 			remain -= len;
-			(__sg++)->page_link &= ~0x02;
+			sg_unmark_end(__sg++);
 			sg_len++;
 		} while (remain);
 	}
@@ -477,7 +477,7 @@ static unsigned int mmc_queue_packed_map_sg(struct mmc_queue *mq,
 	list_for_each_entry(req, &packed->list, queuelist) {
 		sg_len += blk_rq_map_sg(mq->queue, req, __sg);
 		__sg = sg + (sg_len - 1);
-		(__sg++)->page_link &= ~0x02;
+		sg_unmark_end(__sg++);
 	}
 	sg_mark_end(sg + (sg_len - 1));
 	return sg_len;
diff --git a/include/crypto/scatterwalk.h b/include/crypto/scatterwalk.h
index 20e4226a2e14..4529889b0f07 100644
--- a/include/crypto/scatterwalk.h
+++ b/include/crypto/scatterwalk.h
@@ -25,13 +25,8 @@
 #include <linux/scatterlist.h>
 #include <linux/sched.h>
 
-static inline void scatterwalk_sg_chain(struct scatterlist *sg1, int num,
-					struct scatterlist *sg2)
-{
-	sg_set_page(&sg1[num - 1], (void *)sg2, 0, 0);
-	sg1[num - 1].page_link &= ~0x02;
-	sg1[num - 1].page_link |= 0x01;
-}
+#define scatterwalk_sg_chain(prv, num, sgl)	sg_chain(prv, num, sgl)
+#define scatterwalk_sg_next(sgl)		sg_next(sgl)
 
 static inline void scatterwalk_crypto_chain(struct scatterlist *head,
 					    struct scatterlist *sg,
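
(For reference, the two helpers this patch standardizes on already live in
include/linux/scatterlist.h; modulo config guards they read approximately
as below, which is exactly the 0x01/0x02 bit-twiddling the open-coded sites
above were doing by hand.)

static inline void sg_unmark_end(struct scatterlist *sg)
{
	sg->page_link &= ~0x02;		/* clear the end-of-list marker bit */
}

static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents,
			    struct scatterlist *sgl)
{
	/* offset and length are unused for a chain entry; clear them */
	prv[prv_nents - 1].offset = 0;
	prv[prv_nents - 1].length = 0;
	/* low bit marks a chain link; also clear any stale end marker */
	prv[prv_nents - 1].page_link = ((unsigned long) sgl | 0x01) & ~0x02;
}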

* [PATCH 2/2] scatterlist: cleanup sg_chain() and sg_unmark_end()
From: Herbert Xu @ 2015-06-10  5:38 UTC
To: linux-arm-kernel

On Tue, Jun 09, 2015 at 12:27:15PM -0400, Dan Williams wrote:
>
> +#define scatterwalk_sg_chain(prv, num, sgl)	sg_chain(prv, num, sgl)
> +#define scatterwalk_sg_next(sgl)		sg_next(sgl)

Why are you reintroducing the scatterwalk_sg_next macro that we just
removed?

I also don't understand why this is so urgent that it has to go into
mainline at this late date.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* [PATCH 2/2] scatterlist: cleanup sg_chain() and sg_unmark_end()
From: Christoph Hellwig @ 2015-06-11  7:31 UTC
To: linux-arm-kernel

On Wed, Jun 10, 2015 at 01:38:04PM +0800, Herbert Xu wrote:
> On Tue, Jun 09, 2015 at 12:27:15PM -0400, Dan Williams wrote:
> >
> > +#define scatterwalk_sg_chain(prv, num, sgl)	sg_chain(prv, num, sgl)
> > +#define scatterwalk_sg_next(sgl)		sg_next(sgl)
>
> Why are you reintroducing the scatterwalk_sg_next macro that we just
> removed?
>
> I also don't understand why this is so urgent that it has to go into
> mainline at this late date.

It allows getting a cleaner slate for the next merge window, which
seems useful on its own.  The re-addition of scatterwalk_sg_next seems
unnecessary, but getting rid of the open-coded sg_chain is useful.

Maybe you can take it through the crypto tree and also kill off the
scatterwalk_sg_chain name as well while we're at it?

* [PATCH 2/2] scatterlist: cleanup sg_chain() and sg_unmark_end()
From: Herbert Xu @ 2015-06-11  7:34 UTC
To: linux-arm-kernel

On Thu, Jun 11, 2015 at 09:31:07AM +0200, Christoph Hellwig wrote:
>
> It allows getting a cleaner slate for the next merge window, which
> seems useful on its own.  The re-addition of scatterwalk_sg_next seems
> unnecessary, but getting rid of the open-coded sg_chain is useful.

Sure I can take the crypto bits.

> Maybe you can take it through the crypto tree and also kill off the
> scatterwalk_sg_chain name as well while we're at it?

Sure I'm happy to take such a patch.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt