* [1/2] usb: dwc2: Fix DMA alignment to start at allocated boundary
@ 2018-07-05 14:31 Antti Seppälä
0 siblings, 0 replies; 5+ messages in thread
From: Antti Seppälä @ 2018-07-05 14:31 UTC (permalink / raw)
To: Minas Harutyunyan, John Youn, Felipe Balbi
Cc: Grigor Tovmasyan, Vardan Mikayelyan, Douglas Anderson, William Wu,
Greg Kroah-Hartman, linux-usb, Antti Seppälä
The commit 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more
supported way") introduced a common way to align DMA allocations.
The code in the commit aligns the struct dma_aligned_buffer but the
actual DMA address pointed by data[0] gets aligned to an offset from
the allocated boundary by the kmalloc_ptr and the old_xfer_buffer
pointers.
This is against the recommendation in Documentation/DMA-API.txt which
states:
Therefore, it is recommended that driver writers who don't take
special care to determine the cache line size at run time only map
virtual regions that begin and end on page boundaries (which are
guaranteed also to be cache line boundaries).
The effect of this is that architectures with non-coherent DMA caches
may run into memory corruption or kernel crashes with Unhandled
kernel unaligned accesses exceptions.
Fix the alignment by positioning the DMA area in front of the allocation
and use memory at the end of the area for storing the orginal
transfer_buffer pointer. This may have the added benefit of increased
performance as the DMA area is now fully aligned on all architectures.
Tested with Lantiq xRX200 (MIPS) and RPi Model B Rev 2 (ARM).
Fixes: 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more
supported way")
Signed-off-by: Antti Seppälä <a.seppala@gmail.com>
---
drivers/usb/dwc2/hcd.c | 44 +++++++++++++++++++++++---------------------
1 file changed, 23 insertions(+), 21 deletions(-)
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index b1104be3429c..2ed0ac18e053 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -2665,34 +2665,29 @@ static int dwc2_alloc_split_dma_aligned_buf(struct dwc2_hsotg *hsotg,
#define DWC2_USB_DMA_ALIGN 4
-struct dma_aligned_buffer {
- void *kmalloc_ptr;
- void *old_xfer_buffer;
- u8 data[0];
-};
-
static void dwc2_free_dma_aligned_buffer(struct urb *urb)
{
- struct dma_aligned_buffer *temp;
+ void *stored_xfer_buffer;
if (!(urb->transfer_flags & URB_ALIGNED_TEMP_BUFFER))
return;
- temp = container_of(urb->transfer_buffer,
- struct dma_aligned_buffer, data);
+ /* Restore urb->transfer_buffer from the end of the allocated area */
+ memcpy(&stored_xfer_buffer, urb->transfer_buffer +
+ urb->transfer_buffer_length, sizeof(urb->transfer_buffer));
if (usb_urb_dir_in(urb))
- memcpy(temp->old_xfer_buffer, temp->data,
+ memcpy(stored_xfer_buffer, urb->transfer_buffer,
urb->transfer_buffer_length);
- urb->transfer_buffer = temp->old_xfer_buffer;
- kfree(temp->kmalloc_ptr);
+ kfree(urb->transfer_buffer);
+ urb->transfer_buffer = stored_xfer_buffer;
urb->transfer_flags &= ~URB_ALIGNED_TEMP_BUFFER;
}
static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags)
{
- struct dma_aligned_buffer *temp, *kmalloc_ptr;
+ void *kmalloc_ptr;
size_t kmalloc_size;
if (urb->num_sgs || urb->sg ||
@@ -2700,22 +2695,29 @@ static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags)
!((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
return 0;
- /* Allocate a buffer with enough padding for alignment */
+ /*
+ * Allocate a buffer with enough padding for original transfer_buffer
+ * pointer. This allocation is guaranteed to be aligned properly for
+ * DMA
+ */
kmalloc_size = urb->transfer_buffer_length +
- sizeof(struct dma_aligned_buffer) + DWC2_USB_DMA_ALIGN - 1;
+ sizeof(urb->transfer_buffer);
kmalloc_ptr = kmalloc(kmalloc_size, mem_flags);
if (!kmalloc_ptr)
return -ENOMEM;
- /* Position our struct dma_aligned_buffer such that data is aligned */
- temp = PTR_ALIGN(kmalloc_ptr + 1, DWC2_USB_DMA_ALIGN) - 1;
- temp->kmalloc_ptr = kmalloc_ptr;
- temp->old_xfer_buffer = urb->transfer_buffer;
+ /*
+ * Position value of original urb->transfer_buffer pointer to the end
+ * of allocation for later referencing
+ */
+ memcpy(kmalloc_ptr + urb->transfer_buffer_length,
+ &urb->transfer_buffer, sizeof(urb->transfer_buffer));
+
if (usb_urb_dir_out(urb))
- memcpy(temp->data, urb->transfer_buffer,
+ memcpy(kmalloc_ptr, urb->transfer_buffer,
urb->transfer_buffer_length);
- urb->transfer_buffer = temp->data;
+ urb->transfer_buffer = kmalloc_ptr;
urb->transfer_flags |= URB_ALIGNED_TEMP_BUFFER;
^ permalink raw reply related [flat|nested] 5+ messages in thread* [1/2] usb: dwc2: Fix DMA alignment to start at allocated boundary
2018-07-05 14:31 [1/2] usb: dwc2: Fix DMA alignment to start at allocated boundary Antti Seppälä
@ 2018-07-06 15:57 ` Doug Anderson
-1 siblings, 0 replies; 5+ messages in thread
From: Doug Anderson @ 2018-07-06 15:57 UTC (permalink / raw)
To: Antti Seppälä
Cc: Minas Harutyunyan, John Youn, Felipe Balbi, Grigor Tovmasyan,
Vardan Mikayelyan, William Wu, Greg Kroah-Hartman, linux-usb,
stable
Hi,
On Thu, Jul 5, 2018 at 7:31 AM, Antti Seppälä <a.seppala@gmail.com> wrote:
> The commit 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more
> supported way") introduced a common way to align DMA allocations.
> The code in the commit aligns the struct dma_aligned_buffer but the
> actual DMA address pointed by data[0] gets aligned to an offset from
> the allocated boundary by the kmalloc_ptr and the old_xfer_buffer
> pointers.
>
> This is against the recommendation in Documentation/DMA-API.txt which
> states:
>
> Therefore, it is recommended that driver writers who don't take
> special care to determine the cache line size at run time only map
> virtual regions that begin and end on page boundaries (which are
> guaranteed also to be cache line boundaries).
>
> The effect of this is that architectures with non-coherent DMA caches
> may run into memory corruption or kernel crashes with Unhandled
> kernel unaligned accesses exceptions.
>
> Fix the alignment by positioning the DMA area in front of the allocation
> and use memory at the end of the area for storing the orginal
> transfer_buffer pointer. This may have the added benefit of increased
> performance as the DMA area is now fully aligned on all architectures.
>
> Tested with Lantiq xRX200 (MIPS) and RPi Model B Rev 2 (ARM).
>
> Fixes: 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more
> supported way")
>
> Signed-off-by: Antti Seppälä <a.seppala@gmail.com>
> ---
> drivers/usb/dwc2/hcd.c | 44 +++++++++++++++++++++++---------------------
> 1 file changed, 23 insertions(+), 21 deletions(-)
Thanks for tracking this down and sorry for the original regression.
Seems like a good fix. With this fix, I'd be curious of your
observations on how dwc2 performs (both performance and compatibility
under stress) with the newest driver compared to whatever you were
using before.
Also: you're using the dwc2_set_ltq_params() parameters? Have you
checked if removing the "max_transfer_size" limit boosts your
performance?
Cc: stable@vger.kernel.org
Reviewed-by: Douglas Anderson <dianders@chromium.org>
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH 1/2] usb: dwc2: Fix DMA alignment to start at allocated boundary
@ 2018-07-06 15:57 ` Doug Anderson
0 siblings, 0 replies; 5+ messages in thread
From: Doug Anderson @ 2018-07-06 15:57 UTC (permalink / raw)
To: Antti Seppälä
Cc: Minas Harutyunyan, John Youn, Felipe Balbi, Grigor Tovmasyan,
Vardan Mikayelyan, William Wu, Greg Kroah-Hartman, linux-usb,
stable
Hi,
On Thu, Jul 5, 2018 at 7:31 AM, Antti Seppälä <a.seppala@gmail.com> wrote:
> The commit 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more
> supported way") introduced a common way to align DMA allocations.
> The code in the commit aligns the struct dma_aligned_buffer but the
> actual DMA address pointed by data[0] gets aligned to an offset from
> the allocated boundary by the kmalloc_ptr and the old_xfer_buffer
> pointers.
>
> This is against the recommendation in Documentation/DMA-API.txt which
> states:
>
> Therefore, it is recommended that driver writers who don't take
> special care to determine the cache line size at run time only map
> virtual regions that begin and end on page boundaries (which are
> guaranteed also to be cache line boundaries).
>
> The effect of this is that architectures with non-coherent DMA caches
> may run into memory corruption or kernel crashes with Unhandled
> kernel unaligned accesses exceptions.
>
> Fix the alignment by positioning the DMA area in front of the allocation
> and use memory at the end of the area for storing the orginal
> transfer_buffer pointer. This may have the added benefit of increased
> performance as the DMA area is now fully aligned on all architectures.
>
> Tested with Lantiq xRX200 (MIPS) and RPi Model B Rev 2 (ARM).
>
> Fixes: 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more
> supported way")
>
> Signed-off-by: Antti Seppälä <a.seppala@gmail.com>
> ---
> drivers/usb/dwc2/hcd.c | 44 +++++++++++++++++++++++---------------------
> 1 file changed, 23 insertions(+), 21 deletions(-)
Thanks for tracking this down and sorry for the original regression.
Seems like a good fix. With this fix, I'd be curious of your
observations on how dwc2 performs (both performance and compatibility
under stress) with the newest driver compared to whatever you were
using before.
Also: you're using the dwc2_set_ltq_params() parameters? Have you
checked if removing the "max_transfer_size" limit boosts your
performance?
Cc: stable@vger.kernel.org
Reviewed-by: Douglas Anderson <dianders@chromium.org>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [1/2] usb: dwc2: Fix DMA alignment to start at allocated boundary
2018-07-06 15:57 ` [PATCH 1/2] " Doug Anderson
@ 2018-07-07 14:01 ` Antti Seppälä
-1 siblings, 0 replies; 5+ messages in thread
From: Antti Seppälä @ 2018-07-07 14:01 UTC (permalink / raw)
To: Doug Anderson
Cc: Minas Harutyunyan, John Youn, Felipe Balbi, Grigor Tovmasyan,
Vardan Mikayelyan, William Wu, Greg Kroah-Hartman, linux-usb,
stable
On 6 July 2018 at 18:57, Doug Anderson <dianders@chromium.org> wrote:
> Hi,
>
> Thanks for tracking this down and sorry for the original regression.
> Seems like a good fix. With this fix, I'd be curious of your
> observations on how dwc2 performs (both performance and compatibility
> under stress) with the newest driver compared to whatever you were
> using before.
>
My totally not scientifically accurate performance test included
running iperf through my LTE dongle that was connected to dwc2. I saw
throughput increase in download speeds.
Before (kernel 4.9.109 with the offending commit reverted) iperf
reported download bandwidth at 33.2 Mbits/sec
Using newest dwc2 driver after applying "Fix DMA alignment to start at
allocated boundary" patch I got 38.2 Mbits/sec
If I also apply the "Fix inefficient copy of unaligned buffers" patch
I could achieve a total throughput for download around 44.6 Mbits/sec
which I believe is capped by my 50Mbit/s subscription.
> Also: you're using the dwc2_set_ltq_params() parameters? Have you
> checked if removing the "max_transfer_size" limit boosts your
> performance?
>
Yes, I'm using the parameters set there. I tried removing
max_transfer_size but it did not have noticeable impact on the
performance in my tests.
> Cc: stable@vger.kernel.org
> Reviewed-by: Douglas Anderson <dianders@chromium.org>
Thanks for reviewing :)
-Antti
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] usb: dwc2: Fix DMA alignment to start at allocated boundary
@ 2018-07-07 14:01 ` Antti Seppälä
0 siblings, 0 replies; 5+ messages in thread
From: Antti Seppälä @ 2018-07-07 14:01 UTC (permalink / raw)
To: Doug Anderson
Cc: Minas Harutyunyan, John Youn, Felipe Balbi, Grigor Tovmasyan,
Vardan Mikayelyan, William Wu, Greg Kroah-Hartman, linux-usb,
stable
On 6 July 2018 at 18:57, Doug Anderson <dianders@chromium.org> wrote:
> Hi,
>
> Thanks for tracking this down and sorry for the original regression.
> Seems like a good fix. With this fix, I'd be curious of your
> observations on how dwc2 performs (both performance and compatibility
> under stress) with the newest driver compared to whatever you were
> using before.
>
My totally not scientifically accurate performance test included
running iperf through my LTE dongle that was connected to dwc2. I saw
throughput increase in download speeds.
Before (kernel 4.9.109 with the offending commit reverted) iperf
reported download bandwidth at 33.2 Mbits/sec
Using newest dwc2 driver after applying "Fix DMA alignment to start at
allocated boundary" patch I got 38.2 Mbits/sec
If I also apply the "Fix inefficient copy of unaligned buffers" patch
I could achieve a total throughput for download around 44.6 Mbits/sec
which I believe is capped by my 50Mbit/s subscription.
> Also: you're using the dwc2_set_ltq_params() parameters? Have you
> checked if removing the "max_transfer_size" limit boosts your
> performance?
>
Yes, I'm using the parameters set there. I tried removing
max_transfer_size but it did not have noticeable impact on the
performance in my tests.
> Cc: stable@vger.kernel.org
> Reviewed-by: Douglas Anderson <dianders@chromium.org>
Thanks for reviewing :)
-Antti
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-07-07 14:02 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-05 14:31 [1/2] usb: dwc2: Fix DMA alignment to start at allocated boundary Antti Seppälä
-- strict thread matches above, loose matches on Subject: below --
2018-07-06 15:57 Doug Anderson
2018-07-06 15:57 ` [PATCH 1/2] " Doug Anderson
2018-07-07 14:01 [1/2] " Antti Seppälä
2018-07-07 14:01 ` [PATCH 1/2] " Antti Seppälä
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.