* arm: dma-mapping: CPU may not see up-to-date data after DMA transaction
  From: JABLONSKY Jan
  Date: 2018-11-22 07:36 UTC
  To: linux-arm-kernel

Proper cleaning and invalidating also has to be performed for the inner
cache, otherwise the CPU may work with stale data from the inner cache.

The problem appears on the Altera SoC FPGA (Cortex-A9) under higher CPU and
system memory load, when the CPU reads from memory after a DMA transaction.
That means the CPU may not see the most up-to-date and correct copy of the
DMA buffer.

Replace the outer cache calls with __sync_cache_range_r/w, which handle the
situation described above. Protect against preemption to make sure that the
code runs on the current CPU.

Signed-off-by: Jan Jablonsky <jan.jablonsky@thalesgroup.com>
---
 arch/arm/mm/dma-mapping.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 661fe48ab78d..f40dc8b250d8 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -997,11 +997,18 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
 	dma_cache_maint_page(page, off, size, dir, dmac_map_area);
 
 	paddr = page_to_phys(page) + off;
+
+	/*
+	 * Protect against preemption
+	 * Ensure that the code is running on the current CPU
+	 */
+	preempt_disable();
 	if (dir == DMA_FROM_DEVICE) {
-		outer_inv_range(paddr, paddr + size);
+		__sync_cache_range_r(phys_to_virt(paddr), size);
 	} else {
-		outer_clean_range(paddr, paddr + size);
+		__sync_cache_range_w(phys_to_virt(paddr), size);
 	}
+	preempt_enable();
 	/* FIXME: non-speculating: flush on bidirectional mappings? */
 }
 
@@ -1013,7 +1020,13 @@ static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
 	/* FIXME: non-speculating: not required */
 	/* in any case, don't bother invalidating if DMA to device */
 	if (dir != DMA_TO_DEVICE) {
-		outer_inv_range(paddr, paddr + size);
+		/*
+		 * Protect against preemption
+		 * Ensure that the code is running on the current CPU
+		 */
+		preempt_disable();
+		__sync_cache_range_r(phys_to_virt(paddr), size);
+		preempt_enable();
 
 		dma_cache_maint_page(page, off, size, dir, dmac_unmap_area);
 	}
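For context on the path being patched: on a non-coherent ARM system using the
default arm_dma_ops, these two functions are reached through the streaming DMA
API. The following is a minimal, hypothetical usage sketch (function name and
comments invented for illustration, not code from this thread):

```c
#include <linux/dma-mapping.h>

/*
 * Hypothetical receive path: the device DMAs into 'buf' and the CPU reads it
 * afterwards.  With the default (non-coherent, non-IOMMU) arm_dma_ops, the
 * map call ends up in __dma_page_cpu_to_dev() and the unmap call in
 * __dma_page_dev_to_cpu().
 */
static int example_rx(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma;

	dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* ... program the device with 'dma' and wait for completion ... */

	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);

	/* Only now may the CPU safely read buf[]. */
	return 0;
}
```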
* arm: dma-mapping: CPU may not see up-to-date data after DMA transaction
  From: Russell King - ARM Linux
  Date: 2018-11-22 09:29 UTC
  To: linux-arm-kernel

On Thu, Nov 22, 2018 at 07:36:26AM +0000, JABLONSKY Jan wrote:
> Proper cleaning and invalidating also has to be performed for the inner
> cache, otherwise the CPU may work with stale data from the inner cache.

The functions are _already_ performing cache maintenance on the inner
cache; this patch makes no sense.

> The problem appears on the Altera SoC FPGA (Cortex-A9) under higher CPU and
> system memory load, when the CPU reads from memory after a DMA transaction.
> That means the CPU may not see the most up-to-date and correct copy of the
> DMA buffer.
>
> Replace the outer cache calls with __sync_cache_range_r/w, which handle the
> situation described above. Protect against preemption to make sure that the
> code runs on the current CPU.

This is too vague to say much more.

> [...]

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
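To make this point concrete: the unpatched code already does inner-cache
maintenance before touching the outer cache. The map side, paraphrased from
the diff context above (simplified, not a verbatim copy of dma-mapping.c), is:

```c
/* Simplified paraphrase of the existing map path (details trimmed). */
static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
				  size_t size, enum dma_data_direction dir)
{
	phys_addr_t paddr;

	/* Inner (L1) maintenance, highmem-aware, via dmac_map_area. */
	dma_cache_maint_page(page, off, size, dir, dmac_map_area);

	/* Outer (L2, e.g. PL310) maintenance on the physical range. */
	paddr = page_to_phys(page) + off;
	if (dir == DMA_FROM_DEVICE)
		outer_inv_range(paddr, paddr + size);
	else
		outer_clean_range(paddr, paddr + size);
}
```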
* arm: dma-mapping: CPU may not see up-to-date data after DMA transaction
  From: JABLONSKY Jan
  Date: 2018-11-22 12:25 UTC
  To: linux-arm-kernel

After the DMA transaction is complete, the content of the DMA buffer is
stored in main memory.

outer_inv_range invalidates cache lines only at the L2 level.
outer_cache is dedicated to the L2 cache, but the CPU must also clean the L1
dcache to avoid a possible L1 hit.

The functions __sync_cache_range_r/w handle L1 as well as L2 cache
maintenance. That means the outer cache functions (within
__dma_page_cpu_to_dev and __dma_page_dev_to_cpu) can be replaced with the
functions below:

  outer_inv_range   -> __sync_cache_range_r
  outer_clean_range -> __sync_cache_range_w

- __sync_cache_range_r:
  - when DMA writes and the CPU reads
  - to achieve reading directly from main memory (L1 miss, L2 miss)
  - ensures preceding writes to *p by other CPUs are visible to subsequent
    reads by this CPU

- __sync_cache_range_w:
  - when the CPU writes and DMA reads
  - ensures preceding writes to *p by this CPU are visible to subsequent
    reads by other CPUs

A similar problem is also described in the link below:
https://community.arm.com/processors/f/discussions/6555/need-to-invalidate-l1-cache-after-dma-on-cortex-a9

Test case:
The problem appears while calculating the md5sum of an eMMC (16GB) in a loop.
After some time (e.g. 1-2 hours), the md5sum may differ.
For example:

  5c2b3c7a6d69a2f6c4c1ddfdd3bf1ed5  /dev/mmcblk0  time 746 [s]
  5c2b3c7a6d69a2f6c4c1ddfdd3bf1ed5  /dev/mmcblk0  time 738 [s]
  c5f2bb8e9d83744d4087450d6274208e  /dev/mmcblk0  time 691 [s]
  ...

I hope this is clearer now.

On Thu, 2018-11-22 at 09:29 +0000, Russell King - ARM Linux wrote:
> [...]
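For reference, the two helpers being proposed are defined (as far as I can
tell) in arch/arm/include/asm/cacheflush.h. Paraphrased roughly from memory,
so treat details such as the exact config guards as approximate, they do the
following:

```c
/* Approximate paraphrase of the helpers, not verbatim kernel source. */
static inline void __sync_cache_range_w(volatile void *p, size_t size)
{
	char *_p = (char *)p;

	__cpuc_clean_dcache_area(_p, size);		/* clean inner (L1)     */
	outer_clean_range(__pa(_p), __pa(_p + size));	/* clean outer (L2)     */
}

static inline void __sync_cache_range_r(volatile void *p, size_t size)
{
	char *_p = (char *)p;

	if (outer_cache.flush_range) {
		__cpuc_clean_dcache_area(_p, size);	/* clean inner first    */
		outer_flush_range(__pa(_p),
				  __pa(_p + size));	/* clean+inv outer (L2) */
	}
	__cpuc_flush_dcache_area(_p, size);		/* clean+inv inner (L1) */
}
```

Note that both helpers translate the virtual address with __pa(), i.e. they
assume a lowmem (linear-map) address, which is exactly the limitation raised
in the next reply.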
* arm: dma-mapping: CPU may not see up-to-date data after DMA transaction
  From: Russell King - ARM Linux
  Date: 2018-11-22 12:51 UTC
  To: linux-arm-kernel

On Thu, Nov 22, 2018 at 12:25:18PM +0000, JABLONSKY Jan wrote:
> After the DMA transaction is complete, the content of the DMA buffer is
> stored in main memory.
>
> outer_inv_range invalidates cache lines only at the L2 level.

Correct.

> outer_cache is dedicated to the L2 cache, but the CPU must also clean the
> L1 dcache to avoid a possible L1 hit.

Correct, except we _do_ handle the L1 cache.  See the dmac_*_area functions
that are already being called via dma_cache_maint_page().  You did analyse
the code before proposing these modifications? ;)

Given that we already handle the L1 cache, why do we need more L1 cache
maintenance?

In any case, in your patch, using phys_to_virt() on the addresses is
utterly wrong, and will lead to hitting the wrong memory if the physical
address is a highmem page (this is why we have dma_cache_maint_page(),
which has the necessary code to handle this case.)  The physical to
virtual translation functions and macros are only valid for lowmem.

So, I think your patch actually ends up breaking stuff by doing cache
maintenance on unexpected memory locations.

Given, also, that we have shed-loads of Cortex-A9 systems out there,
using DMA with MMC, some using it for the rootfs, I doubt that the
problem lies here but elsewhere - which MMC host controller are you
using?  Has that driver been properly reviewed - does it use the DMA
API correctly?  Does it use the correct DMA direction (which should
always be the same for all DMA API operations on the buffer)?

It could also be a hardware bug (which is suggested by your
modifications to disable preemption).  As cache maintenance is
broadcast across all cores in an SMP system with Cortex-A9, it should
not matter whether we are pre-empted during cache handling for DMA.  If
disabling preemption makes a difference for that, it probably means you
have bigger coherency issues on your SoC.

Yes, you may have found a rare corner case, but we really need to fully
understand the details of what is going on here _before_ making radical
changes to code that has been proven by probably billions of users over
the course of the last nine years.

Thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
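The highmem point is the crux of the objection: phys_to_virt() only works for
the linearly-mapped lowmem region, whereas DMA pages can come from highmem.
The existing helper walks the buffer page by page and creates temporary
mappings where needed. Very roughly, as a simplified paraphrase with the
sparsemem and aliasing-cache details omitted:

```c
/*
 * Rough paraphrase of dma_cache_maint_page(): lowmem pages are reached via
 * page_address(), highmem pages via a temporary kmap_atomic() mapping.
 * phys_to_virt() alone would produce a bogus pointer for a highmem page.
 */
static void dma_cache_maint_page(struct page *page, unsigned long offset,
				 size_t size, enum dma_data_direction dir,
				 void (*op)(const void *, size_t, int))
{
	size_t left = size;

	do {
		size_t len = min_t(size_t, left, PAGE_SIZE - offset);
		void *vaddr;

		if (PageHighMem(page)) {
			vaddr = kmap_atomic(page);	/* temporary mapping */
			op(vaddr + offset, len, dir);
			kunmap_atomic(vaddr);
		} else {
			op(page_address(page) + offset, len, dir);
		}

		offset = 0;
		page++;		/* simplification: assumes contiguous struct pages */
		left -= len;
	} while (left);
}
```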
* arm: dma-mapping: CPU may not see up-to-date data after DMA transaction
  From: JABLONSKY Jan
  Date: 2018-11-22 15:36 UTC
  To: linux-arm-kernel

Yes, you are right.

It is the Synopsys DesignWare Multimedia Card Interface driver, dw_mmc.

The SoC is using the internal DMA controller (IDMAC).
A few days ago I also analysed the driver and its use of the DMA API.
I found in the dw_mci_dmac_complete_dma callback function that invalidation
of caches is performed only in the case of external DMA:
https://lore.kernel.org/patchwork/patch/1015401/

There is some significant progress, but the problem still occurs.

The idea behind using the function __sync_cache_range_r is that the sequence
of cache maintenance is similar to the one described in the L2 cache
controller TRM to ensure coherency:

http://infocenter.arm.com/help/topic/com.arm.doc.ddi0246h/DDI0246H_l2c310_r3p3_trm.pdf

  Perform a combination of Clean and Invalidate operations that ensures
  coherency from the outside in, and from the inside out.

  CleanLevel1 Address        ; This is broadcast within the cluster
  DSB                        ; Ensure completion of the clean as far as Level 2
  Clean&InvalLevel2 Address  ; forces the address out past level 2
  SYNC                       ; Ensures completion of the L2 inval
  Clean&InvalLevel1 Address  ; This is broadcast within the cluster
  DSB                        ; Ensure completion of the clean&inval as far as Level 2 (no data lost)

With the mentioned sequence I really can see a big difference.
But anyway, I understand what you mean; I will continue with the analysis.

On Thu, 2018-11-22 at 12:51 +0000, Russell King - ARM Linux wrote:
> [...]
* arm: dma-mapping: CPU may not see up-to-date data after DMA transaction
  From: Robin Murphy
  Date: 2018-11-22 16:36 UTC
  To: linux-arm-kernel

On 22/11/2018 15:36, JABLONSKY Jan wrote:
> Yes, you are right.
>
> It is the Synopsys DesignWare Multimedia Card Interface driver, dw_mmc.
>
> The SoC is using the internal DMA controller (IDMAC).
> A few days ago I also analysed the driver and its use of the DMA API.
> I found in the dw_mci_dmac_complete_dma callback function that invalidation
> of caches is performed only in the case of external DMA:
> https://lore.kernel.org/patchwork/patch/1015401/
>
> There is some significant progress, but the problem still occurs.

Frankly, dw_mci_dmac_complete_dma() looks pretty bogus anyway - that
dma_sync_sg_for_cpu() is immediately before a call to
host->dma_ops->cleanup(), which does the dma_unmap_sg() that I would expect
to see at the end of a DMA transfer.  So at best that sync should just be
redundant with the unmap in terms of cache maintenance, but the fact that it
uses a hard-coded direction and a possibly different device really smells of
that driver being a lot more rotten in general.

Robin.

> [...]
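In other words, once the transfer has completed, the hand-back of the buffer
to the CPU should come from the unmap alone. A hypothetical shape of such a
completion path (not the actual dw_mmc code) would simply be:

```c
/*
 * Hypothetical completion path: dma_unmap_sg() already performs the cache
 * maintenance needed to hand the buffer back to the CPU, using the direction
 * the buffer was mapped with - no extra dma_sync_sg_for_cpu() with a
 * hard-coded DMA_FROM_DEVICE is needed.
 */
static void example_dma_complete(struct dw_mci *host)
{
	struct mmc_data *data = host->data;

	if (data)
		dma_unmap_sg(host->dev, data->sg, data->sg_len,
			     mmc_get_dma_dir(data));
}
```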
* arm: dma-mapping: CPU may not see up-to-date data after DMA transaction
  From: Russell King - ARM Linux
  Date: 2018-11-22 16:57 UTC
  To: linux-arm-kernel

On Thu, Nov 22, 2018 at 03:36:13PM +0000, JABLONSKY Jan wrote:
> Yes, you are right.
>
> It is the Synopsys DesignWare Multimedia Card Interface driver, dw_mmc.
>
> The SoC is using the internal DMA controller (IDMAC).
> A few days ago I also analysed the driver and its use of the DMA API.
> I found in the dw_mci_dmac_complete_dma callback function that invalidation
> of caches is performed only in the case of external DMA:
> https://lore.kernel.org/patchwork/patch/1015401/
>
> There is some significant progress, but the problem still occurs.

From that patch, should I assume you're not using a separate DMAengine,
i.o.w. host->use_dma is TRANS_MODE_IDMAC?

Now, setting up the DMA table:

	for (i = 0; i < sg_len; i++) {
		unsigned int length = sg_dma_len(&data->sg[i]);
		u32 mem_addr = sg_dma_address(&data->sg[i]);

is wrong.  Scatterlists do not have to be contiguous arrays, they can be
fragmented, and the only way to walk a scatterlist correctly is to use
for_each_sg(), i.o.w.:

	struct scatterlist *sg;

	for_each_sg(data->sg, sg, sg_len, i) {
		unsigned int length = sg_dma_len(sg);
		u32 mem_addr = sg_dma_address(sg);

The 64-bit path needs fixing similarly.  It's interesting that the driver
already uses for_each_sg() in some places, but not for setting up these
tables.  It should always use for_each_sg().

dw_mci_dma_cleanup() does this:

	dma_unmap_sg(host->dev, data->sg, data->sg_len,
		     mmc_get_dma_dir(data));

Note that it uses host->dev.  Other DMA functions use the same.  However,
dw_mci_dmac_complete_dma() uses a different variable:

	/* Invalidate cache after read */
	dma_sync_sg_for_cpu(mmc_dev(host->slot->mmc), data->sg,
			    data->sg_len, DMA_FROM_DEVICE);

It just happens that mmc_dev(host->slot->mmc) is the same as host->dev,
but this is really bad programming style - it looks like pure obfuscation
to me.

In any case, since the driver supports DMAengine, this is incorrect - when
using DMAengine, the buffer needs to be mapped and unmapped using the
DMAengine struct device, not the host controller's struct device.  So,
using host->dev and mmc_dev(host->slot->mmc) is incorrect in this driver.

Next, the comment "/* Invalidate cache after read */" suggests that
something is very wrong - the cache handling happens when the buffer is
unmapped.  That happens at two points in this driver - either via
host->dma_ops->cleanup(host); if the buffer was mapped by
dw_mci_submit_data_dma(), or by dw_mci_post_req() if it was mapped by
dw_mci_pre_req().

Similar applies to:

	/* Flush cache before write */
	if (host->data->flags & MMC_DATA_WRITE)
		dma_sync_sg_for_device(mmc_dev(host->slot->mmc), sgl,
				       sg_elems, DMA_TO_DEVICE);

in dw_mci_edmac_start_dma() - those dma_sync_*() calls really should not
be necessary.

I'd get rid of those dma_sync_*() calls and try to validate that the
buffer has been properly mapped to the _correct_ device, and check that it
is properly unmapped after the transfer has completed.  IOW, it's mapped
to the MMC device if using dw_mci_idmac_ops (i.e. host->dev) and to the
DMAengine device if using dw_mci_edmac_ops (i.e. host->dms->ch->device->dev).

Note that there's an additional issue when using a DMA engine - an ordering
problem, where the MMC controller may report that the transfer has completed
before the DMA engine has finished transferring the data to memory - and I
see nothing in this driver that caters to that issue.  I don't think that's
your problem, because I think you're using the internal MMC controller DMA.

> Perform a combination of Clean and Invalidate operations that ensures
> coherency from the outside in, and from the inside out.
>
> CleanLevel1 Address        ; This is broadcast within the cluster
> DSB                        ; Ensure completion of the clean as far as Level 2
> Clean&InvalLevel2 Address  ; forces the address out past level 2
> SYNC                       ; Ensures completion of the L2 inval
> Clean&InvalLevel1 Address  ; This is broadcast within the cluster
> DSB                        ; Ensure completion of the clean&inval as far as Level 2 (no data lost)
>
> With the mentioned sequence I really can see a big difference.
> But anyway, I understand what you mean; I will continue with the analysis.

What we end up doing is:

- On buffer mapping for DMA_FROM_DEVICE (DMA from device to memory):
  - invalidate L1 cache
  - dsb
  - invalidate L2 cache
  - sync L2

  This is to ensure that no writebacks occur while the DMA is progressing.

  ... DMA happens ...

- On buffer unmapping for DMA_FROM_DEVICE:
  - invalidate L2 cache
  - sync L2
  - invalidate L1 cache
  - dsb

  This is to ensure that any speculatively read cache lines are killed
  prior to the CPU reading the new data in the buffer.

This reflects the L1/L2/L1 sequence talked about above without excessive
expense - these cache maintenance operations are _very_ expensive.

Note that the CPU _may_ speculatively prefetch while the DMA happens; it
may also speculatively prefetch between and during each of the steps in
the documentation quote above.

Note that the CPU is _not_ allowed to write in any way to the memory
buffer while it is mapped for an active DMA operation - that _will_
definitely lead to corrupted data.

- On buffer mapping for DMA_TO_DEVICE (DMA from memory to device):
  - clean L1 cache
  - dsb
  - clean L2 cache
  - sync L2

  ... DMA happens ...

- On buffer unmapping for DMA_TO_DEVICE:
  - nothing is necessary, since we must have made the data visible to DMA
    in the mapping case.

Note that there's a requirement that the MMIO or other write that triggers
the DMA operation _must_ have an appropriate barrier (eg, dsb) between all
previous writes and the write that starts the DMA.  That means the
non-relaxed write[bwl]() functions must be used, or the driver must use an
explicit barrier and document that.

Hope this helps.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
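Putting those rules together, a driver-side transmit path on such a system
might look roughly like the following. This is a hypothetical sketch (the
EXAMPLE_* register offsets and names are invented), not dw_mmc code; it only
illustrates the mapping, barrier and unmapping order described above.

```c
#include <linux/dma-mapping.h>
#include <linux/io.h>

#define EXAMPLE_ADDR_REG	0x00	/* hypothetical register offsets */
#define EXAMPLE_LEN_REG		0x04
#define EXAMPLE_CTRL_REG	0x08
#define EXAMPLE_CTRL_START	0x01

static int example_tx(struct device *dma_dev, void __iomem *regs,
		      void *buf, size_t len)
{
	dma_addr_t dma;

	/*
	 * Map once, with the device that actually performs the DMA and the
	 * real transfer direction; this does the clean/dsb/clean-L2/sync work.
	 */
	dma = dma_map_single(dma_dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dma_dev, dma))
		return -ENOMEM;

	writel(dma, regs + EXAMPLE_ADDR_REG);
	writel(len, regs + EXAMPLE_LEN_REG);
	/*
	 * writel() (unlike writel_relaxed()) includes the barrier that orders
	 * the preceding memory and MMIO writes before the doorbell write.
	 */
	writel(EXAMPLE_CTRL_START, regs + EXAMPLE_CTRL_REG);

	/* ... do not touch buf[] until the completion interrupt arrives ... */

	dma_unmap_single(dma_dev, dma, len, DMA_TO_DEVICE);
	return 0;
}
```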
* Re: arm: dma-mapping: CPU may not see up-to-date data after DMA transaction
  From: JABLONSKY Jan
  Date: 2019-01-08 06:10 UTC
  To: Russell King - ARM Linux
  Cc: Chris Cole, linux-arm-kernel@lists.infradead.org

Hi again,

I saw that there are some findings in v7_dma_inv_range:
https://patchwork.kernel.org/patch/10673097/

Wouldn't it be better to replace it directly with C&I (clean & invalidate)?
Otherwise isn't there still a risk that writes are lost on the way in->out?
(That is exactly what happens in my case.)

v7_dma_inv_range:
    from: mcrlo p15, 0, r0, c7, c6, 1     @ invalidate D / U line
    to:   mcrlo p15, 0, r0, c7, c14, 1    @ clean & invalidate D / U line

Thanks
* Re: arm: dma-mapping: CPU may not see up-to-date data after DMA transaction
  From: Russell King - ARM Linux
  Date: 2019-01-08 09:51 UTC
  To: JABLONSKY Jan
  Cc: Chris Cole, linux-arm-kernel@lists.infradead.org

On Tue, Jan 08, 2019 at 06:10:33AM +0000, JABLONSKY Jan wrote:
> Hi again,
>
> I saw that there are some findings in v7_dma_inv_range:
> https://patchwork.kernel.org/patch/10673097/
>
> Wouldn't it be better to replace it directly with C&I (clean & invalidate)?

No.

> Otherwise isn't there still a risk that writes are lost on the way in->out?
> (That is exactly what happens in my case.)

For a region that is being used to DMA data _from_ the device, the data we
have in the cache need not be written back out except if there are
overlapping cache lines - writing it out would be a complete waste of bus
bandwidth.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
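The "overlapping cache lines" caveat is the usual reason DMA buffers should
not share a cache line with unrelated data. A hypothetical illustration
(structure and field names invented):

```c
/*
 * If a DMA_FROM_DEVICE buffer shares a cache line with data the CPU writes
 * while the buffer is mapped, a pure invalidate can discard the CPU's write
 * (and a clean/writeback can overwrite freshly DMA'd data).  Keeping DMA
 * buffers in their own cache line(s) - e.g. a separate kmalloc() allocation,
 * which on ARM is aligned to at least ARCH_DMA_MINALIGN - avoids the overlap.
 */
struct bad_layout {
	u8 cpu_flag;			/* CPU may write this at any time */
	u8 dma_buf[60];			/* device DMAs into this          */
};					/* both can share one cache line  */

struct good_layout {
	u8 cpu_flag;
	u8 dma_buf[60] ____cacheline_aligned;	/* starts on its own line */
};
```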