Hi,

I am attaching a patch that attempts to reduce cache maintenance operations during MMC transactions. I have tested it only on ARM, and tests with benchmarks such as iozone and bonnie showed that data integrity is maintained while I/O bandwidth increases. I tested it with kernel 3.1, and I believe it can be applied to 3.12 as well.

My understanding of the important APIs dealing with DMA memory is as follows:

1) dma_map_sg / dma_sync_sg_for_device: make sure the cache is flushed after the CPU has finished updating the memory allocated for DMA; called before handing ownership of the DMA memory to the device.

2) dma_unmap_sg / dma_sync_sg_for_cpu: make sure the cache is invalidated before the CPU reads from a DMA area that the device used to write data.

About the patch:

The changes in sdhci_adma_table_pre make sure that we only flush if we have updated the DMA area after the call to dma_map_sg.

The changes in sdhci_adma_table_post take care of the following:

1) Remove cache invalidation for memory locations that are about to be updated by the CPU, as they are not being read.

2) Perform the unmap of the scatterlist before the CPU accesses the DMA area, since the fix-ups we apply for the unaligned cases might otherwise be lost to the invalidation that follows. I was not able to induce unaligned buffer accesses using normal filesystem or raw-device operations; maybe that is why this issue has not been discovered so far.

3) The only drawback is that sg->dma_address gets used after the call to dma_unmap_sg.

I would like to understand whether this patch can cause regressions on any architecture or in the MMC functionality. To make the intended semantics and ordering concrete, I have appended two small illustrative sketches at the end of this mail.

Thanks & Regards,
Vishal Annapurve
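
For concreteness, here is a minimal sketch of the streaming-DMA ownership rules described in points 1) and 2) above. It is illustrative only: the function and parameter names are placeholders rather than code from the patch, and DMA_BIDIRECTIONAL is used just so both sync directions can be shown in one place.

#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/scatterlist.h>

/* Placeholder helper, not part of the patch. */
static int dma_ownership_sketch(struct device *dev,
                                struct scatterlist *sg, int nents)
{
        int mapped;

        /* CPU has finished filling the buffers behind 'sg'. */

        /*
         * Hand ownership to the device. On ARM this cleans (flushes)
         * the cache lines covering the buffers, so the device sees
         * the CPU's writes.
         */
        mapped = dma_map_sg(dev, sg, nents, DMA_BIDIRECTIONAL);
        if (!mapped)
                return -ENOMEM;

        /*
         * If the CPU must touch the buffers while they stay mapped,
         * ownership has to be bounced explicitly in both directions.
         */
        dma_sync_sg_for_cpu(dev, sg, nents, DMA_BIDIRECTIONAL);
        /* ... CPU reads/updates the buffers here ... */
        dma_sync_sg_for_device(dev, sg, nents, DMA_BIDIRECTIONAL);

        /* ... the device performs the actual DMA transfer here ... */

        /*
         * Take ownership back. For the from-device direction this
         * invalidates the cache lines, so it must happen before the
         * CPU reads data written by the device and, as argued in
         * point 2) above, before the CPU writes anything that has
         * to survive.
         */
        dma_unmap_sg(dev, sg, nents, DMA_BIDIRECTIONAL);

        /* The CPU may now safely read the device-written data. */
        return 0;
}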
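
And here is a schematic of the reordering in sdhci_adma_table_post, again not the actual diff: the 'align' bounce buffer, the has_unaligned flag, and the 4-byte alignment handling are simplified assumptions about the driver's internals, and any kmap handling of highmem pages is elided. Note that it also exhibits drawback 3) above, since sg_dma_address() is consulted after dma_unmap_sg().

#include <linux/dma-mapping.h>
#include <linux/mmc/core.h>
#include <linux/scatterlist.h>
#include <linux/string.h>
#include <linux/types.h>

/* Simplified stand-in for sdhci_adma_table_post, not the real code. */
static void adma_table_post_sketch(struct device *dev,
                                   struct mmc_data *data,
                                   char *align, bool has_unaligned)
{
        struct scatterlist *sg;
        char *buffer;
        int i;

        /*
         * Unmap (and, on ARM, invalidate) first, while the CPU has
         * not written anything yet. If the bounced bytes were copied
         * back before this point, the invalidation could discard them.
         */
        dma_unmap_sg(dev, data->sg, data->sg_len,
                     (data->flags & MMC_DATA_READ) ?
                     DMA_FROM_DEVICE : DMA_TO_DEVICE);

        if (has_unaligned && (data->flags & MMC_DATA_READ)) {
                for_each_sg(data->sg, sg, data->sg_len, i) {
                        /* Drawback 3): dma_address read after unmap. */
                        if (sg_dma_address(sg) & 0x3) {
                                buffer = sg_virt(sg);
                                /*
                                 * Copy the bytes the device placed in
                                 * the bounce buffer back to the start
                                 * of the unaligned entry. Nothing will
                                 * invalidate these lines afterwards.
                                 */
                                memcpy(buffer, align,
                                       4 - (sg_dma_address(sg) & 0x3));
                                align += 4;
                        }
                }
        }
}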