public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [Patch] dma_sync_to_device
@ 2004-02-10 17:31 Martin Diehl
  2004-02-10 18:42 ` David S. Miller
  2004-02-11  6:17 ` Deepak Saxena
  0 siblings, 2 replies; 23+ messages in thread
From: Martin Diehl @ 2004-02-10 17:31 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel


Hi Dave,

Last fall we agreed there is a DMA call missing to sync outgoing streaming DMA 
mappings before giving the buffer back to the device. I sent you a 
patch which you liked, IIRC. However, I haven't been repeatedly polling you for 
actual inclusion due to the 2.6 stabilization period.

Anyway, I see now other related stuff (like dma_pool patches) getting in
so I'm wondering whether it might be the right moment now - right after 
2.6.3-final I'd suggest. So let me resend and ask for application. I've 
just retested with 2.6.3-rc2 and verified the old patch still applies 
cleanly and works as expected.

The patch does:
* provide generic dma_sync_to_device_{single|sg} api
* add i386 pci implementation
* related Documentation/DMA-mapping update

Thanks,
Martin

-----------------------

--- linux-2.6.0-test8/include/asm-i386/dma-mapping.h	Wed Oct  8 21:24:53 2003
+++ v2.6.0-test8-md/include/asm-i386/dma-mapping.h	Tue Oct 21 10:56:45 2003
@@ -92,6 +92,20 @@
 	flush_write_buffers();
 }
 
+static inline void
+dma_sync_to_device_single(struct device *dev, dma_addr_t dma_handle, size_t size,
+		enum dma_data_direction direction)
+{
+	flush_write_buffers();
+}
+
+static inline void
+dma_sync_to_device_sg(struct device *dev, struct scatterlist *sg, int nelems,
+		 enum dma_data_direction direction)
+{
+	flush_write_buffers();
+}
+
 static inline int
 dma_supported(struct device *dev, u64 mask)
 {
--- linux-2.6.0-test8/include/asm-generic/pci-dma-compat.h	Wed Oct  8 21:24:02 2003
+++ v2.6.0-test8-md/include/asm-generic/pci-dma-compat.h	Tue Oct 21 10:55:09 2003
@@ -84,4 +84,18 @@
 	dma_sync_sg(hwdev == NULL ? NULL : &hwdev->dev, sg, nelems, (enum dma_data_direction)direction);
 }
 
+static inline void
+pci_dma_sync_to_device_single(struct pci_dev *hwdev, dma_addr_t dma_handle,
+		    size_t size, int direction)
+{
+	dma_sync_to_device_single(hwdev == NULL ? NULL : &hwdev->dev, dma_handle, size, (enum dma_data_direction)direction);
+}
+
+static inline void
+pci_dma_sync_to_device_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+		int nelems, int direction)
+{
+	dma_sync_to_device_sg(hwdev == NULL ? NULL : &hwdev->dev, sg, nelems, (enum dma_data_direction)direction);
+}
+
 #endif
--- linux-2.6.0-test8/Documentation/DMA-mapping.txt	Wed Oct  8 21:24:06 2003
+++ v2.6.0-test8-md/Documentation/DMA-mapping.txt	Tue Oct 21 11:27:17 2003
@@ -543,8 +543,11 @@
 all bus addresses.
 
 If you need to use the same streaming DMA region multiple times and touch
-the data in between the DMA transfers, just map it with
-pci_map_{single,sg}, and after each DMA transfer call either:
+the data in between the DMA transfers, the buffer needs to be synced
+depending on the transfer direction.
+
+When reading from the device, just map it with pci_map_{single,sg},
+and after each DMA transfer call either:
 
 	pci_dma_sync_single(dev, dma_handle, size, direction);
 
@@ -553,6 +556,20 @@
 	pci_dma_sync_sg(dev, sglist, nents, direction);
 
 as appropriate.
+
+When writing to the mapped buffer, prepare the data and
+then before giving the buffer to the hardware call either:
+
+	pci_dma_sync_to_device_single(dev, dma_handle, size, direction);
+
+or:
+
+	pci_dma_sync_to_device_sg(dev, sglist, nents, direction);
+
+as appropriate.
+
+For bidirectional mappings, the corresponding calls are required before and
+after each transfer of ownership between the CPU and the hardware.
 
 After the last DMA transfer call one of the DMA unmap routines
 pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_*


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-10 17:31 [Patch] dma_sync_to_device Martin Diehl
@ 2004-02-10 18:42 ` David S. Miller
  2004-02-10 18:59   ` Martin Diehl
  2004-02-11  6:17 ` Deepak Saxena
  1 sibling, 1 reply; 23+ messages in thread
From: David S. Miller @ 2004-02-10 18:42 UTC (permalink / raw)
  To: Martin Diehl; +Cc: linux-kernel

On Tue, 10 Feb 2004 18:31:40 +0100 (CET)
Martin Diehl <lists@mdiehl.de> wrote:

> Last fall we agreed there is a DMA call missing to sync outgoing streaming DMA 
> mappings before giving the buffer back to the device. I sent you a 
> patch which you liked, IIRC. However, I haven't been repeatedly polling you for 
> actual inclusion due to the 2.6 stabilization period.
> 
> Anyway, I see now other related stuff (like dma_pool patches) getting in
> so I'm wondering whether it might be the right moment now - right after 
> 2.6.3-final I'd suggest. So let me resend and ask for application. I've 
> just retested with 2.6.3-rc2 and verified the old patch still applies 
> cleanly and works as expected.

Believe it or not, your work still sits deep in my inbox, waiting for me
to work my way back through the backlog to it.

I'll try to get to this again.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-10 18:42 ` David S. Miller
@ 2004-02-10 18:59   ` Martin Diehl
  0 siblings, 0 replies; 23+ messages in thread
From: Martin Diehl @ 2004-02-10 18:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

On Tue, 10 Feb 2004, David S. Miller wrote:

> Believe it or not, your work still sits deep in my inbox, waiting for me
> to work my way back through the backlog to it.
> 
> I'll try to get to this again.

Ok, Thanks.

It's not particularly urgent, but my concern was just that the patch wouldn't 
apply any longer once colliding DMA stuff gets in...
And the dma_pool stuff might be a good motivation for arch maintainers to 
adopt this in one go ;-)

Martin


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-10 17:31 [Patch] dma_sync_to_device Martin Diehl
  2004-02-10 18:42 ` David S. Miller
@ 2004-02-11  6:17 ` Deepak Saxena
  2004-02-11  6:51   ` Martin Diehl
  1 sibling, 1 reply; 23+ messages in thread
From: Deepak Saxena @ 2004-02-11  6:17 UTC (permalink / raw)
  To: Martin Diehl; +Cc: David S. Miller, linux-kernel

On Feb 10 2004, at 18:31, Martin Diehl was caught saying:
[snip]
> --- linux-2.6.0-test8/Documentation/DMA-mapping.txt	Wed Oct  8 21:24:06 2003
> +++ v2.6.0-test8-md/Documentation/DMA-mapping.txt	Tue Oct 21 11:27:17 2003
> @@ -543,8 +543,11 @@
>  all bus addresses.
>  
>  If you need to use the same streaming DMA region multiple times and touch
> -the data in between the DMA transfers, just map it with
> -pci_map_{single,sg}, and after each DMA transfer call either:
> +the data in between the DMA transfers, the buffer needs to be synced
> +depending on the transfer direction.
> +
> +When reading from the device, just map it with pci_map_{single,sg},
> +and after each DMA transfer call either:
>  
>  	pci_dma_sync_single(dev, dma_handle, size, direction);
>  
> @@ -553,6 +556,20 @@
>  	pci_dma_sync_sg(dev, sglist, nents, direction);
>  
>  as appropriate.
> +
> +When writing to the mapped buffer, prepare the data and
> +then before giving the buffer to the hardware call either:
> +
> +	pci_dma_sync_to_device_single(dev, dma_handle, size, direction);
> +
> +or:
> +
> +	pci_dma_sync_to_device_sg(dev, sglist, nents, direction);

Maybe I am misunderstanding something, but how is this
any different from simply doing:

	pci_dma_sync_single(dev, dma_handle, size, DMA_TO_DEVICE);

My understanding of the API is that I can map a buffer
as DMA_BIDIRECTIONAL and then specify the direction. An
existing working example is in the eepro100 driver 
in speedo_init_rx_ring():

   sp->rx_ring_dma[i] = pci_map_single(sp->pdev, rxf, 
              PKT_BUF_SZ + sizeof(struct RxFD), PCI_DMA_BIDIRECTIONAL);

later in the same function:
   
   pci_dma_sync_single(sp->pdev, sp->rx_ring_dma[i],
              sizeof(struct RxFD), PCI_DMA_TODEVICE);

If this is not allowed, then the docs and this driver
(along with any others using it) need to be updated.
If this is acceptable, why add a new function to
achieve the same goal?

~Deepak

-- 
Deepak Saxena - dsaxena at plexity dot net - http://www.plexity.net/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-11  6:17 ` Deepak Saxena
@ 2004-02-11  6:51   ` Martin Diehl
  2004-02-11 16:39     ` Deepak Saxena
  2004-02-11 18:18     ` Matt Porter
  0 siblings, 2 replies; 23+ messages in thread
From: Martin Diehl @ 2004-02-11  6:51 UTC (permalink / raw)
  To: Deepak Saxena; +Cc: David S. Miller, linux-kernel

On Tue, 10 Feb 2004, Deepak Saxena wrote:

> > +	pci_dma_sync_to_device_single(dev, dma_handle, size, direction);
> 
> Maybe I am misunderstanding something, but how is this
> any different from simply doing:
> 
> 	pci_dma_sync_single(dev, dma_handle, size, DMA_TO_DEVICE);

For i386 you are right: the implementations of pci_dma_sync_single and 
pci_dma_sync_to_device_single happen to be identical. This is because this 
arch is cache-coherent, so all we have to do to achieve consistency is 
flush the write buffers. However, for other archs there might be a significant 
dependency on the direction.

The existing pci_dma_sync_single was meant for the FROM_DEVICE direction 
only. I agree it's not entirely obvious currently. But making it 
BIDIRECTIONAL would be pretty expensive on some non-cache-coherent archs, 
and the whole point of having separate streaming mappings with a dedicated 
TO or FROM direction would be void.

> My understanding of the API is that I can map a buffer
> as DMA_BIDIRECTIONAL and then specify the direction. An
> existing working example is in the eepro100 driver 
> in speedo_init_rx_ring():
> 
>    sp->rx_ring_dma[i] = pci_map_single(sp->pdev, rxf, 
>               PKT_BUF_SZ + sizeof(struct RxFD), PCI_DMA_BIDIRECTIONAL);

For an rx_ring I tend to say this should be FROM_DEVICE, but it would work 
anyway, probably with some unneeded overhead when syncing.

> later in the same function:
>    
>    pci_dma_sync_single(sp->pdev, sp->rx_ring_dma[i],
>               sizeof(struct RxFD), PCI_DMA_TODEVICE);

IMHO that's a bug. It happens to work on i386, but currently there's no 
dma-api call to resync the outgoing streaming maps. So if the driver 
has modified rx_ring_dma and wants to sync so the device will see the 
changes consistently, this might fail on other archs.

And I'm wondering why this driver syncs the rx_ring with direction 
TODEVICE in the first place - the direction parameter indicates the 
direction of the dma transfer, not the act of giving buffer ownership to 
the hardware. Is this hardware reading from the rx_ring buffer? Sorry if I 
misunderstood what the rx_ring_dma[] is in this case - if these are just 
the ring descriptors (in contrast to the actual buffers) I believe the 
whole mapping should just be consistent, not streaming.

> If this is not allowed, then the docs and this driver
> (along with any others using it) need to be updated.
> If this is acceptable, why add a new function to
> achieve the same goal?

That's why my patch updates Documentation/DMA-mapping.txt ;-)

Martin


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-11  6:51   ` Martin Diehl
@ 2004-02-11 16:39     ` Deepak Saxena
  2004-02-11 17:51       ` David S. Miller
  2004-02-11 18:18     ` Matt Porter
  1 sibling, 1 reply; 23+ messages in thread
From: Deepak Saxena @ 2004-02-11 16:39 UTC (permalink / raw)
  To: Martin Diehl; +Cc: David S. Miller, linux-kernel

On Feb 11 2004, at 07:51, Martin Diehl was caught saying:
> On Tue, 10 Feb 2004, Deepak Saxena wrote:
> 
> > > +	pci_dma_sync_to_device_single(dev, dma_handle, size, direction);
> > 
> > Maybe I am misunderstanding something, but how is this
> > any different from simply doing:
> > 
> > 	pci_dma_sync_single(dev, dma_handle, size, DMA_TO_DEVICE);
> 
> For i386 you are right: the implementations of pci_dma_sync_single and 
> pci_dma_sync_to_device_single happen to be identical. This is because this 
> arch is cache-coherent, so all we have to do to achieve consistency is 
> flush the write buffers. However, for other archs there might be a significant 
> dependency on the direction.
> 
> The existing pci_dma_sync_single was meant for the FROM_DEVICE direction 
> only. I agree it's not entirely obvious currently. But making it 
> BIDIRECTIONAL would be pretty expensive on some non-cache-coherent archs, 
> and the whole point of having separate streaming mappings with a dedicated 
> TO or FROM direction would be void.

If pci_dma_sync_single is for FROM_DEVICE only, then the direction
parameter should go away from it, and also from
pci_dma_sync_to_device_single().

> > My understanding of the API is that I can map a buffer
> > as DMA_BIDIRECTIONAL and then specify the direction. An
> > existing working example is in the eepro100 driver 
> > in speedo_init_rx_ring():
> > 
> >    sp->rx_ring_dma[i] = pci_map_single(sp->pdev, rxf, 
> >               PKT_BUF_SZ + sizeof(struct RxFD), PCI_DMA_BIDIRECTIONAL);
> 
> For an rx_ring I tend to say this should be FROM_DEVICE, but it would work 
> anyway, probably with some unneeded overhead when syncing.
> 
> > later in the same function:
> >    
> >    pci_dma_sync_single(sp->pdev, sp->rx_ring_dma[i],
> >               sizeof(struct RxFD), PCI_DMA_TODEVICE);
> 
> IMHO that's a bug. It happens to work on i386, but currently there's no 
> dma-api call to resync the outgoing streaming maps. So if the driver 
> has modified rx_ring_dma and wants to sync so the device will see the 
> changes consistently, this might fail on other archs.

It works on ARM also, which has no cache coherency at all. This driver
has been in use for years on many architectures, so I think everyone has 
interpreted the mapping API as allowing the above scenario. 

> And I'm wondering why this driver syncs the rx_ring with direction 
> TODEVICE in the first place - the direction parameter indicates the 
> direction of the dma transfer, not the act of giving buffer ownership to 
> the hardware. Is this hardware reading from the rx_ring buffer? Sorry if I 
> misunderstood what the rx_ring_dma[] is in this case - if these are just 
> the ring descriptors (in contrast to the actual buffers) I believe the 
> whole mapping should just be consistent, not streaming.

rx_ring_dma is the buffers + descriptors. The eepro100 driver allocates
them both into an skb via dev_alloc_skb(PKT_BUF_SZ + sizeof(struct RxFD)),
and after filling in the RxFD portion (the descriptor), it syncs it
to the device (cache writeback on ARM) b/c the device will be DMAing.
A consistent mapping won't work b/c they are skbs. Later on, after the
data has arrived, the driver does a sync FROM_DEVICE (cache invalidate
on ARM).

So in this situation, would the whole driver have to be re-architected,
with the RxFD taken out of the skb, so that it can be sync'd to the device
separately from the buffer?  (Not that anyone probably still uses the eepro100
driver, but it's a good example of the level of work required.)

~Deepak

-- 
Deepak Saxena - dsaxena at plexity dot net - http://www.plexity.net/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-11 16:39     ` Deepak Saxena
@ 2004-02-11 17:51       ` David S. Miller
  0 siblings, 0 replies; 23+ messages in thread
From: David S. Miller @ 2004-02-11 17:51 UTC (permalink / raw)
  To: dsaxena; +Cc: lists, linux-kernel

On Wed, 11 Feb 2004 09:39:01 -0700
Deepak Saxena <dsaxena@plexity.net> wrote:

> If pci_dma_sync_single is for FROM_DEVICE only, then the direction
> parameter should go away from it, and also from
> pci_dma_sync_to_device_single().

This is wrong.  The direction parameter says what was done by the device/cpu
for the DMA; the port needs this to know how to perform
pci_dma_sync_single et al. correctly.

For example, a port may have to do something different for FROM_DEVICE vs.
TO_DEVICE to properly execute the pci_dma_sync_single() request.

MIPS (and seemingly ARM) are probably the best platforms by which to draw up
the worst-case scenario for the implementation of these things :) and thus
the optimizations made possible by certain combinations of request+direction.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-11  6:51   ` Martin Diehl
  2004-02-11 16:39     ` Deepak Saxena
@ 2004-02-11 18:18     ` Matt Porter
  2004-02-11 18:30       ` David S. Miller
  2004-02-11 18:43       ` linux-2.6.2 Kernel Problem Elikster
  1 sibling, 2 replies; 23+ messages in thread
From: Matt Porter @ 2004-02-11 18:18 UTC (permalink / raw)
  To: Martin Diehl; +Cc: Deepak Saxena, David S. Miller, linux-kernel

On Wed, Feb 11, 2004 at 07:51:48AM +0100, Martin Diehl wrote:
> On Tue, 10 Feb 2004, Deepak Saxena wrote:
> 
> > > +	pci_dma_sync_to_device_single(dev, dma_handle, size, direction);
> > 
> > Maybe I am misunderstanding something, but how is this
> > any different from simply doing:
> > 
> > 	pci_dma_sync_single(dev, dma_handle, size, DMA_TO_DEVICE);
> 
> For i386 you are right: the implementations of pci_dma_sync_single and 
> pci_dma_sync_to_device_single happen to be identical. This is because this 
> arch is cache-coherent, so all we have to do to achieve consistency is 
> flush the write buffers. However, for other archs there might be a significant 
> dependency on the direction.

Sure, other non-cache-coherent archs that I'm aware of (PPC, ARM, etc.)
already implement the least expensive cache operations based on the
direction parameter in pci_dma_sync_single(). On PPC, we do the right
thing for each of the three valid directions, so I don't yet see what
additional information pci_dma_sync_to_device_single() provides.

> The existing pci_dma_sync_single was meant for the FROM_DEVICE direction 
> only. I agree it's not entirely obvious currently. But making it 

It's definitely not obvious since DMA-mapping.txt shows it as having
a direction parameter and makes no claim that it is only for the
FROM_DEVICE direction.  In addition, all the non-coherent arches I've
worked on implement all the directions.

> BIDIRECTIONAL would be pretty expensive on some non-cache-coherent archs, 
> and the whole point of having separate streaming mappings with a dedicated 
> TO or FROM direction would be void.

BIDIRECTIONAL would be expensive, yes; that's why the pci_dma_sync_single()
implementations use the already-present directional information to do
the right thing.

Maybe we need a clear example.

-Matt

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-11 18:18     ` Matt Porter
@ 2004-02-11 18:30       ` David S. Miller
  2004-02-11 18:57         ` Deepak Saxena
  2004-02-11 19:23         ` Matt Porter
  2004-02-11 18:43       ` linux-2.6.2 Kernel Problem Elikster
  1 sibling, 2 replies; 23+ messages in thread
From: David S. Miller @ 2004-02-11 18:30 UTC (permalink / raw)
  To: Matt Porter; +Cc: lists, dsaxena, linux-kernel

On Wed, 11 Feb 2004 11:18:00 -0700
Matt Porter <mporter@kernel.crashing.org> wrote:

> Sure, other non-cache-coherent archs that I'm aware of (PPC, ARM, etc.)
> already implement the least expensive cache operations based on the
> direction parameter in pci_dma_sync_single(). On PPC, we do the right
> thing for each of the three valid directions, so I don't yet see what
> additional information pci_dma_sync_to_device_single() provides.

There are two points in time where you want to sync:

1) Right after the device has done a DMA transaction, and the cpu
   wishes to read/write the datum.

2) Right after the cpu has read/written the datum, and we would like to
   let the device DMA to/from the thing again.

That is the distinction provided by the two interfaces.

Consider something like MIPS, where cache flushes are needed for both of the
above operations:

1) pci_map_single(), device DMA's from the buffer.

2) pci_dma_sync_single().  Cpu writes some new command or
   status flag into the buffer.

3) pci_dma_sync_to_device_single(), now device is asked to DMA from the buffer
   again.

Cache flushes are needed on MIPS for both steps #2 and #3, and in fact
different kinds of flushes.

Do you understand the need for this now?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* linux-2.6.2 Kernel Problem
  2004-02-11 18:18     ` Matt Porter
  2004-02-11 18:30       ` David S. Miller
@ 2004-02-11 18:43       ` Elikster
  2004-02-14 11:51         ` Adrian Bunk
  1 sibling, 1 reply; 23+ messages in thread
From: Elikster @ 2004-02-11 18:43 UTC (permalink / raw)
  To: linux-kernel

Greetings,

   I noticed two problems that seem to come up after using the 2.6.2 kernel on the production servers.  One is with the RPM system: it seems to barf badly due to db3 vs. db4 errors when updating packages on a Red Hat 9.0 system.

   The second one is a little weirder.  It is with the Horde system, which works all the way except for sending emails, where it seems to have some problem.

   Both work fine under 2.4 kernels, but after switching to 2.6 there seem to be certain problems.  Any ideas on this?


-- 
Best regards,
 Elikster                            mailto:elik@webspires.com


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-11 18:30       ` David S. Miller
@ 2004-02-11 18:57         ` Deepak Saxena
  2004-02-11 19:08           ` David S. Miller
  2004-02-11 19:23         ` Matt Porter
  1 sibling, 1 reply; 23+ messages in thread
From: Deepak Saxena @ 2004-02-11 18:57 UTC (permalink / raw)
  To: David S. Miller; +Cc: Matt Porter, lists, linux-kernel

On Feb 11 2004, at 10:30, David S. Miller was caught saying:
> On Wed, 11 Feb 2004 11:18:00 -0700
> Matt Porter <mporter@kernel.crashing.org> wrote:
> 
> > Sure, other non-cache-coherent archs that I'm aware of (PPC, ARM, etc.)
> > already implement the least expensive cache operations based on the
> > direction parameter in pci_dma_sync_single(). On PPC, we do the right
> > thing for each of the three valid directions, so I don't yet see what
> > additional information pci_dma_sync_to_device_single() provides.
> 
> There are two points in time where you want to sync:
> 
> 1) Right after the device has done a DMA transaction, and the cpu
>    wishes to read/write the datum.
> 
> 2) Right after the cpu has read/written the datum, and we would like to
>    let the device DMA to/from the thing again.
> 
> That is the distinction provided by the two interfaces.
> 
> Consider something like MIPS, where cache flushes are needed for both of the
> above operations:
> 
> 1) pci_map_single(), device DMA's from the buffer.
> 
> 2) pci_dma_sync_single().  Cpu writes some new command or
>    status flag into the buffer.
> 
> 3) pci_dma_sync_to_device_single(), now device is asked to DMA from the buffer
>    again.
> 
> Cache flushes are needed on MIPS for both steps #2 and #3, and in fact
> different kinds of flushes.
> 
> Do you understand the need for this now?

Not really. Steps 2 and 3 can be done by simply calling pci_dma_sync_single()
with the appropriate direction flag.  I don't understand why a 
pci_dma_sync_single() is needed after the device does a DMA from the 
buffer and before the CPU writes a command.  After the CPU writes data to the
buffer, it can do a pci_dma_sync_single(..., DMA_TO_DEVICE), which causes
a cache flush. Isn't this what we're already doing today?  Why do we need 
to do a cache flush before the CPU writes data into the buffer, which is
then immediately going to be flushed?

~Deepak

-- 
Deepak Saxena - dsaxena at plexity dot net - http://www.plexity.net/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-11 18:57         ` Deepak Saxena
@ 2004-02-11 19:08           ` David S. Miller
  2004-02-12  3:46             ` Deepak Saxena
  2004-02-13  1:49             ` Jamie Lokier
  0 siblings, 2 replies; 23+ messages in thread
From: David S. Miller @ 2004-02-11 19:08 UTC (permalink / raw)
  To: dsaxena; +Cc: mporter, lists, linux-kernel

On Wed, 11 Feb 2004 11:57:25 -0700
Deepak Saxena <dsaxena@plexity.net> wrote:

> > 1) pci_map_single(), device DMA's from the buffer.
> > 
> > 2) pci_dma_sync_single().  Cpu writes some new command or
> >    status flag into the buffer.
> > 
> > 3) pci_dma_sync_to_device_single(), now device is asked to DMA from the buffer
> >    again.
> > 
> > Cache flushes are needed on MIPS for both step #2 and #3, and different kinds of
> > flushes in fact.
> > 
> > Do you understand the need for this now?
> 
> Not really. Steps 2 and 3 can be done by simply calling pci_dma_sync_single()
> with the appropriate direction flag.

No, the direction says what the device did or is going to do with the buffer.

> I don't understand why a 
> pci_dma_sync_single() is needed after the device does a DMA from the 
> buffer and before the CPU writes a command.

To flush PCI controller caches.

> After the CPU writes data to the
> buffer, it can do a pci_dma_sync_single(..., DMA_TO_DEVICE), which causes
> a cache flush. Isn't this what we're already doing today?

It is different.  pci_dma_sync_single(..., DMA_TO_DEVICE), on MIPS for example,
would do absolutely nothing.  At mapping time, the local cpu cache was flushed,
and assuming the MIPS pci controllers don't have caches of their own there is
nothing to flush there either.

Whereas pci_dma_sync_device_single() would flush the dirty lines from the cpu
caches.  In fact, it will perform the same CPU cache flushes as pci_map_single()
did, using MIPS as the example again.

New sequence:

1) pci_map_single(..., DMA_TO_DEVICE).  Flush dirty data from cpu caches to memory,
   so device may see it.

2) device reads buffer

3) pci_dma_sync_single(... DMA_TO_DEVICE).  If PCI controller has caches, flush them.

4) CPU writes new buffer data.

5) pci_dma_sync_device_single(... DMA_TO_DEVICE).  Like #1, flush dirty data from cpu
   caches to memory.

6) Device reads buffer.

Still disagree? :-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-11 18:30       ` David S. Miller
  2004-02-11 18:57         ` Deepak Saxena
@ 2004-02-11 19:23         ` Matt Porter
  2004-02-11 19:30           ` David S. Miller
  1 sibling, 1 reply; 23+ messages in thread
From: Matt Porter @ 2004-02-11 19:23 UTC (permalink / raw)
  To: David S. Miller; +Cc: Matt Porter, lists, dsaxena, linux-kernel

On Wed, Feb 11, 2004 at 10:30:56AM -0800, David S. Miller wrote:
> On Wed, 11 Feb 2004 11:18:00 -0700
> Matt Porter <mporter@kernel.crashing.org> wrote:
> 
> > Sure, other non-cache-coherent archs that I'm aware of (PPC, ARM, etc.)
> > already implement the least expensive cache operations based on the
> > direction parameter in pci_dma_sync_single(). On PPC, we do the right
> > thing for each of the three valid directions, so I don't yet see what
> > additional information pci_dma_sync_to_device_single() provides.
> 
> There are two points in time where you want to sync:
> 
> 1) Right after the device has done a DMA transaction, and the cpu
>    wishes to read/write the datum.
> 
> 2) Right after the cpu has read/written the datum, and we would like to
>    let the device DMA to/from the thing again.
>
> That is the distinction provided by the two interfaces.
 
Ok, this distinction helps a bit.

> Consider something like MIPS, where cache flushes are needed for both of the
> above operations:
> 
> 1) pci_map_single(), device DMA's from the buffer.
> 
> 2) pci_dma_sync_single().  Cpu writes some new command or
>    status flag into the buffer.
> 
> 3) pci_dma_sync_to_device_single(), now device is asked to DMA from the buffer
>    again.
> 
> Cache flushes are needed on MIPS for both steps #2 and #3, and in fact
> different kinds of flushes.
> 
> Do you understand the need for this now?

Actually, not yet.  Is it not possible for MIPS to determine the correct
cache operation to use if step #3 used a pci_dma_sync_single() with a
TO_DEVICE direction?  

I'm guessing that MIPS must have some kind of bridge cache in order to
require the pci_dma_sync_to_device_single() if I'm starting to follow
this.

-Matt

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-11 19:23         ` Matt Porter
@ 2004-02-11 19:30           ` David S. Miller
  0 siblings, 0 replies; 23+ messages in thread
From: David S. Miller @ 2004-02-11 19:30 UTC (permalink / raw)
  To: Matt Porter; +Cc: mporter, lists, dsaxena, linux-kernel

On Wed, 11 Feb 2004 12:23:19 -0700
Matt Porter <mporter@kernel.crashing.org> wrote:

> On Wed, Feb 11, 2004 at 10:30:56AM -0800, David S. Miller wrote:
> > 1) pci_map_single(), device DMA's from the buffer.
> > 
> > 2) pci_dma_sync_single().  Cpu writes some new command or
> >    status flag into the buffer.
> > 
> > 3) pci_dma_sync_to_device_single(), now device is asked to DMA from the buffer
> >    again.
> 
> Actually, not yet.  Is it not possible for MIPS to determine the correct
> cache operation to use if step #3 used a pci_dma_sync_single() with a
> TO_DEVICE direction?  

It should do a writeback of dirty data from the cpu cache, so that the device
may see it after pci_dma_sync_to_device_single() completes.

> I'm guessing that MIPS must have some kind of bridge cache in order to
> require the pci_dma_sync_to_device_single() if I'm starting to follow
> this.

Some platforms do, some don't.  Sparc64 has PCI controller DMA caches but its
cpu caches are fully coherent, for example.  On such a platform this new interface
is going to be a NOP, but on things like MIPS it will not be.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-11 19:08           ` David S. Miller
@ 2004-02-12  3:46             ` Deepak Saxena
  2004-02-12  3:58               ` David S. Miller
  2004-02-13  1:49             ` Jamie Lokier
  1 sibling, 1 reply; 23+ messages in thread
From: Deepak Saxena @ 2004-02-12  3:46 UTC (permalink / raw)
  To: David S. Miller; +Cc: mporter, lists, linux-kernel

On Feb 11 2004, at 11:08, David S. Miller was caught saying:
> New sequence:
> 
> 1) pci_map_single(..., DMA_TO_DEVICE).  Flush dirty data from cpu caches to memory,
>    so device may see it.
> 
> 2) device reads buffer
> 
> 3) pci_dma_sync_single(... DMA_TO_DEVICE).  If PCI controller has caches, flush them.
> 
> 4) CPU writes new buffer data.
> 
> 5) pci_dma_sync_device_single(... DMA_TO_DEVICE).  Like #1, flush dirty data from cpu
>    caches to memory.
> 
> 6) Device reads buffer.
> 
> Still disagree? :-)

/me groks now. I am assuming this is a 2.7 thing as it is
reinterpreting/redefining the API. On ARM, pci_dma_sync_single() does
a cache flush, which is why I was confused and asked about two cache flushes.
What you are proposing is that, by definition, pci_dma_sync_* syncs 
bridge caches with system memory, while pci_dma_sync_device_* syncs 
the CPU's cache with system memory.

This will definitely confuse a lot of driver writers.

~Deepak

-- 
Deepak Saxena - dsaxena at plexity dot net - http://www.plexity.net/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Patch] dma_sync_to_device
  2004-02-12  3:46             ` Deepak Saxena
@ 2004-02-12  3:58               ` David S. Miller
  0 siblings, 0 replies; 23+ messages in thread
From: David S. Miller @ 2004-02-12  3:58 UTC (permalink / raw)
  To: dsaxena; +Cc: mporter, lists, linux-kernel

On Wed, 11 Feb 2004 20:46:11 -0700
Deepak Saxena <dsaxena@plexity.net> wrote:

> /me groks now. I am assuming this is a 2.7 thing as it is
> reinterpreting/redefining the API.

No, it is adding an operation that previously was impossible to
make occur.

> In ARM, pci_dma_sync_single() does
> a cache flush, which is why I was confused and asked about two cache flushes.
> What you are proposing is that by definition pci_dma_sync_* syncs 
> bridge caches with system memory, while pci_dma_sync_device_* syncs 
> the CPU's cache with system memory.
> 
> This will definitely confuse a lot of driver writers. 

What I am proposing is to keep pci_dma_sync_*() with its current definition,
which is to transfer control of the buffer from DEVICE to CPU.  It has always
been defined this way.

I am also adding a new operation, pci_dma_sync_device_*() which transfers control
of the buffer from CPU back to DEVICE.  This operation was not possible previously.

Therefore we will merge this into both 2.6.x and 2.4.x as soon as a suitable full
and reviewed patch exists.

If you flush the cache on ARM for pci_dma_sync_*(), so what?  If the CPU writes to
the buffer and then asks the device to DMA from the area, it's going to see garbage.

To be honest there aren't many pci_dma_sync_*() users, and thus it is relatively
easy to audit them all for problems in this area.  Yes, updates and clarifications
in the docs will be necessary as well.


* Re: [Patch] dma_sync_to_device
  2004-02-11 19:08           ` David S. Miller
  2004-02-12  3:46             ` Deepak Saxena
@ 2004-02-13  1:49             ` Jamie Lokier
  2004-02-14  7:24               ` David S. Miller
  1 sibling, 1 reply; 23+ messages in thread
From: Jamie Lokier @ 2004-02-13  1:49 UTC (permalink / raw)
  To: David S. Miller; +Cc: dsaxena, mporter, lists, linux-kernel

David S. Miller wrote:
> It is different.  pci_dma_sync_single(..., DMA_TO_DEVICE), on MIPS for example,
> would do absolutely nothing.  At mapping time, the local cpu cache was flushed,
> and assuming the MIPS pci controllers don't have caches of their own there is
> nothing to flush there either.
> 
> Whereas pci_dma_sync_device_single() would flush the dirty lines from the cpu
> caches.  In fact, it will perform the same CPU cache flushes as pci_map_single()
> did, using MIPS as the example again.

The names are a bit confusing.
How about changing them to:

    pci_dma_sync_single         => pci_dma_sync_for_cpu
    pci_dma_sync_device_single  => pci_dma_sync_for_device

-- Jamie


* Re: [Patch] dma_sync_to_device
@ 2004-02-13 14:27 James Bottomley
  2004-02-14  8:51 ` Martin Diehl
  0 siblings, 1 reply; 23+ messages in thread
From: James Bottomley @ 2004-02-13 14:27 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Martin Diehl

Just to be sure we eliminate all confusion (and it amazes me how
frequently this comes up), could you add some words to DMA-mapping.txt to
make clear that this new API does not address PCI posting (which is a
problem with onward write cache flushing in the PCI bridge)---otherwise
I can see device driver writers thinking that
pci_dma_sync_to_device_single(... DMA_TO_DEVICE) will flush posted
writes.

Thanks,

James




* Re: [Patch] dma_sync_to_device
  2004-02-13  1:49             ` Jamie Lokier
@ 2004-02-14  7:24               ` David S. Miller
  0 siblings, 0 replies; 23+ messages in thread
From: David S. Miller @ 2004-02-14  7:24 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: dsaxena, mporter, lists, linux-kernel

On Fri, 13 Feb 2004 01:49:53 +0000
Jamie Lokier <jamie@shareable.org> wrote:

> The names are a bit confusing.
> How about changing them to:
> 
>     pci_dma_sync_single         => pci_dma_sync_for_cpu
>     pci_dma_sync_device_single  => pci_dma_sync_for_device

Ok, but we have to keep pci_dma_sync_single around as an alias
to pci_dma_sync_for_cpu, at least for some time.
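
The rename with a compatibility alias could look roughly like this (a sketch, assuming simple macro aliases; the counters merely stand in for the real cache-flush work):

```c
#include <assert.h>

static int cpu_syncs, device_syncs;

/* Hypothetical renamed operations (counters replace the real flushes): */
#define pci_dma_sync_for_cpu()    (cpu_syncs++)
#define pci_dma_sync_for_device() (device_syncs++)

/* Old name kept as a compatibility alias for the DEVICE->CPU direction: */
#define pci_dma_sync_single pci_dma_sync_for_cpu

int alias_demo(void)
{
	pci_dma_sync_single();       /* legacy callers still compile... */
	pci_dma_sync_for_cpu();      /* ...and reach the same operation */
	pci_dma_sync_for_device();
	return (cpu_syncs == 2 && device_syncs == 1) ? 0 : 1;
}
```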




* Re: [Patch] dma_sync_to_device
  2004-02-13 14:27 [Patch] dma_sync_to_device James Bottomley
@ 2004-02-14  8:51 ` Martin Diehl
  2004-02-14 22:34   ` James Bottomley
  2004-02-14 23:18   ` David S. Miller
  0 siblings, 2 replies; 23+ messages in thread
From: Martin Diehl @ 2004-02-14  8:51 UTC (permalink / raw)
  To: James Bottomley; +Cc: Linux Kernel

On 13 Feb 2004, James Bottomley wrote:

> Just to be sure we eliminate all confusion (and it amazes me how
> frequently this comes up), could you add some words to DMA-mapping.txt to
> make clear that this new API does not address PCI posting (which is a
> problem with onward write cache flushing in the PCI bridge)---otherwise
> I can see device driver writers thinking that
> pci_dma_sync_to_device_single(... DMA_TO_DEVICE) will flush posted
> writes.

Ok, will do.

Just to make sure I got you right: your concern is people mixing up the
effect of this call with some means to sync posted writes to an
iomem-mapped memory location provided by some device on the bus? I.e. the
situation where they probably want to use readl() or similar to force the
posted writes to complete, in contrast to sync_to_device, which ensures the
modified system memory gets synced before the busmastering starts?
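
The distinction can be spelled out in a toy fragment (userspace stand-ins only; the real readl()/writel() and the sync call are kernel APIs, and the flags model what the hardware would track):

```c
#include <assert.h>
#include <stdint.h>

static uint32_t fake_bar_reg;
static int posted_write_pending, dma_buf_synced;

static void writel(uint32_t v, uint32_t *reg) { *reg = v; posted_write_pending = 1; }
static uint32_t readl(uint32_t *reg)          { posted_write_pending = 0; return *reg; }
static void dma_sync_to_device(void)          { dma_buf_synced = 1; }

int posting_demo(void)
{
	/* Flushing a posted MMIO write takes a read-back from the device... */
	writel(0x1, &fake_bar_reg);
	(void)readl(&fake_bar_reg);

	/* ...while handing a streaming buffer back to the device takes the
	 * new sync call; neither operation substitutes for the other. */
	dma_sync_to_device();

	return (!posted_write_pending && dma_buf_synced) ? 0 : 1;
}
```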

If so, wouldn't this concern apply to the pci_dma api as a whole, not
just the new pci_dma_sync_to_device call in particular? I mean, none of the
api functions that deal with consistent or streaming maps of system memory
are applicable for flushing posted writes to iomem on the bus. Maybe the
confusion comes from the fact that on some archs the pci_dma_sync call would
have to flush posted writes _from_ the busmaster device to system memory?

Could you please confirm whether my understanding of the issue is correct
before I try to add some clarification to DMA-mapping.txt.

Martin



* Re: linux-2.6.2 Kernel Problem
  2004-02-11 18:43       ` linux-2.6.2 Kernel Problem Elikster
@ 2004-02-14 11:51         ` Adrian Bunk
  0 siblings, 0 replies; 23+ messages in thread
From: Adrian Bunk @ 2004-02-14 11:51 UTC (permalink / raw)
  To: Elikster; +Cc: linux-kernel

On Wed, Feb 11, 2004 at 11:43:51AM -0700, Elikster wrote:

> Greetings,
> 
>    I noticed two problems that seems to come up after using the 2.6.2 kernel on the production servers.  One is the RPM system. It seems to barf badly due to the db3 vs db4 errors when updating the packages on Redhat 9.0 system.
>...

This problem and a workaround are listed at

  http://www.linux.org.uk/~davej/docs/post-halloween-2.6.txt


> Best regards,
>  Elikster                            mailto:elik@webspires.com

cu
Adrian

BTW: Please tell your mail client about line breaks.

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [Patch] dma_sync_to_device
  2004-02-14  8:51 ` Martin Diehl
@ 2004-02-14 22:34   ` James Bottomley
  2004-02-14 23:18   ` David S. Miller
  1 sibling, 0 replies; 23+ messages in thread
From: James Bottomley @ 2004-02-14 22:34 UTC (permalink / raw)
  To: Martin Diehl; +Cc: Linux Kernel

On Sat, 2004-02-14 at 03:51, Martin Diehl wrote:
> Ok, will do.
> 
> Just to make sure I got you right: your concern is people mixing up the 
> effect of this call with some means to sync posted writes to an 
> iomem-mapped memory location provided by some device on the bus? I.e. the 
> situation where they probably want to use readl() or similar to force the 
> posted writes to complete, in contrast to sync_to_device, which ensures the 
> modified system memory gets synced before the busmastering starts?

Yes, that's it.

> If so, wouldn't this concern apply to the pci_dma api as a whole, not 
> just the new pci_dma_sync_to_device call in particular? I mean, none of the 
> api functions that deal with consistent or streaming maps of system memory 
> are applicable for flushing posted writes to iomem on the bus. Maybe the 
> confusion comes from the fact that on some archs the pci_dma_sync call would 
> have to flush posted writes _from_ the busmaster device to system memory?

Well ... people did ask, even in the old api.  The excuse always was
that the cache coherency aspect deals exclusively with the CPU cache. 
Now that you're adding an API specifically to deal with possible bus caches,
I just wanted to make it crystal clear that you can't use the bus cache
API for flushing posted writes.

James




* Re: [Patch] dma_sync_to_device
  2004-02-14  8:51 ` Martin Diehl
  2004-02-14 22:34   ` James Bottomley
@ 2004-02-14 23:18   ` David S. Miller
  1 sibling, 0 replies; 23+ messages in thread
From: David S. Miller @ 2004-02-14 23:18 UTC (permalink / raw)
  To: Martin Diehl; +Cc: James.Bottomley, linux-kernel

On Sat, 14 Feb 2004 09:51:17 +0100 (CET)
Martin Diehl <lists@mdiehl.de> wrote:

> Ok, will do.

Martin, when you have an updated patch, toss it my way please and I'll
work on it further.

Thanks.


end of thread, other threads:[~2004-02-14 23:18 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-02-10 17:31 [Patch] dma_sync_to_device Martin Diehl
2004-02-10 18:42 ` David S. Miller
2004-02-10 18:59   ` Martin Diehl
2004-02-11  6:17 ` Deepak Saxena
2004-02-11  6:51   ` Martin Diehl
2004-02-11 16:39     ` Deepak Saxena
2004-02-11 17:51       ` David S. Miller
2004-02-11 18:18     ` Matt Porter
2004-02-11 18:30       ` David S. Miller
2004-02-11 18:57         ` Deepak Saxena
2004-02-11 19:08           ` David S. Miller
2004-02-12  3:46             ` Deepak Saxena
2004-02-12  3:58               ` David S. Miller
2004-02-13  1:49             ` Jamie Lokier
2004-02-14  7:24               ` David S. Miller
2004-02-11 19:23         ` Matt Porter
2004-02-11 19:30           ` David S. Miller
2004-02-11 18:43       ` linux-2.6.2 Kernel Problem Elikster
2004-02-14 11:51         ` Adrian Bunk
  -- strict thread matches above, loose matches on Subject: below --
2004-02-13 14:27 [Patch] dma_sync_to_device James Bottomley
2004-02-14  8:51 ` Martin Diehl
2004-02-14 22:34   ` James Bottomley
2004-02-14 23:18   ` David S. Miller
