* [RFT][PATCH] generic device DMA implementation
@ 2002-12-18 3:01 James Bottomley
2002-12-18 3:13 ` David Mosberger
2002-12-28 18:14 ` Russell King
0 siblings, 2 replies; 32+ messages in thread
From: James Bottomley @ 2002-12-18 3:01 UTC (permalink / raw)
To: linux-kernel; +Cc: James.Bottomley
[-- Attachment #1: Type: text/plain, Size: 463 bytes --]
The attached should represent close to final form for the generic DMA API. It
includes documentation (surprise!) and an implementation in terms of the pci_
API for every arch (apart from parisc, which will be submitted later).
I've folded in the feedback from the previous thread. Hopefully, this should
be ready for inclusion. If people could test it on x86 and other
architectures, I'd be grateful.
Comments and feedback from testing are welcome.
James
[-- Attachment #2: tmp.diff --]
[-- Type: text/plain, Size: 39112 bytes --]
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.859 -> 1.861
# arch/i386/kernel/pci-dma.c 1.8 -> 1.10
# drivers/pci/pci.c 1.51 -> 1.52
# include/asm-i386/pci.h 1.17 -> 1.18
# include/linux/pci.h 1.55 -> 1.56
# Documentation/DMA-mapping.txt 1.13 -> 1.14
# arch/i386/kernel/i386_ksyms.c 1.40 -> 1.41
# (new) -> 1.1 include/asm-s390x/dma-mapping.h
# (new) -> 1.1 include/asm-arm/dma-mapping.h
# (new) -> 1.1 include/asm-sparc/dma-mapping.h
# (new) -> 1.1 include/asm-cris/dma-mapping.h
# (new) -> 1.1 include/asm-sh/dma-mapping.h
# (new) -> 1.1 include/asm-ppc64/dma-mapping.h
# (new) -> 1.1 include/asm-m68knommu/dma-mapping.h
# (new) -> 1.1 Documentation/DMA-API.txt
# (new) -> 1.1 include/asm-um/dma-mapping.h
# (new) -> 1.1 include/asm-ia64/dma-mapping.h
# (new) -> 1.3 include/asm-generic/dma-mapping.h
# (new) -> 1.1 include/asm-alpha/dma-mapping.h
# (new) -> 1.2 include/linux/dma-mapping.h
# (new) -> 1.1 include/asm-ppc/dma-mapping.h
# (new) -> 1.1 include/asm-s390/dma-mapping.h
# (new) -> 1.5 include/asm-i386/dma-mapping.h
# (new) -> 1.1 include/asm-sparc64/dma-mapping.h
# (new) -> 1.1 include/asm-m68k/dma-mapping.h
# (new) -> 1.1 include/asm-mips/dma-mapping.h
# (new) -> 1.1 include/asm-x86_64/dma-mapping.h
# (new) -> 1.1 include/asm-v850/dma-mapping.h
# (new) -> 1.1 include/asm-mips64/dma-mapping.h
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/12/09 jejb@mulgrave.(none) 1.860
# Merge ssh://raven/BK/dma-generic-device-2.5.50
# into mulgrave.(none):/home/jejb/BK/dma-generic-device-2.5
# --------------------------------------------
# 02/12/16 jejb@mulgrave.(none) 1.861
# Documentation complete
# --------------------------------------------
#
diff -Nru a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/Documentation/DMA-API.txt Tue Dec 17 20:49:32 2002
@@ -0,0 +1,325 @@
+ Dynamic DMA mapping using the generic device
+ ============================================
+
+ James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
+
+This document describes the DMA API. For a more gentle introduction
+phrased in terms of the pci_ equivalents (and actual examples) see
+DMA-mapping.txt
+
+This API is split into two pieces. Part I describes the API and the
+corresponding pci_ API. Part II describes the extensions to the API
+for supporting non-consistent memory machines. Unless you know that
+your driver absolutely has to support non-consistent platforms (this
+is usually only legacy platforms) you should only use the API
+described in part I.
+
+Part I - pci_ and dma_ Equivalent API
+-------------------------------------
+
+To get the pci_ API, you must #include <linux/pci.h>
+To get the dma_ API, you must #include <linux/dma-mapping.h>
+
+void *
+dma_alloc_consistent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle)
+void *
+pci_alloc_consistent(struct pci_dev *dev, size_t size,
+ dma_addr_t *dma_handle)
+
+Consistent memory is memory for which a write by either the device or
+the processor can immediately be read by the processor or device
+without having to worry about caching effects.
+
+This routine allocates a region of <size> bytes of consistent memory.
+It also returns a <dma_handle> which may be cast to an unsigned
+integer the same width as the bus and used as the physical address
+base of the region.
+
+Returns: a pointer to the allocated region (in the processor's virtual
+address space) or NULL if the allocation failed.
+
+Note: consistent memory can be expensive on some platforms, and the
+minimum allocation length may be as big as a page, so you should
+consolidate your requests for consistent memory as much as possible.
+
+void
+dma_free_consistent(struct device *dev, size_t size, void *cpu_addr,
+		    dma_addr_t dma_handle)
+void
+pci_free_consistent(struct pci_dev *dev, size_t size, void *cpu_addr,
+		    dma_addr_t dma_handle)
+
+Free the region of consistent memory you previously allocated. dev,
+size and dma_handle must all be the same as those passed into the
+consistent allocate. cpu_addr must be the virtual address returned by
+the consistent allocate.
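
A minimal usage sketch of the two calls above (hypothetical driver fragment,
not compilable on its own; my_dev, RING_BYTES and the error handling are
illustrative only):

```c
#define RING_BYTES 4096		/* illustrative size */

static void *ring;		/* CPU virtual address */
static dma_addr_t ring_dma;	/* bus address for the device */

static int my_setup(struct device *my_dev)
{
	ring = dma_alloc_consistent(my_dev, RING_BYTES, &ring_dma);
	if (ring == NULL)
		return -ENOMEM;		/* the allocation can fail */
	/* program ring_dma into the hardware as the ring base */
	return 0;
}

static void my_teardown(struct device *my_dev)
{
	/* dev, size and handle must match the allocation exactly */
	dma_free_consistent(my_dev, RING_BYTES, ring, ring_dma);
}
```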
+
+int
+dma_supported(struct device *dev, u64 mask)
+int
+pci_dma_supported(struct pci_dev *dev, u64 mask)
+
+Checks to see if the device can support DMA to the memory described by
+mask.
+
+Returns: 1 if it can and 0 if it can't.
+
+Notes: This routine merely tests to see if the mask is possible. It
+won't change the current mask settings. It is more intended as an
+internal API for use by the platform than an external API for use by
+driver writers.
+
+int
+dma_set_mask(struct device *dev, u64 mask)
+int
+pci_set_dma_mask(struct pci_dev *dev, u64 mask)
+
+Checks to see if the mask is possible and updates the device
+parameters if it is.
+
+Returns: 0 if successful and a negative error code if not
+
+dma_addr_t
+dma_map_single(struct device *dev, void *cpu_addr, size_t size,
+ enum dma_data_direction direction)
+dma_addr_t
+pci_map_single(struct pci_dev *hwdev, void *cpu_addr, size_t size,
+ int direction)
+
+Maps a piece of processor virtual memory so it can be accessed by the
+device and returns the physical handle of the memory.
+
+The direction for both APIs may be converted freely by casting.
+However the dma_ API uses a strongly typed enumerator for its
+direction:
+
+DMA_NONE            = PCI_DMA_NONE           no direction (used for
+                                             debugging)
+DMA_TO_DEVICE       = PCI_DMA_TODEVICE       data is going from the
+                                             memory to the device
+DMA_FROM_DEVICE     = PCI_DMA_FROMDEVICE     data is coming from
+                                             the device to the
+                                             memory
+DMA_BIDIRECTIONAL   = PCI_DMA_BIDIRECTIONAL  direction isn't known
+
+Notes: Not all memory regions in a machine can be mapped by this
+API. Further, regions that appear to be physically contiguous in
+kernel virtual space may not be contiguous as physical memory. Since
+this API does not provide any scatter/gather capability, it will fail
+if the user tries to map a non physically contiguous piece of memory.
+For this reason, it is recommended that memory mapped by this API be
+obtained only from sources which guarantee to be physically contiguous
+(like kmalloc).
+
+Further, the physical address of the memory must be within the
+dma_mask of the device (the dma_mask represents a bit mask of the
+addressable region for the device. i.e. if the physical address of
+the memory anded with the dma_mask is still equal to the physical
+address, then the device can perform DMA to the memory). In order to
+ensure that the memory allocated by kmalloc is within the dma_mask,
+the driver may specify various platform dependent flags to restrict
+the physical memory range of the allocation (e.g. on x86, GFP_DMA
+guarantees to be within the first 16MB of available physical memory,
+as required by ISA devices).
+
+Note also that the above constraints on physical contiguity and
+dma_mask may not apply if the platform has an IOMMU (a device which
+translates addresses on the I/O memory bus into physical memory
+addresses).  However, to be portable, device driver writers may *not*
+assume that such an IOMMU exists.
+
+Warnings: Memory coherency operates at a granularity called the cache
+line width. In order for memory mapped by this API to operate
+correctly, the mapped region must begin exactly on a cache line
+boundary and end exactly on one (to prevent two separately mapped
+regions from sharing a single cache line). Since the cache line size
+may not be known at compile time, the API will not enforce this
+requirement. Therefore, it is recommended that driver writers who
+don't take special care to determine the cache line size at run time
+only map virtual regions that begin and end on page boundaries (which
+are guaranteed also to be cache line boundaries).
+
+DMA_TO_DEVICE synchronisation must be done after the last modification
+of the memory region by the software and before it is handed off to
+the device.  Once this primitive is used, memory covered by it should
+be treated as read only by the device.  If the device may write to it
+at any point, it should be DMA_BIDIRECTIONAL (see
+below).
+
+DMA_FROM_DEVICE synchronisation must be done before the driver
+accesses data that may be changed by the device. This memory should
+be treated as read only by the driver. If the driver needs to write
+to it at any point, it should be DMA_BIDIRECTIONAL (see below).
+
+DMA_BIDIRECTIONAL requires special handling: it means that the driver
+isn't sure if the memory was modified before being handed off to the
+device and also isn't sure if the device will modify it.  Thus,
+you must always sync bidirectional memory twice: once before the
+memory is handed off to the device (to make sure all memory changes
+are flushed from the processor) and once before the data may be
+accessed after being used by the device (to make sure any processor
+cache lines are updated with data that the device may have changed).
+
+void
+dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
+ enum dma_data_direction direction)
+void
+pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr,
+ size_t size, int direction)
+
+Unmaps the region previously mapped.  All the parameters must be
+identical to those passed to (and returned by) the mapping
+API.
+
+dma_addr_t
+dma_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction direction)
+dma_addr_t
+pci_map_page(struct pci_dev *hwdev, struct page *page,
+ unsigned long offset, size_t size, int direction)
+void
+dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
+ enum dma_data_direction direction)
+void
+pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_address,
+ size_t size, int direction)
+
+API for mapping and unmapping pages.  All the notes and warnings
+for the other mapping APIs apply here. Also, although the <offset>
+and <size> parameters are provided to do partial page mapping, it is
+recommended that you never use these unless you really know what the
+cache width is.
+
+int
+dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+ enum dma_data_direction direction)
+int
+pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+ int nents, int direction)
+
+Maps a scatter gather list from the block layer.
+
+Returns: the number of physical segments mapped (this may be shorter
+than <nents> passed in if the block layer determines that some
+elements of the scatter/gather list are physically adjacent and thus
+may be mapped with a single entry).
+
+void
+dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
+ enum dma_data_direction direction)
+void
+pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+ int nents, int direction)
+
+Unmap the previously mapped scatter/gather list.  All the parameters
+must be the same as those passed into the scatter/gather mapping
+API.
+
+Note: <nents> must be the number you passed in, *not* the number of
+physical entries returned.
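
A sketch of the calling convention (hypothetical driver fragment, not
compilable standalone; NENTS and hw_add_segment() are illustrative, while
sg_dma_address()/sg_dma_len() are the accessors described in
DMA-mapping.txt):

```c
/* Map NENTS scatterlist entries for a transfer from the device. */
static int my_map(struct device *dev, struct scatterlist *sg)
{
	int i, count;

	count = dma_map_sg(dev, sg, NENTS, DMA_FROM_DEVICE);

	/* program the hardware with 'count' entries, which may be
	 * fewer than NENTS if adjacent elements were merged */
	for (i = 0; i < count; i++)
		hw_add_segment(sg_dma_address(&sg[i]), sg_dma_len(&sg[i]));

	return count;
}

static void my_unmap(struct device *dev, struct scatterlist *sg)
{
	/* unmap with the original NENTS, *not* the returned count */
	dma_unmap_sg(dev, sg, NENTS, DMA_FROM_DEVICE);
}
```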
+
+void
+dma_sync_single(struct device *dev, dma_addr_t dma_handle, size_t size,
+ enum dma_data_direction direction)
+void
+pci_dma_sync_single(struct pci_dev *hwdev, dma_addr_t dma_handle,
+ size_t size, int direction)
+void
+dma_sync_sg(struct device *dev, struct scatterlist *sg, int nelems,
+ enum dma_data_direction direction)
+void
+pci_dma_sync_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+ int nelems, int direction)
+
+Synchronise a single contiguous or scatter/gather mapping.  All the
+parameters must be the same as those passed into the corresponding
+mapping API.
+
+Notes: You must do this:
+
+- Before reading values that have been written by DMA from the device
+  (use the DMA_FROM_DEVICE direction)
+- After writing values that will be transferred to the device using
+  DMA (use the DMA_TO_DEVICE direction)
+- Before *and* after handing memory to the device if the memory is
+  DMA_BIDIRECTIONAL
+
+See also dma_map_single().
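
The rules above can be sketched for a single streaming mapping
(hypothetical receive path, not compilable standalone; BUF_BYTES and
process() are illustrative, and buf is assumed to come from kmalloc so it
is physically contiguous):

```c
static void my_rx(struct device *dev, void *buf)
{
	dma_addr_t handle;

	handle = dma_map_single(dev, buf, BUF_BYTES, DMA_FROM_DEVICE);

	/* ... point the device at 'handle' and wait for the DMA ... */

	/* sync before the CPU reads data the device wrote */
	dma_sync_single(dev, handle, BUF_BYTES, DMA_FROM_DEVICE);
	process(buf);

	/* parameters must be identical to those given to dma_map_single() */
	dma_unmap_single(dev, handle, BUF_BYTES, DMA_FROM_DEVICE);
}
```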
+
+Part II - Advanced dma_ usage
+-----------------------------
+
+Warning: These pieces of the DMA API have no PCI equivalent. They
+should also not be used in the majority of cases, since they cater for
+unlikely corner cases that don't belong in usual drivers.
+
+If you don't understand how cache line coherency works between a
+processor and an I/O device, you should not be using this part of the
+API at all.
+
+void *
+dma_alloc_nonconsistent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle)
+
+Identical to dma_alloc_consistent() except that the platform will
+choose to return either consistent or non-consistent memory as it sees
+fit. By using this API, you are guaranteeing to the platform that you
+have all the correct and necessary sync points for this memory in the
+driver should it choose to return non-consistent memory.
+
+Note: where the platform can return consistent memory, it will
+guarantee that the sync points become nops.
+
+Warning: Handling non-consistent memory is a real pain. You should
+only ever use this API if you positively know your driver will be
+required to work on one of the rare (usually non-PCI) architectures
+that simply cannot make consistent memory.
+
+void
+dma_free_nonconsistent(struct device *dev, size_t size, void *cpu_addr,
+ dma_addr_t dma_handle)
+
+Free memory allocated by the nonconsistent API.  All parameters must
+be identical to those passed to (and returned by)
+dma_alloc_nonconsistent().
+
+int
+dma_is_consistent(dma_addr_t dma_handle)
+
+Returns true if the memory pointed to by the dma_handle is actually
+consistent.
+
+int
+dma_get_cache_alignment(void)
+
+Returns the processor cache alignment.  This is the absolute minimum
+alignment *and* width that you must observe when either mapping
+memory or doing partial flushes.
+
+Notes: This API may return a number *larger* than the actual cache
+line width, but it will guarantee that one or more cache lines fit
+exactly into the width returned by this call.  It will also always be
+a power of two for easy alignment.
+
+void
+dma_sync_single_range(struct device *dev, dma_addr_t dma_handle,
+ unsigned long offset, size_t size,
+ enum dma_data_direction direction)
+
+Does a partial sync, starting at <offset> and continuing for <size>
+bytes.  You must be careful to observe the cache alignment and width
+when doing anything like this.  You must also be extra careful about
+accessing memory you intend to sync only partially.
+
+void
+dma_cache_sync(void *vaddr, size_t size,
+ enum dma_data_direction direction)
+
+Do a partial sync of memory that was allocated by
+dma_alloc_nonconsistent(), starting at virtual address vaddr and
+continuing on for size. Again, you *must* observe the cache line
+boundaries when doing this.
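
Putting Part II together (hypothetical fragment, not compilable
standalone; BUF_BYTES is illustrative, and req_len is assumed to observe
dma_get_cache_alignment() boundaries):

```c
static int my_queue_request(struct device *dev, const void *request,
			    size_t req_len)
{
	dma_addr_t handle;
	void *buf;

	buf = dma_alloc_nonconsistent(dev, BUF_BYTES, &handle);
	if (buf == NULL)
		return -ENOMEM;

	memcpy(buf, request, req_len);	/* CPU fills the buffer */

	/* flush before the device reads; guaranteed to be a nop where
	 * the platform chose to return consistent memory */
	dma_cache_sync(buf, req_len, DMA_TO_DEVICE);

	/* hand 'handle' to the device */
	return 0;
}
```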
+
+
diff -Nru a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt
--- a/Documentation/DMA-mapping.txt Tue Dec 17 20:49:32 2002
+++ b/Documentation/DMA-mapping.txt Tue Dec 17 20:49:32 2002
@@ -5,6 +5,10 @@
Richard Henderson <rth@cygnus.com>
Jakub Jelinek <jakub@redhat.com>
+This document describes the DMA mapping system in terms of the pci_
+API. For a similar API that works for generic devices, see
+DMA-API.txt.
+
Most of the 64bit platforms have special hardware that translates bus
addresses (DMA addresses) into physical addresses. This is similar to
how page tables and/or a TLB translates virtual addresses to physical
diff -Nru a/arch/i386/kernel/i386_ksyms.c b/arch/i386/kernel/i386_ksyms.c
--- a/arch/i386/kernel/i386_ksyms.c Tue Dec 17 20:49:32 2002
+++ b/arch/i386/kernel/i386_ksyms.c Tue Dec 17 20:49:32 2002
@@ -124,8 +124,8 @@
EXPORT_SYMBOL(__copy_to_user);
EXPORT_SYMBOL(strnlen_user);
-EXPORT_SYMBOL(pci_alloc_consistent);
-EXPORT_SYMBOL(pci_free_consistent);
+EXPORT_SYMBOL(dma_alloc_consistent);
+EXPORT_SYMBOL(dma_free_consistent);
#ifdef CONFIG_PCI
EXPORT_SYMBOL(pcibios_penalize_isa_irq);
diff -Nru a/arch/i386/kernel/pci-dma.c b/arch/i386/kernel/pci-dma.c
--- a/arch/i386/kernel/pci-dma.c Tue Dec 17 20:49:32 2002
+++ b/arch/i386/kernel/pci-dma.c Tue Dec 17 20:49:32 2002
@@ -13,13 +13,13 @@
#include <linux/pci.h>
#include <asm/io.h>
-void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
+void *dma_alloc_consistent(struct device *dev, size_t size,
dma_addr_t *dma_handle)
{
void *ret;
int gfp = GFP_ATOMIC;
- if (hwdev == NULL || ((u32)hwdev->dma_mask != 0xffffffff))
+ if (dev == NULL || ((u32)*dev->dma_mask != 0xffffffff))
gfp |= GFP_DMA;
ret = (void *)__get_free_pages(gfp, get_order(size));
@@ -30,7 +30,7 @@
return ret;
}
-void pci_free_consistent(struct pci_dev *hwdev, size_t size,
+void dma_free_consistent(struct device *dev, size_t size,
void *vaddr, dma_addr_t dma_handle)
{
free_pages((unsigned long)vaddr, get_order(size));
diff -Nru a/include/asm-alpha/dma-mapping.h b/include/asm-alpha/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-alpha/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-arm/dma-mapping.h b/include/asm-arm/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-arm/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-cris/dma-mapping.h b/include/asm-cris/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-cris/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-generic/dma-mapping.h b/include/asm-generic/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-generic/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1,154 @@
+/* Copyright (C) 2002 by James.Bottomley@HansenPartnership.com
+ *
+ * Implements the generic device dma API via the existing pci_ one
+ * for unconverted architectures
+ */
+
+#ifndef _ASM_GENERIC_DMA_MAPPING_H
+#define _ASM_GENERIC_DMA_MAPPING_H
+
+/* we implement the API below in terms of the existing PCI one,
+ * so include it */
+#include <linux/pci.h>
+
+static inline int
+dma_supported(struct device *dev, u64 mask)
+{
+ BUG_ON(dev->bus != &pci_bus_type);
+
+ return pci_dma_supported(to_pci_dev(dev), mask);
+}
+
+static inline int
+dma_set_mask(struct device *dev, u64 dma_mask)
+{
+ BUG_ON(dev->bus != &pci_bus_type);
+
+ return pci_set_dma_mask(to_pci_dev(dev), dma_mask);
+}
+
+static inline void *
+dma_alloc_consistent(struct device *dev, size_t size, dma_addr_t *dma_handle)
+{
+ BUG_ON(dev->bus != &pci_bus_type);
+
+ return pci_alloc_consistent(to_pci_dev(dev), size, dma_handle);
+}
+
+static inline void
+dma_free_consistent(struct device *dev, size_t size, void *cpu_addr,
+ dma_addr_t dma_handle)
+{
+ BUG_ON(dev->bus != &pci_bus_type);
+
+ pci_free_consistent(to_pci_dev(dev), size, cpu_addr, dma_handle);
+}
+
+static inline dma_addr_t
+dma_map_single(struct device *dev, void *cpu_addr, size_t size,
+ enum dma_data_direction direction)
+{
+ BUG_ON(dev->bus != &pci_bus_type);
+
+ return pci_map_single(to_pci_dev(dev), cpu_addr, size, (int)direction);
+}
+
+static inline void
+dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
+ enum dma_data_direction direction)
+{
+ BUG_ON(dev->bus != &pci_bus_type);
+
+ pci_unmap_single(to_pci_dev(dev), dma_addr, size, (int)direction);
+}
+
+static inline dma_addr_t
+dma_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction direction)
+{
+ BUG_ON(dev->bus != &pci_bus_type);
+
+ return pci_map_page(to_pci_dev(dev), page, offset, size, (int)direction);
+}
+
+static inline void
+dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
+ enum dma_data_direction direction)
+{
+ BUG_ON(dev->bus != &pci_bus_type);
+
+ pci_unmap_page(to_pci_dev(dev), dma_address, size, (int)direction);
+}
+
+static inline int
+dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+ enum dma_data_direction direction)
+{
+ BUG_ON(dev->bus != &pci_bus_type);
+
+ return pci_map_sg(to_pci_dev(dev), sg, nents, (int)direction);
+}
+
+static inline void
+dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
+ enum dma_data_direction direction)
+{
+ BUG_ON(dev->bus != &pci_bus_type);
+
+ pci_unmap_sg(to_pci_dev(dev), sg, nhwentries, (int)direction);
+}
+
+static inline void
+dma_sync_single(struct device *dev, dma_addr_t dma_handle, size_t size,
+ enum dma_data_direction direction)
+{
+ BUG_ON(dev->bus != &pci_bus_type);
+
+ pci_dma_sync_single(to_pci_dev(dev), dma_handle, size, (int)direction);
+}
+
+static inline void
+dma_sync_sg(struct device *dev, struct scatterlist *sg, int nelems,
+ enum dma_data_direction direction)
+{
+ BUG_ON(dev->bus != &pci_bus_type);
+
+ pci_dma_sync_sg(to_pci_dev(dev), sg, nelems, (int)direction);
+}
+
+/* Now for the API extensions over the pci_ one */
+
+#define dma_alloc_nonconsistent(d, s, h) dma_alloc_consistent(d, s, h)
+#define dma_free_nonconsistent(d, s, v, h) dma_free_consistent(d, s, v, h)
+#define dma_is_consistent(d) (1)
+
+static inline int
+dma_get_cache_alignment(void)
+{
+ /* no easy way to get cache size on all processors, so return
+ * the maximum possible, to be safe */
+ return (1 << L1_CACHE_SHIFT_MAX);
+}
+
+static inline void
+dma_sync_single_range(struct device *dev, dma_addr_t dma_handle,
+ unsigned long offset, size_t size,
+ enum dma_data_direction direction)
+{
+ /* just sync everything, that's all the pci API can do */
+ dma_sync_single(dev, dma_handle, offset+size, direction);
+}
+
+static inline void
+dma_cache_sync(void *vaddr, size_t size,
+ enum dma_data_direction direction)
+{
+ /* could define this in terms of the dma_cache ... operations,
+ * but if you get this on a platform, you should convert the platform
+ * to using the generic device DMA API */
+ BUG();
+}
+
+#endif
+
diff -Nru a/include/asm-i386/dma-mapping.h b/include/asm-i386/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-i386/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1,137 @@
+#ifndef _ASM_I386_DMA_MAPPING_H
+#define _ASM_I386_DMA_MAPPING_H
+
+#include <asm/cache.h>
+
+#define dma_alloc_nonconsistent(d, s, h) dma_alloc_consistent(d, s, h)
+#define dma_free_nonconsistent(d, s, v, h) dma_free_consistent(d, s, v, h)
+
+void *dma_alloc_consistent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle);
+
+void dma_free_consistent(struct device *dev, size_t size,
+ void *vaddr, dma_addr_t dma_handle);
+
+static inline dma_addr_t
+dma_map_single(struct device *dev, void *ptr, size_t size,
+ enum dma_data_direction direction)
+{
+ BUG_ON(direction == DMA_NONE);
+ flush_write_buffers();
+ return virt_to_phys(ptr);
+}
+
+static inline void
+dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
+ enum dma_data_direction direction)
+{
+ BUG_ON(direction == DMA_NONE);
+}
+
+static inline int
+dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+ enum dma_data_direction direction)
+{
+ int i;
+
+ BUG_ON(direction == DMA_NONE);
+
+ for (i = 0; i < nents; i++ ) {
+ BUG_ON(!sg[i].page);
+
+ sg[i].dma_address = page_to_phys(sg[i].page) + sg[i].offset;
+ }
+
+ flush_write_buffers();
+ return nents;
+}
+
+static inline dma_addr_t
+dma_map_page(struct device *dev, struct page *page, unsigned long offset,
+ size_t size, enum dma_data_direction direction)
+{
+ BUG_ON(direction == DMA_NONE);
+ return (dma_addr_t)(page_to_pfn(page)) * PAGE_SIZE + offset;
+}
+
+static inline void
+dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
+ enum dma_data_direction direction)
+{
+ BUG_ON(direction == DMA_NONE);
+}
+
+
+static inline void
+dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
+ enum dma_data_direction direction)
+{
+ BUG_ON(direction == DMA_NONE);
+}
+
+static inline void
+dma_sync_single(struct device *dev, dma_addr_t dma_handle, size_t size,
+ enum dma_data_direction direction)
+{
+ flush_write_buffers();
+}
+
+static inline void
+dma_sync_single_range(struct device *dev, dma_addr_t dma_handle,
+ unsigned long offset, size_t size,
+ enum dma_data_direction direction)
+{
+ flush_write_buffers();
+}
+
+
+static inline void
+dma_sync_sg(struct device *dev, struct scatterlist *sg, int nelems,
+ enum dma_data_direction direction)
+{
+ flush_write_buffers();
+}
+
+static inline int
+dma_supported(struct device *dev, u64 mask)
+{
+ /*
+ * we fall back to GFP_DMA when the mask isn't all 1s,
+ * so we can't guarantee allocations that must be
+ * within a tighter range than GFP_DMA..
+ */
+ if(mask < 0x00ffffff)
+ return 0;
+
+ return 1;
+}
+
+static inline int
+dma_set_mask(struct device *dev, u64 mask)
+{
+ if(!dev->dma_mask || !dma_supported(dev, mask))
+ return -EIO;
+
+ *dev->dma_mask = mask;
+
+ return 0;
+}
+
+static inline int
+dma_get_cache_alignment(void)
+{
+ /* no easy way to get cache size on all x86, so return the
+ * maximum possible, to be safe */
+ return (1 << L1_CACHE_SHIFT_MAX);
+}
+
+#define dma_is_consistent(d) (1)
+
+static inline void
+dma_cache_sync(void *vaddr, size_t size,
+ enum dma_data_direction direction)
+{
+ flush_write_buffers();
+}
+
+#endif
diff -Nru a/include/asm-i386/pci.h b/include/asm-i386/pci.h
--- a/include/asm-i386/pci.h Tue Dec 17 20:49:32 2002
+++ b/include/asm-i386/pci.h Tue Dec 17 20:49:32 2002
@@ -6,6 +6,9 @@
#ifdef __KERNEL__
#include <linux/mm.h> /* for struct page */
+/* we support the new DMA API, but still provide the old one */
+#define PCI_NEW_DMA_COMPAT_API 1
+
/* Can be used to override the logic in pci_scan_bus for skipping
already-configured bus numbers - to be used for buggy BIOSes
or architectures with incomplete PCI setup by the loader */
@@ -46,78 +49,6 @@
*/
#define PCI_DMA_BUS_IS_PHYS (1)
-/* Allocate and map kernel buffer using consistent mode DMA for a device.
- * hwdev should be valid struct pci_dev pointer for PCI devices,
- * NULL for PCI-like buses (ISA, EISA).
- * Returns non-NULL cpu-view pointer to the buffer if successful and
- * sets *dma_addrp to the pci side dma address as well, else *dma_addrp
- * is undefined.
- */
-extern void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
- dma_addr_t *dma_handle);
-
-/* Free and unmap a consistent DMA buffer.
- * cpu_addr is what was returned from pci_alloc_consistent,
- * size must be the same as what as passed into pci_alloc_consistent,
- * and likewise dma_addr must be the same as what *dma_addrp was set to.
- *
- * References to the memory and mappings associated with cpu_addr/dma_addr
- * past this call are illegal.
- */
-extern void pci_free_consistent(struct pci_dev *hwdev, size_t size,
- void *vaddr, dma_addr_t dma_handle);
-
-/* Map a single buffer of the indicated size for DMA in streaming mode.
- * The 32-bit bus address to use is returned.
- *
- * Once the device is given the dma address, the device owns this memory
- * until either pci_unmap_single or pci_dma_sync_single is performed.
- */
-static inline dma_addr_t pci_map_single(struct pci_dev *hwdev, void *ptr,
- size_t size, int direction)
-{
- if (direction == PCI_DMA_NONE)
- BUG();
- flush_write_buffers();
- return virt_to_phys(ptr);
-}
-
-/* Unmap a single streaming mode DMA translation. The dma_addr and size
- * must match what was provided for in a previous pci_map_single call. All
- * other usages are undefined.
- *
- * After this call, reads by the cpu to the buffer are guarenteed to see
- * whatever the device wrote there.
- */
-static inline void pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr,
- size_t size, int direction)
-{
- if (direction == PCI_DMA_NONE)
- BUG();
- /* Nothing to do */
-}
-
-/*
- * pci_{map,unmap}_single_page maps a kernel page to a dma_addr_t. identical
- * to pci_map_single, but takes a struct page instead of a virtual address
- */
-static inline dma_addr_t pci_map_page(struct pci_dev *hwdev, struct page *page,
- unsigned long offset, size_t size, int direction)
-{
- if (direction == PCI_DMA_NONE)
- BUG();
-
- return (dma_addr_t)(page_to_pfn(page)) * PAGE_SIZE + offset;
-}
-
-static inline void pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_address,
- size_t size, int direction)
-{
- if (direction == PCI_DMA_NONE)
- BUG();
- /* Nothing to do */
-}
-
/* pci_unmap_{page,single} is a nop so... */
#define DECLARE_PCI_UNMAP_ADDR(ADDR_NAME)
#define DECLARE_PCI_UNMAP_LEN(LEN_NAME)
@@ -126,84 +57,6 @@
#define pci_unmap_len(PTR, LEN_NAME) (0)
#define pci_unmap_len_set(PTR, LEN_NAME, VAL) do { } while (0)
-/* Map a set of buffers described by scatterlist in streaming
- * mode for DMA. This is the scather-gather version of the
- * above pci_map_single interface. Here the scatter gather list
- * elements are each tagged with the appropriate dma address
- * and length. They are obtained via sg_dma_{address,length}(SG).
- *
- * NOTE: An implementation may be able to use a smaller number of
- * DMA address/length pairs than there are SG table elements.
- * (for example via virtual mapping capabilities)
- * The routine returns the number of addr/length pairs actually
- * used, at most nents.
- *
- * Device ownership issues as mentioned above for pci_map_single are
- * the same here.
- */
-static inline int pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
- int nents, int direction)
-{
- int i;
-
- if (direction == PCI_DMA_NONE)
- BUG();
-
- for (i = 0; i < nents; i++ ) {
- if (!sg[i].page)
- BUG();
-
- sg[i].dma_address = page_to_phys(sg[i].page) + sg[i].offset;
- }
-
- flush_write_buffers();
- return nents;
-}
-
-/* Unmap a set of streaming mode DMA translations.
- * Again, cpu read rules concerning calls here are the same as for
- * pci_unmap_single() above.
- */
-static inline void pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
- int nents, int direction)
-{
- if (direction == PCI_DMA_NONE)
- BUG();
- /* Nothing to do */
-}
-
-/* Make physical memory consistent for a single
- * streaming mode DMA translation after a transfer.
- *
- * If you perform a pci_map_single() but wish to interrogate the
- * buffer using the cpu, yet do not wish to teardown the PCI dma
- * mapping, you must call this function before doing so. At the
- * next point you give the PCI dma address back to the card, the
- * device again owns the buffer.
- */
-static inline void pci_dma_sync_single(struct pci_dev *hwdev,
- dma_addr_t dma_handle,
- size_t size, int direction)
-{
- if (direction == PCI_DMA_NONE)
- BUG();
- flush_write_buffers();
-}
-
-/* Make physical memory consistent for a set of streaming
- * mode DMA translations after a transfer.
- *
- * The same as pci_dma_sync_single but for a scatter-gather list,
- * same rules and usage.
- */
-static inline void pci_dma_sync_sg(struct pci_dev *hwdev,
- struct scatterlist *sg,
- int nelems, int direction)
-{
- if (direction == PCI_DMA_NONE)
- BUG();
- flush_write_buffers();
-}
/* Return whether the given PCI device DMA address mask can
* be supported properly. For example, if your device can
diff -Nru a/include/asm-ia64/dma-mapping.h b/include/asm-ia64/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-ia64/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-m68k/dma-mapping.h b/include/asm-m68k/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-m68k/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-m68knommu/dma-mapping.h b/include/asm-m68knommu/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-m68knommu/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-mips/dma-mapping.h b/include/asm-mips/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-mips/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-mips64/dma-mapping.h b/include/asm-mips64/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-mips64/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-ppc/dma-mapping.h b/include/asm-ppc/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-ppc/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-ppc64/dma-mapping.h b/include/asm-ppc64/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-ppc64/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-s390/dma-mapping.h b/include/asm-s390/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-s390/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-s390x/dma-mapping.h b/include/asm-s390x/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-s390x/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-sh/dma-mapping.h b/include/asm-sh/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-sh/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-sparc/dma-mapping.h b/include/asm-sparc/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-sparc/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-sparc64/dma-mapping.h b/include/asm-sparc64/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-sparc64/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-um/dma-mapping.h b/include/asm-um/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-um/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-v850/dma-mapping.h b/include/asm-v850/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-v850/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/asm-x86_64/dma-mapping.h b/include/asm-x86_64/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/asm-x86_64/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1 @@
+#include <asm-generic/dma-mapping.h>
diff -Nru a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
--- /dev/null Wed Dec 31 16:00:00 1969
+++ b/include/linux/dma-mapping.h Tue Dec 17 20:49:32 2002
@@ -0,0 +1,17 @@
+#ifndef _ASM_LINUX_DMA_MAPPING_H
+#define _ASM_LINUX_DMA_MAPPING_H
+
+/* These definitions mirror those in pci.h, so they can be used
+ * interchangeably with their PCI_ counterparts */
+enum dma_data_direction {
+ DMA_BIDIRECTIONAL = 0,
+ DMA_TO_DEVICE = 1,
+ DMA_FROM_DEVICE = 2,
+ DMA_NONE = 3,
+};
+
+#include <asm/dma-mapping.h>
+
+#endif
+
+
diff -Nru a/include/linux/pci.h b/include/linux/pci.h
--- a/include/linux/pci.h Tue Dec 17 20:49:32 2002
+++ b/include/linux/pci.h Tue Dec 17 20:49:32 2002
@@ -826,5 +826,92 @@
#define PCIPCI_VIAETBF 8
#define PCIPCI_VSFX 16
+#include <linux/dma-mapping.h>
+
+/* If you define PCI_NEW_DMA_COMPAT_API it means you support the new DMA API
+ * and you want the pci_ DMA API to be implemented using it.
+ */
+#if defined(PCI_NEW_DMA_COMPAT_API) && defined(CONFIG_PCI)
+
+/* note pci_set_dma_mask isn't here, since it's a public function
+ * exported from drivers/pci, use dma_supported instead */
+
+static inline int
+pci_dma_supported(struct pci_dev *hwdev, u64 mask)
+{
+ return dma_supported(&hwdev->dev, mask);
+}
+
+static inline void *
+pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
+ dma_addr_t *dma_handle)
+{
+ return dma_alloc_consistent(&hwdev->dev, size, dma_handle);
+}
+
+static inline void
+pci_free_consistent(struct pci_dev *hwdev, size_t size,
+ void *vaddr, dma_addr_t dma_handle)
+{
+ dma_free_consistent(&hwdev->dev, size, vaddr, dma_handle);
+}
+
+static inline dma_addr_t
+pci_map_single(struct pci_dev *hwdev, void *ptr, size_t size, int direction)
+{
+ return dma_map_single(&hwdev->dev, ptr, size, (enum dma_data_direction)direction);
+}
+
+static inline void
+pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr,
+ size_t size, int direction)
+{
+ dma_unmap_single(&hwdev->dev, dma_addr, size, (enum dma_data_direction)direction);
+}
+
+static inline dma_addr_t
+pci_map_page(struct pci_dev *hwdev, struct page *page,
+ unsigned long offset, size_t size, int direction)
+{
+ return dma_map_page(&hwdev->dev, page, offset, size, (enum dma_data_direction)direction);
+}
+
+static inline void
+pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_address,
+ size_t size, int direction)
+{
+ dma_unmap_page(&hwdev->dev, dma_address, size, (enum dma_data_direction)direction);
+}
+
+static inline int
+pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+ int nents, int direction)
+{
+ return dma_map_sg(&hwdev->dev, sg, nents, (enum dma_data_direction)direction);
+}
+
+static inline void
+pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+ int nents, int direction)
+{
+ dma_unmap_sg(&hwdev->dev, sg, nents, (enum dma_data_direction)direction);
+}
+
+static inline void
+pci_dma_sync_single(struct pci_dev *hwdev, dma_addr_t dma_handle,
+ size_t size, int direction)
+{
+ dma_sync_single(&hwdev->dev, dma_handle, size, (enum dma_data_direction)direction);
+}
+
+static inline void
+pci_dma_sync_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+ int nelems, int direction)
+{
+ dma_sync_sg(&hwdev->dev, sg, nelems, (enum dma_data_direction)direction);
+}
+
+#endif
+
#endif /* __KERNEL__ */
#endif /* LINUX_PCI_H */
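[Editor's note: as a rough illustration of the streaming-DMA usage pattern the
compat wrappers above forward to, here is a sketch that compiles outside the
kernel. The struct device, dma_addr_t, and dma_map_single/dma_unmap_single
definitions below are minimal stand-ins for the real ones in
<linux/dma-mapping.h>; the identity mapping mimics the x86 case only.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the pattern compiles here; real drivers get these from
 * <linux/dma-mapping.h> and the driver model. */
typedef uintptr_t dma_addr_t;
struct device { int bus_id; };
enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE,
                          DMA_FROM_DEVICE, DMA_NONE };

static dma_addr_t dma_map_single(struct device *dev, void *ptr, size_t size,
                                 enum dma_data_direction dir)
{
    (void)dev; (void)size; (void)dir;
    return (dma_addr_t)ptr;            /* identity mapping, x86-style */
}

static void dma_unmap_single(struct device *dev, dma_addr_t handle,
                             size_t size, enum dma_data_direction dir)
{
    (void)dev; (void)handle; (void)size; (void)dir;  /* nothing to undo */
}

/* Typical driver flow: map a buffer for a device-bound transfer before
 * starting I/O, unmap once the transfer completes. */
static dma_addr_t start_tx(struct device *dev, void *buf, size_t len)
{
    return dma_map_single(dev, buf, len, DMA_TO_DEVICE);
}
```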
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-18 3:01 James Bottomley
@ 2002-12-18 3:13 ` David Mosberger
2002-12-28 18:14 ` Russell King
1 sibling, 0 replies; 32+ messages in thread
From: David Mosberger @ 2002-12-18 3:13 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-kernel
James> The attached should represent close to final form for the
James> generic DMA API. It includes documentation (surprise!) and
James> an implementation in terms of the pci_ API for every arch
James> (apart from parisc, which will be submitted later).
James> I've folded in the feedback from the previous thread.
James> Hopefully, this should be ready for inclusion. If people
James> could test it on x86 and other architectures, I'd be
James> grateful.
James> comments and feedback from testing welcome.
Would you mind doing a s/consistent/coherent/g? This has been
misnamed in the PCI DMA interface all along, but I didn't think it was
worth breaking drivers because of it. But since this is a new
interface, there is no such issue.
(Consistency says something about memory access ordering, coherency
only talks about there not being multiple values for a given memory
location. On DMA-coherent platforms with weakly-ordered memory
systems, the returned memory really is only coherent, not consistent,
i.e., you have to use memory barriers if you want to enforce
ordering.)
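[Editor's note: a small sketch of the ordering rule David describes. On a
weakly-ordered but DMA-coherent machine the driver must still order its own
writes to coherent memory; wmb() is stubbed here with a GCC full barrier, and
the descriptor fields are illustrative, not any real hardware layout.]

```c
#include <assert.h>

/* Stand-in for the kernel's wmb(); a GCC full memory barrier. */
#define wmb() __sync_synchronize()

static volatile int descriptor_data;
static volatile int descriptor_owned_by_device;

static void hand_descriptor_to_device(int data)
{
    descriptor_data = data;          /* fill in the descriptor ...      */
    wmb();                           /* ... make it visible first ...   */
    descriptor_owned_by_device = 1;  /* ... then flip ownership to hw   */
}
```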
Thanks,
--david
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFT][PATCH] generic device DMA implementation
@ 2002-12-27 20:21 David Brownell
2002-12-27 21:40 ` James Bottomley
2002-12-27 21:47 ` James Bottomley
0 siblings, 2 replies; 32+ messages in thread
From: David Brownell @ 2002-12-27 20:21 UTC (permalink / raw)
To: James Bottomley, linux-kernel
I think you saw that patch to let the new 2.5.53 generic dma code
replace one of the two indirections USB needs. Here are some of
the key open issues I'm thinking of:
- DMA mapping calls still return no errors; so BUG() out instead?
Consider systems where DMA-able memory is limited (like SA-1111,
to 1 MByte); clearly it should be possible for these calls to
fail, when they can't allocate a bounce buffer. Or (see below)
when an invalid argument is provided to a dma mapping call.
Fix by defining fault returns for the current signatures,
starting with the api specs:
* dma_map_sg() returns negative errno (or zero?) when it
fails. (Those are illegal sglist lengths.)
* dma_map_single() returns an arch-specific value, like
DMA_ADDR_INVALID, when it fails. (DaveM's suggestion,
from a while back; it's seemingly arch-specific.)
Yes, the PCI dma calls would benefit from those bugfixes too.
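[Editor's note: a sketch of the fault-return convention proposed above. No
DMA_ADDR_INVALID constant existed in the API at the time of this thread; both
the sentinel value and the failure condition below are hypothetical, chosen
only to show what the caller-side check would look like.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t dma_addr_t;

/* Hypothetical arch-specific sentinel along the lines DaveM suggested. */
#define DMA_ADDR_INVALID ((dma_addr_t)~0UL)

/* A mapping call that reports failure instead of BUG()ing. The stub
 * "fails" for empty requests purely to exercise the error path. */
static dma_addr_t try_map_single(void *ptr, size_t size)
{
    if (!ptr || !size)
        return DMA_ADDR_INVALID;   /* e.g. no bounce buffer available */
    return (dma_addr_t)ptr;
}
```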
- Implementation-wise, I'm rather surprised that the generic
version doesn't just add new bus driver methods rather than
still insisting everything be PCI underneath.
It's not clear to me how I'd make, for example, a USB device
or interface work with dma_map_sg() ... those "generic" calls
are going to fail (except on x86, where all memory is DMA-able)
since USB != PCI. Even when usb_buffer_map_sg() would succeed.
(The second indirection: the usb controller hardware does the
mapping, not the device or hcd. That's usually PCI.)
Hmm, I suppose there'd need to be a default implementation
of the mapping operations (for all non-pci busses) that'd
fail cleanly ... :)
- There's no analogue to pci_pool, and there's nothing like
"kmalloc" (likely built from N dma-coherent pools).
That forces drivers to write and maintain memory allocators,
is a waste of energy as well as being bug-prone. So in that
sense this API isn't a complete functional replacement of
the current PCI (has pools, ~= kmem_cache_t) or USB (with
simpler buffer_alloc ~= kmalloc) APIs for dma.
- The API says drivers "must" satisfy dma_get_cache_alignment(),
yet both implementations, asm-{i386,generic}/dma-mapping.h,
ignore that rule.
Are you certain of that rule, for all cache coherency models?
I thought only some machines (with dma-incoherent caches) had
that as a hard constraint. (Otherwise it's a soft one: even
if there's cacheline contention, the hardware won't lose data
when drivers use memory barriers correctly.)
I expect that combination is likely to be problematic, since the
previous rule has been (wrongly?) that kmalloc or kmem_cache
memory is fine for DMA mappings, no size restrictions. Yet for
one example on x86 dma_get_cache_alignment() returns 128 bytes,
but kmalloc has several smaller pool sizes ... and lately will
align to L1_CACHE_BYTES (wasting memory on small allocs?) even
when that's smaller than L1_CACHE_MAX (in the new dma calls).
All the more reason to have a drop-in kmalloc alternative for
dma-aware code to use, handling such details transparently!
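[Editor's note: the alignment rule under discussion amounts to rounding every
mapped allocation up to a whole number of cache lines. A minimal sketch,
assuming a fixed 128-byte line as the x86 example above; the real
dma_get_cache_alignment() value is arch-specific.]

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for dma_get_cache_alignment(); 128 bytes as on x86 in the
 * patch under discussion. */
static size_t dma_get_cache_alignment(void) { return 128; }

/* Round a size up so a mapped region never shares a cache line with an
 * unrelated object. Assumes the alignment is a power of two. */
static size_t dma_round_size(size_t size)
{
    size_t align = dma_get_cache_alignment();
    return (size + align - 1) & ~(align - 1);
}
```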
- Isn't arch/i386/kernel/pci-dma.c handling DMA masks wrong?
It's passing GFP_DMA in cases where GFP_HIGHMEM is correct ...
I'm glad to see progress on making DMA more generic, thanks!
- Dave
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-27 20:21 David Brownell
@ 2002-12-27 21:40 ` James Bottomley
2002-12-28 1:29 ` David Brownell
2002-12-28 1:56 ` David Brownell
2002-12-27 21:47 ` James Bottomley
1 sibling, 2 replies; 32+ messages in thread
From: James Bottomley @ 2002-12-27 21:40 UTC (permalink / raw)
To: David Brownell; +Cc: James Bottomley, linux-kernel
david-b@pacbell.net said:
> - Implementation-wise, I'm rather surprised that the generic
> version doesn't just add new bus driver methods rather than
> still insisting everything be PCI underneath.
You mean dma-mapping.h in asm-generic? The reason for that is to provide an
implementation that functions now for non x86 (and non-parisc) archs without
having to write specific code for them all. Since all the other arch's now
function in terms of the pci_ API, that was the only way of sliding the dma_
API in without breaking them all.
Bus driver methods have been advocated before, but it's not clear to me that
they should be exposed in the *generic* API.
> It's not clear to me how I'd make, for example, a USB device
> or interface work with dma_map_sg() ... those "generic" calls
> are going to fail (except on x86, where all memory is DMA-able)
> since USB != PCI.
Actually, they should work on parisc out of the box as well because of the way
its DMA implementation is built in terms of the generic dma_ API.
As far as implementing this generically, just adding a case for the
usb_bus_type in asm-generic/dma-mapping.h will probably get you where you need
to be. (the asm-generic is, after all, only intended as a stopgap. Fully
coherent platforms with no IOMMUs will probably take the x86 route to
implementing the dma_ API, platforms with IOMMUs will probably (eventually) do
similar things to parisc).
> (The second indirection: the usb controller hardware does the
> mapping, not the device or hcd. That's usually PCI.)
Could you clarify this a little. I tend to think of "mapping" as something
done by the IO MMU managing the bus. I think you mean that the usb controller
will mark a region of memory to be accessed by the device. If such a region
were also "mapped" by an IOMMU, it would be done outside the control of the
USB controller, correct? (the IOMMU would translate between the address the
processor sees and the address the USB controller thinks it's responding to)
Is the problem actually that the USB controller needs to be able to allocate
coherent memory in a range much more narrowly defined than the current
dma_mask allows?
> - There's no analogue to pci_pool, and there's nothing like
> "kmalloc" (likely built from N dma-coherent pools).
I didn't want to build another memory pool re-implementation. The mempool API
seems to me to be flexible enough for this, is there some reason it won't work?
I did consider wrappering mempool to make it easier, but I couldn't really
find a simplifying wrapper that wouldn't lose flexibility.
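[Editor's note: the pci_pool-style allocator being debated has to hand back a
DMA handle along with the CPU address. One way a mempool-flavoured wrapper
could do that is by making the pool element a small pair, as in this toy; the
names, the malloc-backed storage, and the identity DMA handle are all
illustrative, not the kernel mempool or pci_pool API.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t dma_addr_t;

/* A pool element carrying both addresses a driver needs. */
struct dma_element {
    void      *vaddr;   /* CPU address */
    dma_addr_t dma;     /* bus address the device uses */
};

static struct dma_element *dma_element_alloc(size_t size)
{
    struct dma_element *e = malloc(sizeof(*e));

    if (!e)
        return NULL;
    e->vaddr = malloc(size);
    if (!e->vaddr) {
        free(e);
        return NULL;
    }
    e->dma = (dma_addr_t)e->vaddr;  /* identity-mapped for illustration */
    return e;
}

static void dma_element_free(struct dma_element *e)
{
    if (e) {
        free(e->vaddr);
        free(e);
    }
}
```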
James
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-27 20:21 David Brownell
2002-12-27 21:40 ` James Bottomley
@ 2002-12-27 21:47 ` James Bottomley
2002-12-28 2:28 ` David Brownell
1 sibling, 1 reply; 32+ messages in thread
From: James Bottomley @ 2002-12-27 21:47 UTC (permalink / raw)
To: David Brownell; +Cc: James Bottomley, linux-kernel
david-b@pacbell.net said:
> - DMA mapping calls still return no errors; so BUG() out instead?
That's actually an open question. The line of least resistance (which is what
I followed) is to do what the pci_ API does (i.e. BUG()).
It's not clear to me that adding error returns rather than BUGging would buy
us anything (because now all the drivers have to know about the errors and
process them).
> Consider systems where DMA-able memory is limited (like SA-1111,
> to 1 MByte); clearly it should be possible for these calls to
> fail, when they can't allocate a bounce buffer. Or (see below)
> when an invalid argument is provided to a dma mapping call.
That's pretty much an edge case. I'm not opposed to putting edge cases in the
api (I did it for dma_alloc_noncoherent() to help parisc), but I don't think
the main line should be affected unless there's a good case for it.
Perhaps there is a compromise where the driver flags in the struct
device_driver that it wants error returns otherwise it takes the default
behaviour (i.e. no error return checking and BUG if there's a problem).
James
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFT][PATCH] generic device DMA implementation
@ 2002-12-27 22:57 Manfred Spraul
2002-12-27 23:55 ` James Bottomley
0 siblings, 1 reply; 32+ messages in thread
From: Manfred Spraul @ 2002-12-27 22:57 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-kernel
>
>
>+
>+Consistent memory is memory for which a write by either the device or
>+the processor can immediately be read by the processor or device
>+without having to worry about caching effects.
>
This is not entirely correct:
The driver must use the normal memory barrier instructions even in
coherent memory. Could you copy the section about wmb() from DMA-mapping
into your new documentation?
+
+Warnings: Memory coherency operates at a granularity called the cache
+line width. In order for memory mapped by this API to operate
+correctly, the mapped region must begin exactly on a cache line
+boundary and end exactly on one (to prevent two separately mapped
+regions from sharing a single cache line). Since the cache line size
+may not be known at compile time, the API will not enforce this
+requirement. Therefore, it is recommended that driver writers who
+don't take special care to determine the cache line size at run time
+only map virtual regions that begin and end on page boundaries (which
+are guaranteed also to be cache line boundaries).
+
No one obeys that rule, and it's not trivial to fix it.
- kmalloc (32,GFP_KERNEL) returns a 32-byte object, even if the cache line size is 128 bytes. The 4 objects in the cache line could be used by four different users.
- sendfile() with an odd offset.
Is it really impossible to work around that in the platform specific code?
In the worst case, the arch code could memcopy to/from a cacheline aligned buffer.
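[Editor's note: a sketch of the worst-case fallback Manfred proposes, where
arch code double-buffers a misaligned region through a cache-line-aligned
copy. The 128-byte line size and every name here are illustrative; real arch
code would also have to copy back for device-to-CPU transfers.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 128   /* assumed line size for illustration */

/* If the region does not begin and end on cache line boundaries, copy
 * it into an aligned bounce buffer before handing it to the device.
 * Returns the buffer to map; *bounce is non-NULL when one was used. */
static void *bounce_if_misaligned(void *buf, size_t len, void **bounce)
{
    uintptr_t addr = (uintptr_t)buf;
    size_t rounded = (len + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);

    if ((addr % CACHE_LINE) == 0 && (len % CACHE_LINE) == 0) {
        *bounce = NULL;               /* already safe: map in place */
        return buf;
    }
    if (posix_memalign(bounce, CACHE_LINE, rounded))
        return NULL;                  /* allocation failed */
    memcpy(*bounce, buf, len);        /* double-buffer the data */
    return *bounce;
}
```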
--
Manfred
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-27 22:57 Manfred Spraul
@ 2002-12-27 23:55 ` James Bottomley
2002-12-28 0:20 ` Manfred Spraul
0 siblings, 1 reply; 32+ messages in thread
From: James Bottomley @ 2002-12-27 23:55 UTC (permalink / raw)
To: Manfred Spraul; +Cc: James Bottomley, linux-kernel
manfred@colorfullife.com said:
> This is not entirely correct: The driver must use the normal memory
> barrier instructions even in coherent memory. Could you copy the
> section about wmb() from DMA-mapping into your new documentation?
I made the name change from consistent to coherent (at David Mosberger's
request) to address at least some of this.
I suppose I can add it as a note to dma_alloc_coherent too.
> No one obeys that rule
Any driver that disobeys this rule today with the pci_ API is prone to cache
related corruption on non-coherent architectures.
> Is it really impossible to work around that in the platform specific
> code? In the worst case, the arch code could memcopy to/from a
> cacheline aligned buffer.
Well, it's not impossible, but I don't believe it can be done efficiently.
And since it can't be done efficiently, I don't believe it's right to impact
the drivers that are properly written to take caching effects into account.
Isn't the better solution to let the platform maintainers negotiate with the
driver maintainers to get those drivers they care about fixed?
James
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-27 23:55 ` James Bottomley
@ 2002-12-28 0:20 ` Manfred Spraul
2002-12-28 16:26 ` James Bottomley
0 siblings, 1 reply; 32+ messages in thread
From: Manfred Spraul @ 2002-12-28 0:20 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-kernel
James Bottomley wrote:
>>No one obeys that rule,
>>
>>
>
>Any driver that disobeys this rule today with the pci_ API is prone to cache
>related corruption on non-coherent architectures.
>
>
The networking core disobeys the rule in the sendfile implementation.
Depending on the cacheline size, even small TCP packets might disobey
the rule. The problem is not restricted to drivers.
>
>
>>Is it really impossible to work around that in the platform specific
>>code? In the worst case, the arch code could memcopy to/from a
>>cacheline aligned buffer.
>>
>>
>
>Well, it's not impossible, but I don't believe it can be done efficiently.
>And since it can't be done efficiently, I don't believe it's right to impact
>the drivers that are properly written to take caching effects into account.
>
>Isn't the better solution to let the platform maintainers negotiate with the
>driver maintainers to get those drivers they care about fixed?
>
>
I agree that the performance will be bad, but like misaligned memory
access, the arch code should support it. Leave the warning about bad
performance in the documentation, and modify the drivers where it
actually matters.
Your new documentation disagrees with the current implementation, and
that is just wrong.
And in the case of networking, someone must do the double buffering.
Doing it within dma_map_single() would avoid modifying every pci driver.
--
Manfred
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-27 21:40 ` James Bottomley
@ 2002-12-28 1:29 ` David Brownell
2002-12-28 16:18 ` James Bottomley
2002-12-28 1:56 ` David Brownell
1 sibling, 1 reply; 32+ messages in thread
From: David Brownell @ 2002-12-28 1:29 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-kernel
James Bottomley wrote:
> david-b@pacbell.net said:
>
>>- Implementation-wise, I'm rather surprised that the generic
>> version doesn't just add new bus driver methods rather than
>> still insisting everything be PCI underneath.
>
>
> You mean dma-mapping.h in asm-generic? ..
Yes. As noted, it can't work for USB directly. And your
suggestion of more "... else if (bus is USB) ... else ... "
logic (#including each bus type's headers?) bothers me.
> Bus driver methods have been advocated before, but it's not clear to me that
> they should be exposed in the *generic* API.
Isn't the goal to make sure that for every kind of "struct device *"
it should be possible to use those dma_*() calls, without BUGging
out? If that's not true ... then why were they defined?
That's certainly the notion I was talking about when this "generic
dma" API notion came up this summer [1]. (That discussion led to
the USB DMA APIs, and then the usb_sg_* calls that let usb-storage
queue scatterlists directly to disk: performance wins, including
DaveM's "USB keyboards don't allocate IOMMU pages", but structures
looking ahead to having real generic DMA calls.)
>> It's not clear to me how I'd make, for example, a USB device
>> or interface work with dma_map_sg() ... those "generic" calls
>> are going to fail (except on x86, where all memory is DMA-able)
>> since USB != PCI.
>
>
> Actually, they should work on parisc out of the box as well because of the way
> its DMA implementation is built in terms of the generic dma_ API.
Most of us haven't seen your PARISC code, it's not in Linus' tree. :)
>> (The second indirection: the usb controller hardware does the
>> mapping, not the device or hcd. That's usually PCI.)
>
>
> Could you clarify this a little.
Actually, make that "hardware-specific code".
The USB controller is what does the DMA. But USB device drivers don't
talk to USB controllers, at least not directly. Instead they talk to a
"struct usb_interface *", or a "struct usb_device *" ... those are more
or less software proxies for the real devices with usbcore and some
HCD turning proxy i/o requests into USB controller operations.
The indirection is getting from the USB device (or interface) to the
object representing the USB controller. All USB calls need that, at
least for host-side APIs, since the controller driver is multiplexing
up to almost 4000 I/O channels. (127 devices * 31 endpoints, max; and
of course typical usage is more like dozens of channels.)
> Is the problem actually that the USB controller needs to be able to allocate
> coherent memory in a range much more narrowly defined than the current
> dma_mask allows?
Nope, it's just an indirection issue. Even on a PCI based system, the "struct
device" used by a USB driver (likely usb_interface->dev) will never be a USB
controller. Since it's the USB controller actually doing the I/O something
needs to use that controller to do the DMA mapping(s).
So any generic DMA logic needs to be able to drill down a level or so before
doing DMA mappings (or allocations) for a USB "struct device *".
- Dave
[1] http://marc.theaimsgroup.com/?l=linux-kernel&m=102389137402497&w=2
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-27 21:40 ` James Bottomley
2002-12-28 1:29 ` David Brownell
@ 2002-12-28 1:56 ` David Brownell
2002-12-28 16:13 ` James Bottomley
1 sibling, 1 reply; 32+ messages in thread
From: David Brownell @ 2002-12-28 1:56 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-kernel
>>- There's no analogue to pci_pool, and there's nothing like
>> "kmalloc" (likely built from N dma-coherent pools).
>
>
> I didn't want to build another memory pool re-implementation. The mempool API
> seems to me to be flexible enough for this, is there some reason it won't work?
I didn't notice any way it would track, and return, DMA addresses.
It's much like a kmem_cache in that way.
> I did consider wrappering mempool to make it easier, but I couldn't really
> find a simplifying wrapper that wouldn't lose flexibility.
In My Ideal World (tm) Linux would have some kind of memory allocator
that'd be configured to use __get_free_pages() or dma_alloc_coherent()
as appropriate. Fast, efficient; caching pre-initted objects; etc.
I'm not sure how realistic that is. So long as APIs keep getting written
so that drivers _must_ re-invent the "memory allocator" wheel, it's not.
But ... if the generic DMA API includes such stuff, it'd be easy to replace
a dumb implementation (have you seen pci_pool, or how usb_buffer_alloc
works? :) with something more intelligent than any driver could justify
writing for its own use.
- Dave
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-27 21:47 ` James Bottomley
@ 2002-12-28 2:28 ` David Brownell
0 siblings, 0 replies; 32+ messages in thread
From: David Brownell @ 2002-12-28 2:28 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-kernel
Hi,
>>- DMA mapping calls still return no errors; so BUG() out instead?
>
>
> That's actually an open question. The line of least resistance (which is what
> I followed) is to do what the pci_ API does (i.e. BUG()).
That might have been appropriate for PCI-mostly APIs, since those tend to
be resource-rich. Maybe. (It always seemed like an API bug to me.)
I can't buy that logic in the "generic" case though. Heck, haven't all
the address space allocation calls in Linux always exposed ENOMEM type
faults ... except PCI? This one is _really_ easy to fix now. Resources
are never infinite.
> It's not clear to me that adding error returns rather than BUGging would buy
> us anything (because now all the drivers have to know about the errors and
> process them).
For me, designing any "generic" API to handle common cases (like allocation
failures) reasonably (no BUGging!) is a fundamental design requirement.
Robust drivers are aware of things like allocation faults, and handle them.
If they do so poorly, that can be fixed like any other driver bug.
>> Consider systems where DMA-able memory is limited (like SA-1111,
>> to 1 MByte); clearly it should be possible for these calls to
>> fail, when they can't allocate a bounce buffer. Or (see below)
>> when an invalid argument is provided to a dma mapping call.
>
>
> That's pretty much an edge case. I'm not opposed to putting edge cases in the
> api (I did it for dma_alloc_noncoherent() to help parisc), but I don't think
> the main line should be affected unless there's a good case for it.
Absolutely *any* system can have situations where the relevant address space
(or memory) was all in use, or wasn't available to a non-blocking request
without blocking, etc. Happens more often on some systems than others; I
just chose SA-1111 since your approach would seem to make that unusable.
If that isn't a "good case", why not? And what could ever be a "good case"?
> Perhaps there is a compromise where the driver flags in the struct
> device_driver that it wants error returns otherwise it takes the default
> behaviour (i.e. no error return checking and BUG if there's a problem).
IMO that's the worst of all possible worlds. The error paths would get
even less testing than they do today. If there's a fault path defined,
use it in all cases: don't just BUG() in some modes, and some drivers.
- Dave
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFT][PATCH] generic device DMA implementation
@ 2002-12-28 2:48 Adam J. Richter
2002-12-28 15:05 ` David Brownell
0 siblings, 1 reply; 32+ messages in thread
From: Adam J. Richter @ 2002-12-28 2:48 UTC (permalink / raw)
To: david-b, James.bottomley, linux-kernel, manfred
At 2002-12-28 1:29:54 GMT, David Brownell wrote:
>Isn't the goal to make sure that for every kind of "struct device *"
>it should be possible to use those dma_*() calls, without BUGging
>out.
No.
>If that's not true ... then why were they defined?
So that other memory mapped busses such as ISA and sbus
can use them.
USB devices should do DMA operations with respect to their USB
host adapters, typically a PCI device. For example, imagine a machine
with two USB controllers on different bus instances.
One of these days, I'd like to add "struct device *dma_dev;" to
struct request_queue to facilitate optional centralization of
dma_{,un}map_sg for most hardware drivers. PCI scsi controllers, for
example would set dma_dev to the PCI device. USB scsi controllers
would set it to the USB host adapter.
Adam J. Richter __ ______________ 575 Oroville Road
adam@yggdrasil.com \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFT][PATCH] generic device DMA implementation
@ 2002-12-28 3:39 Adam J. Richter
2002-12-30 0:45 ` Alan Cox
0 siblings, 1 reply; 32+ messages in thread
From: Adam J. Richter @ 2002-12-28 3:39 UTC (permalink / raw)
To: manfred; +Cc: david-b, James.Bottomley, linux-kernel
On 2002-12-17 Manfred Spraul wrote the following, which I am taking
out of order:
>+Warnings: Memory coherency operates at a granularity called the cache
>+line width. [...]
That's a description of "inconsistent" memory. "Consistent"
memory does not have the cache line problems. It is uncached.
Architectures that cannot allocate memory with this property cannot
allocate consistent memory; accommodating that is, I think, what
motivated James to take a whack at this.
>The driver must use the normal memory barrier instructions even in
>coherent memory.
I know Documentation/DMA-mapping.txt says that, and I
understand that wmb() is necessary with consistent memory, but I
wonder if rmb() really is, at least if you've declared the data
structures in question as volatile to prevent reordering of reads by
the compiler.
>- kmalloc (32,GFP_KERNEL) returns a 32-byte object, even if the cache
>line size is 128 bytes. The 4 objects in the cache line could be used
>by four different users. - sendfile() with an odd offset.
+1 Insightful
I think we could use a GFP_ flag for kmallocs that are
intended for DMA (typically networking and USB packets) that would
cause the memory allocation to be aligned and rounded up to a multiple
of cache line size. There probably are only a dozen or so such
kmallocs, but I think it's enough so that having a kmalloc flag
would result in a smaller kernel than without it.
As you've pointed out with your sendfile() example, the
problem you've identified goes beyond kmalloc. I'm still mulling it
over, but it's worth noting that it is not a new problem introduced by
the generic DMA interface. Thanks for raising it though. I hadn't
thought of it before.
Adam J. Richter __ ______________ 575 Oroville Road
adam@yggdrasil.com \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 2:48 [RFT][PATCH] generic device DMA implementation Adam J. Richter
@ 2002-12-28 15:05 ` David Brownell
0 siblings, 0 replies; 32+ messages in thread
From: David Brownell @ 2002-12-28 15:05 UTC (permalink / raw)
To: Adam J. Richter; +Cc: James.bottomley, linux-kernel, manfred
Adam J. Richter wrote:
> At 2002-12-28 1:29:54 GMT, David Brownell wrote:
>
>>Isn't the goal to make sure that for every kind of "struct device *"
>>it should be possible to use those dma_*() calls, without BUGging
>>out.
>
> No.
>
>>If that's not true ... then why were they defined?
>
> So that other memory mapped busses such as ISA and sbus
> can use them.
That sounds like a "yes", not a "no" ... except for devices on busses
that don't do memory mapped I/O. It's what I described: dma_*()
calls being used with struct device, without BUGging out.
> USB devices should do DMA operations with respect to their USB
> host adapters, typically a PCI device. For example, imagine a machine
> with two USB controllers on different bus instances.
USB already does that ... you write as if it didn't. It's done that
since pretty early in the 2.4 series, when conversions to the "new" PCI
DMA APIs replaced usage of virt_to_bus and bus_to_virt and USB became
usable on platforms that weren't very PC-like. Controllers that don't
use PCI also need Linux support, in embedded configs.
However, the device drivers can't do that using the not-yet-generic DMA
calls. It's handled by usbcore, or by the USB DMA calls. That works,
and will continue to do so ... but it does highlight how "generic" the
implementation of those APIs is (not very).
- Dave
* Re: [RFT][PATCH] generic device DMA implementation
@ 2002-12-28 15:41 Adam J. Richter
2002-12-28 16:59 ` David Brownell
0 siblings, 1 reply; 32+ messages in thread
From: Adam J. Richter @ 2002-12-28 15:41 UTC (permalink / raw)
To: david-b; +Cc: James.bottomley, linux-kernel, manfred
David Brownell wrote:
>Adam J. Richter wrote:
>> At 2002-12-28 1:29:54 GMT, David Brownell wrote:
>>
>>>Isn't the goal to make sure that for every kind of "struct device *"
^^^^^
>>>it should be possible to use those dma_*() calls, without BUGging
>>>out.
>>
>> No.
>>
>>>If that's not true ... then why were they defined?
>>
>> So that other memory mapped busses such as ISA and sbus
>> can use them.
>That sounds like a "yes", not a "no" ... except for devices on busses
^^^^^^
Then it's not "every", which was your question. I guess I need
to understand better how to interpret your questions.
>that don't do memory mapped I/O. It's what I described: dma_*()
>calls being used with struct device, without BUGging out.
Let's see if we agree. The behavior I expect is:
addr = dma_malloc(&some_usb_device->dev, size, &dma_addr, DMA_CONSISTENT)
===> BUG()
addr = dma_malloc(host_dev(some_usb_device), size, &dma_addr, DMA_CONSISTENT)
===> some consistent memory (or NULL).
where host_dev would be something like:
struct device *host_dev(struct usb_device *usbdev)
{
struct usb_hcd *hcd = usbdev->bus->hcpriv;
return &hcd->pdev->device; /* actually, this would become &hcd->dev */
}
>> USB devices should do DMA operations with respect to their USB
>> host adapters, typically a PCI device. For example, imagine a machine
>> with two USB controllers on different bus instances.
>USB already does that ... you write as if it didn't.
I wasn't aware of it. I've looked up the code now. Thanks
for the pointer. With generic DMA operations, we can delete most of
drivers/usb/core/buffer.c, specifically
hcd_buffer_{alloc,free,{,un}map{,_sg},dmasync,sync_sg}, and their
method pointers in struct usb_operations in drivers/usb/core/hcd.[ch]
without introducing PCI dependency.
I hope this clarifies things. Please let me know if you think
we still disagree on something.
Adam J. Richter __ ______________ 575 Oroville Road
adam@yggdrasil.com \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 1:56 ` David Brownell
@ 2002-12-28 16:13 ` James Bottomley
2002-12-28 17:41 ` David Brownell
0 siblings, 1 reply; 32+ messages in thread
From: James Bottomley @ 2002-12-28 16:13 UTC (permalink / raw)
To: David Brownell; +Cc: James Bottomley, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 757 bytes --]
OK, the attached is a sketch of an implementation of bus_type operations.
It renames all the platform dma_ operations to platform_dma_ and will call
only the bus specific operation if it exists. Thus it will be the
responsibility of the bus to call the platform_dma_ functions correctly (this
one is a large loaded gun).
The answer to error handling in the general case is still no (because I don't
want to impact the main line code for a specific problem, and the main line is
x86 which effectively has infinite mapping resources), but I don't see why the
platforms can't export a set of values they guarantee not to return as
dma_addr_t's that you can use for errors in the bus implementations.
Would this solve most of your problems?
James
[-- Attachment #2: tmp.diff --]
[-- Type: text/plain , Size: 5094 bytes --]
===== include/linux/device.h 1.70 vs edited =====
--- 1.70/include/linux/device.h Mon Dec 16 10:01:41 2002
+++ edited/include/linux/device.h Sat Dec 28 09:57:51 2002
@@ -61,6 +61,7 @@
struct device;
struct device_driver;
struct device_class;
+struct bus_dma_ops;
struct bus_type {
char * name;
@@ -75,6 +76,7 @@
struct device * (*add) (struct device * parent, char * bus_id);
int (*hotplug) (struct device *dev, char **envp,
int num_envp, char *buffer, int buffer_size);
+ struct bus_dma_ops * dma_ops;
};
===== include/linux/dma-mapping.h 1.1 vs edited =====
--- 1.1/include/linux/dma-mapping.h Sat Dec 21 22:37:05 2002
+++ edited/include/linux/dma-mapping.h Sat Dec 28 10:02:37 2002
@@ -10,7 +10,144 @@
DMA_NONE = 3,
};
+struct bus_dma_ops {
+ int (*supported)(struct device *dev, u64 mask);
+ int (*set_mask)(struct device *dev, u64 mask);
+ void *(*alloc_coherent)(struct device *dev, size_t size,
+ dma_addr_t *dma_handle);
+ void (*free_coherent)(struct device *dev, size_t size, void *cpu_addr,
+ dma_addr_t dma_handle);
+ dma_addr_t (*map_single)(struct device *dev, void *cpu_addr,
+ size_t size, enum dma_data_direction direction);
+ void (*unmap_single)(struct device *dev, dma_addr_t dma_addr,
+ size_t size, enum dma_data_direction direction);
+ int (*map_sg)(struct device *dev, struct scatterlist *sg, int nents,
+ enum dma_data_direction direction);
+ void (*unmap_sg)(struct device *dev, struct scatterlist *sg,
+ int nhwentries, enum dma_data_direction direction);
+ void (*sync_single)(struct device *dev, dma_addr_t dma_handle,
+ size_t size, enum dma_data_direction direction);
+ void (*sync_sg)(struct device *dev, struct scatterlist *sg, int nelems,
+ enum dma_data_direction direction);
+};
+
#include <asm/dma-mapping.h>
+
+static inline int
+dma_supported(struct device *dev, u64 mask)
+{
+ struct bus_type *bus = dev->bus;
+
+ if(bus->dma_ops && bus->dma_ops->supported)
+ return bus->dma_ops->supported(dev, mask);
+
+ return platform_dma_supported(dev, mask);
+}
+
+static inline int
+dma_set_mask(struct device *dev, u64 dma_mask)
+{
+ struct bus_type *bus = dev->bus;
+
+ if(bus->dma_ops && bus->dma_ops->set_mask)
+ return bus->dma_ops->set_mask(dev, dma_mask);
+
+ return platform_dma_set_mask(dev, dma_mask);
+}
+
+static inline void *
+dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle)
+{
+ struct bus_type *bus = dev->bus;
+
+ if(bus->dma_ops && bus->dma_ops->alloc_coherent)
+ return bus->dma_ops->alloc_coherent(dev, size, dma_handle);
+
+ return platform_dma_alloc_coherent(dev, size, dma_handle);
+}
+
+static inline void
+dma_free_coherent(struct device *dev, size_t size, void *cpu_addr,
+ dma_addr_t dma_handle)
+{
+ struct bus_type *bus = dev->bus;
+
+ if(bus->dma_ops && bus->dma_ops->free_coherent)
+ bus->dma_ops->free_coherent(dev, size, cpu_addr, dma_handle);
+ else
+ platform_dma_free_coherent(dev, size, cpu_addr, dma_handle);
+}
+
+static inline dma_addr_t
+dma_map_single(struct device *dev, void *cpu_addr, size_t size,
+ enum dma_data_direction direction)
+{
+ struct bus_type *bus = dev->bus;
+
+ if(bus->dma_ops && bus->dma_ops->map_single)
+ return bus->dma_ops->map_single(dev, cpu_addr, size, direction);
+ return platform_dma_map_single(dev, cpu_addr, size, direction);
+}
+
+static inline void
+dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
+ enum dma_data_direction direction)
+{
+ struct bus_type *bus = dev->bus;
+
+ if(bus->dma_ops && bus->dma_ops->unmap_single)
+ bus->dma_ops->unmap_single(dev, dma_addr, size, direction);
+ else
+ platform_dma_unmap_single(dev, dma_addr, size, direction);
+}
+
+static inline int
+dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+ enum dma_data_direction direction)
+{
+ struct bus_type *bus = dev->bus;
+
+ if(bus->dma_ops && bus->dma_ops->map_sg)
+ return bus->dma_ops->map_sg(dev, sg, nents, direction);
+
+ return platform_dma_map_sg(dev, sg, nents, direction);
+}
+
+static inline void
+dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
+ enum dma_data_direction direction)
+{
+ struct bus_type *bus = dev->bus;
+
+ if(bus->dma_ops && bus->dma_ops->unmap_sg)
+ bus->dma_ops->unmap_sg(dev, sg, nhwentries, direction);
+ else
+ platform_dma_unmap_sg(dev, sg, nhwentries, direction);
+}
+
+static inline void
+dma_sync_single(struct device *dev, dma_addr_t dma_handle, size_t size,
+ enum dma_data_direction direction)
+{
+ struct bus_type *bus = dev->bus;
+
+ if(bus->dma_ops && bus->dma_ops->sync_single)
+ bus->dma_ops->sync_single(dev, dma_handle, size, direction);
+ else
+ platform_dma_sync_single(dev, dma_handle, size, direction);
+}
+
+static inline void
+dma_sync_sg(struct device *dev, struct scatterlist *sg, int nelems,
+ enum dma_data_direction direction)
+{
+ struct bus_type *bus = dev->bus;
+
+ if(bus->dma_ops && bus->dma_ops->sync_sg)
+ bus->dma_ops->sync_sg(dev, sg, nelems, direction);
+ else
+ platform_dma_sync_sg(dev, sg, nelems, direction);
+}
#endif
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 1:29 ` David Brownell
@ 2002-12-28 16:18 ` James Bottomley
2002-12-28 18:16 ` David Brownell
0 siblings, 1 reply; 32+ messages in thread
From: James Bottomley @ 2002-12-28 16:18 UTC (permalink / raw)
To: David Brownell; +Cc: James Bottomley, linux-kernel
david-b@pacbell.net said:
> The indirection is getting from the USB device (or interface) to the
> object representing the USB controller. All USB calls need that, at
> least for host-side APIs, since the controller driver is multiplexing
> up to almost 4000 I/O channels. (127 devices * 31 endpoints, max; and
> of course typical usage is more like dozens of channels.)
This sounds like a mirror of the problem of finding the IOMMU on parisc (there
can be more than one).
The way parisc solves this is to look in dev->platform_data and if that's null
walk up the dev->parent until the IOMMU is found and then cache the IOMMU ops
in the current dev->platform_data. Obviously, you can't use platform_data,
but you could use driver_data for this. The IOMMU's actually lie on a parisc
specific bus, so the ability to walk up the device tree without having to know
the device types was crucial to implementing this.
James
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 0:20 ` Manfred Spraul
@ 2002-12-28 16:26 ` James Bottomley
2002-12-28 17:54 ` Manfred Spraul
0 siblings, 1 reply; 32+ messages in thread
From: James Bottomley @ 2002-12-28 16:26 UTC (permalink / raw)
To: Manfred Spraul; +Cc: James Bottomley, linux-kernel
manfred@colorfullife.com said:
> Your new documentation disagrees with the current implementation, and
> that is just wrong.
I don't agree that protecting users from cache line overlap misuse is current
implementation. It's certainly not on parisc, which was the non-coherent
platform I chose to model this on. Which platforms do it now for the pci_
API?
James
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 15:41 Adam J. Richter
@ 2002-12-28 16:59 ` David Brownell
0 siblings, 0 replies; 32+ messages in thread
From: David Brownell @ 2002-12-28 16:59 UTC (permalink / raw)
To: Adam J. Richter; +Cc: James.bottomley, linux-kernel, manfred
Hi,
>>>>Isn't the goal to make sure that for every kind of "struct device *"
> ^^^^^
>>>>it should be possible to use those dma_*() calls, without BUGging
>>>>out.
>>That sounds like a "yes", not a "no" ... except for devices on busses
> ^^^^^^
> Then it's not "every", which was your question. I guess I need
> to understand better how to interpret your questions.
My bad ... I meant "every" kind of device doing memory mapped I/O, which
is all that had been mentioned so far. This needs to work on more than
just the "platform bus"; "layered busses" shouldn't need special casing,
they are just as common. You had mentioned SCSI busses; USB, FireWire,
and others also exist. (Though I confess I still don't like BUG() as a
way to fail much of anything!)
That's likely a better way to present one of my points: if it's "generic",
then it must work on more than just the lowest level platform bus.
>that don't do memory mapped I/O. It's what I described: dma_*()
>>calls being used with struct device, without BUGging out.
>
>
> Let's see if we agree. The behavior I expect is:
>
> addr = dma_malloc(&some_usb_device->dev, size, &dma_addr, DMA_CONSISTENT)
> ===> BUG()
That's not consistent with what I thought you said. USB devices
use memory mapped I/O, so this should work. (And using BUG instead
of returning null still seems wrong...)
However, we could agree that some kind of dma_malloc() should exist!
> addr = dma_malloc(host_dev(some_usb_device), size, &dma_addr, DMA_CONSISTENT)
> ===> some consistent memory (or NULL).
>
> where host_dev would be something like:
>
> struct device *host_dev(struct usb_device *usbdev)
> {
> struct usb_hcd *hcd = usbdev->bus->hcpriv;
> return &hcd->pdev->device; /* actually, this would become &hcd->dev */
> }
Please look at the 2.5.53 tree with my "usbcore dma updates (and doc)"
patch, which Greg has now merged and submitted to Linus.
The pre-existing USB DMA API syntax did not change, so it looks only
"something" like what you wrote there.
- Dave
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 16:13 ` James Bottomley
@ 2002-12-28 17:41 ` David Brownell
0 siblings, 0 replies; 32+ messages in thread
From: David Brownell @ 2002-12-28 17:41 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-kernel
James Bottomley wrote:
> OK, the attached is a sketch of an implementation of bus_type operations.
Quick reaction ....
Those signatures look more or less right, at a quick glance,
except that allocating N bytes should pass a __GFP_WAIT flag.
(And of course, allocating a mapping needs a failure return.)
That bus_dma_ops is more of a "vtable" approach, and I confess
I'd been thinking of hanging some object that had internal state
as well as method pointers. (Call it a "whatsit" for the moment.)
That'd make it possible for layered busses like USB and SCSI
to just reference the "whatsit" from the parent bus in their
layered "struct device" objects. [1]
In many cases that'd just end up being a ref to the "platform
whatsit", eliminating a conditional test from the hot path from
your sketch as well as an entire set of new "platform_*()" APIs.
- Dave
[1] That is, resembling what Benjamin Herrenschmidt suggested:
http://marc.theaimsgroup.com/?l=linux-kernel&m=102389432006266&w=2
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 16:26 ` James Bottomley
@ 2002-12-28 17:54 ` Manfred Spraul
2002-12-28 18:13 ` James Bottomley
0 siblings, 1 reply; 32+ messages in thread
From: Manfred Spraul @ 2002-12-28 17:54 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-kernel
James Bottomley wrote:
>manfred@colorfullife.com said:
>
>
>>Your new documentation disagrees with the current implementation, and
>>that is just wrong.
>>
>>
>
>I don't agree that protecting users from cache line overlap misuse is current
>implementation. It's certainly not on parisc which was the non-coherent
>platform I chose to model this with, which platforms do it now for the pci_
>API?
>
>
You are aware that "users" is not one or two drivers that no one uses,
it's the whole networking stack.
What do you propose to fix sendfile() and networking with small network
packets [e.g. two 64 byte packets within a 128 byte cache line]?
One platform that handles it is Miles Bader's memcpy-based
dma_map_single() implementation.
http://marc.theaimsgroup.com/?l=linux-kernel&m=103907087825616&w=2
And obviously i386, i.e. all archs with empty dma_map_single() functions.
I see three options:
- modify the networking core, and enforce that a cache line is never
shared between users for such archs. Big change. Often not necessary -
some nics must double buffer internally anyway.
- modify every driver that doesn't do double buffering, and enable
double buffering on the affected archs. Even larger change.
- do the double buffering in dma_map_single() & co.
One problem for double buffering in dma_map_single() is that it would
double buffer too often: for example, the start of the rx buffers is
usually misaligned by the driver, to ensure that the IP headers are
aligned. The rest of the cacheline is unused, but it's not possible to
give that information to dma_map_single().
--
Manfred
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 17:54 ` Manfred Spraul
@ 2002-12-28 18:13 ` James Bottomley
2002-12-28 18:25 ` Manfred Spraul
0 siblings, 1 reply; 32+ messages in thread
From: James Bottomley @ 2002-12-28 18:13 UTC (permalink / raw)
To: Manfred Spraul; +Cc: James Bottomley, linux-kernel
manfred@colorfullife.com said:
> You are aware that "users" is not one or two drivers that noone uses,
> it's the whole networking stack.
I am aware of this. I'm also aware that it is *currently* broken with the old
API on all non-coherent archs bar the one you point out.
All I actually did was document the existing problem, I think.
How bad actually is it? Networking seems to work fine for me on non-coherent
parisc. Whereas, when I had this cache line overlap problem in a SCSI driver,
I was seeing corruption all over the place.
The problem really only occurs if the CPU can modify part of a cache line
while a device has modified memory belonging to another part. Now a flush
from the CPU will destroy the device data (or an invalidate from the driver
will destroy the CPU's data). The problem is effectively rendered harmless if only
data going in the same direction shares a cache line (even if it is for
different devices). It strikes me that this is probably true for network data
and would explain the fact that I haven't seen any obvious network related
corruption.
James
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-18 3:01 James Bottomley
2002-12-18 3:13 ` David Mosberger
@ 2002-12-28 18:14 ` Russell King
2002-12-28 18:19 ` James Bottomley
1 sibling, 1 reply; 32+ messages in thread
From: Russell King @ 2002-12-28 18:14 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-kernel
I've just been working through the ARM dma stuff, converting it to the
new API, and I found this:
> +static inline int
> +pci_dma_supported(struct pci_dev *hwdev, u64 mask)
> +{
> + return dma_supported(&hwdev->dev, mask);
> +}
> (etc)
I'll now pull out a bit from DMA-mapping.txt:
| Using Consistent DMA mappings.
|
| To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
| you should do:
|
| dma_addr_t dma_handle;
|
| cpu_addr = pci_alloc_consistent(dev, size, &dma_handle);
|
| where dev is a struct pci_dev *. You should pass NULL for PCI like buses
| where devices don't have struct pci_dev (like ISA, EISA). This may be
| called in interrupt context.
What happens to &hwdev->dev when you do as detailed there and pass NULL
into these "compatibility" functions? Probably an oops.
I think these "compatibility" functions need to do:
static inline xxx
pci_xxx(struct pci_dev *hwdev, ...)
{
dma_xxxx(hwdev ? &hwdev->dev : NULL, ...)
}
so they remain correct to existing API users expectations.
--
Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 16:18 ` James Bottomley
@ 2002-12-28 18:16 ` David Brownell
0 siblings, 0 replies; 32+ messages in thread
From: David Brownell @ 2002-12-28 18:16 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-kernel
James Bottomley wrote:
> david-b@pacbell.net said:
>
>>The indirection is getting from the USB device (or interface) to the
>>object representing the USB controller. ...
>
> This sounds like a mirror of the problem of finding the IOMMU on parisc (there
> can be more than one).
Wouldn't it be straightforward to package that IOMMU solution using the
"call dev->whatsit->dma_op()" approach I mentioned? Storing data in
the "whatsit" seems more practical than saying driver_data is no longer
available to the device's driver. (I'll be agnostic on platform_data.)
This problem seems to me to be a common layering requirement. All the
indirections are known when the device structure is being initted, so it
might as well be set up then. True for PARISC (right?), as well as USB,
SCSI, and most other driver stacks. I suspect it'd even allow complex
voodoo for multi-path I/O too...
- Dave
> The way parisc solves this is to look in dev->platform_data and if that's null
> walk up the dev->parent until the IOMMU is found and then cache the IOMMU ops
> in the current dev->platform_data. Obviously, you can't use platform_data,
> but you could use driver_data for this. The IOMMU's actually lie on a parisc
> specific bus, so the ability to walk up the device tree without having to know
> the device types was crucial to implementing this.
>
> James
>
>
>
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 18:14 ` Russell King
@ 2002-12-28 18:19 ` James Bottomley
0 siblings, 0 replies; 32+ messages in thread
From: James Bottomley @ 2002-12-28 18:19 UTC (permalink / raw)
To: James Bottomley, linux-kernel
rmk@arm.linux.org.uk said:
> What happens to &hwdev->dev when you do as detailed there and pass
> NULL into these "compatibility" functions? Probably an oops.
Yes. Already found by Udo Steinberg and fixed in bk latest...
James
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 18:13 ` James Bottomley
@ 2002-12-28 18:25 ` Manfred Spraul
2002-12-28 18:40 ` James Bottomley
0 siblings, 1 reply; 32+ messages in thread
From: Manfred Spraul @ 2002-12-28 18:25 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-kernel
James Bottomley wrote:
>The problem really only occurs if the CPU can modify part of a cache line
>while a device has modified memory belonging to another part. Now a flush
>from the CPU will destroy the device data (or an invalidate from the driver
>destroy the CPU's data). The problem is effectively rendered harmless if only
>data going in the same direction shares a cache line (even if it is for
>different devices). It strikes me that this is probably true for network data
>and would explain the fact that I haven't seen any obvious network related
>corruption.
>
>
Yes. Networking usually generates exclusive cachelines.
I'm aware of two special cases:
If multiple kmalloc buffers fit into one cacheline, then it can happen
all the time. But the smallest kmalloc buffer is 64 bytes [assuming page
size > 4096].
Is your cache line >= 128 bytes?
Or sendfile() of a mmap'ed file that is modified by userspace. That is
the recommended approach for zerocopy tx, but I'm not sure which apps
actually use that. IIRC DaveM mentioned the approach.
Additionally, the TCP checksum could catch the corruption and the packet
would be resent - you wouldn't notice the corruption unless you use hw checksums.
--
Manfred
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 18:25 ` Manfred Spraul
@ 2002-12-28 18:40 ` James Bottomley
2002-12-28 20:05 ` Manfred Spraul
0 siblings, 1 reply; 32+ messages in thread
From: James Bottomley @ 2002-12-28 18:40 UTC (permalink / raw)
To: Manfred Spraul; +Cc: James Bottomley, linux-kernel
manfred@colorfullife.com said:
> If multiple kmalloc buffers fit into one cacheline, then it can happen
> all the time. But the smallest kmalloc buffer is 64 bytes [assuming
> page size > 4096].
Actually, I did forget to mention that on parisc non-coherent, the minimum
kmalloc allocation is the cache line width, so that problem cannot occur.
Hmm, perhaps that is an easier (and faster) approach to fixing the problems on
non-coherent platforms?
James
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 18:40 ` James Bottomley
@ 2002-12-28 20:05 ` Manfred Spraul
0 siblings, 0 replies; 32+ messages in thread
From: Manfred Spraul @ 2002-12-28 20:05 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-kernel
James Bottomley wrote:
>manfred@colorfullife.com said:
>
>
>>If multiple kmalloc buffers fit into one cacheline, then it can happen
>> all the time. But the smallest kmalloc buffer is 64 bytes [assuming
>>page size > 4096].
>>
>>
>
>Actually, I did forget to mention that on parisc non-coherent, the minimum
>kmalloc allocation is the cache line width, so that problem cannot occur.
>
>Hmm, perhaps that is an easier (and faster) approach to fixing the problems on
>non-coherent platforms?
>
>
How do you want to fix sendfile()?
Note that I'm thinking along the same line as reading an unaligned
integer: the arch must provide a trap handler that emulates misaligned
reads, but it should never happen, except if someone manually creates an
IP packet with odd options to perform a DoS attack. Restricting kmalloc
is obviously faster, but I'm not convinced that this really catches all
corner cases.
A memcpy() based dma_map implementation would have another advantage:
enable it on i386, and you'll catch everyone that violates the dma spec
immediately.
The only problem is that the API is bad - networking buffers are usually
2 kB allocations, 2 kB aligned.
The actual data area that is passed to pci_map_single starts at offset 2
and has an odd length. It's not possible to pass the information to
dma_map_single() that the rest of the cacheline is unused and that
double buffering is not required, despite the misaligned case.
Except for sendfile(), where double buffering is necessary.
--
Manfred
* Re: [RFT][PATCH] generic device DMA implementation
@ 2002-12-28 20:11 Adam J. Richter
0 siblings, 0 replies; 32+ messages in thread
From: Adam J. Richter @ 2002-12-28 20:11 UTC (permalink / raw)
To: James.Bottomley, manfred; +Cc: david-b, linux-kernel
Regarding the problem of multiple users of inconsistent
streaming memory potentially sharing the same cache line, I suspect
that the number of places that need to be fixed (or even just ought to
be commented) is probably small.
First of all, I believe it is sufficient if we just ensure
that the memory used for device --> cpu transfers does not share a cache
line (at least not with anything that the CPU may write to), as that is
the only direction which involves invalidating data in the CPU cache.
So, for network packets, we should only be concerned about
inbound ones, in which case the maximum packet size has usually been
allocated anyhow.
I believe all of the current block device IO generators
generate transfers that are aligned and sized in units of at least 512
(although struct bio_vec does not require this), which I think is a
multiple of cache line size of all current architectures.
I haven't checked, but I would suspect that even for the remaining
non-network non-block devices that do large amounts of input via DMA,
such as scanners (typically via SCSI or USB), the input buffers
allocated for these transfers happen to be a multiple of cache line
size, just because they're large and because programmers like to use
powers of two and the memory allocators usually end up aligning them
on such a power of two.
Just to be clear: I'm only talking about corruption due to
streaming mappings overlapping other data in the same CPU cache line.
I am not talking about using inconsistent memory to map control
structures on architectures that lack consistent memory.
Adam J. Richter __ ______________ 575 Oroville Road
adam@yggdrasil.com \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."
* Re: [RFT][PATCH] generic device DMA implementation
@ 2002-12-28 22:19 Adam J. Richter
2002-12-30 23:23 ` David Brownell
0 siblings, 1 reply; 32+ messages in thread
From: Adam J. Richter @ 2002-12-28 22:19 UTC (permalink / raw)
To: david-b; +Cc: James.bottomley, linux-kernel, manfred
Odd number of ">" = David Brownell
Even number of ">" = Adam Richter
>[...] This [DMA mapping operations] needs to work on more than
>just the "platform bus"; "layered busses" shouldn't need special casing,
>they are just as common.
It is not necessarily the case that every non-DMA device has
exactly one DMA device "parent." You can have SCSI devices that are
multipathed from multiple scsi controllers. Software RAID devices can
drive multiple controllers. Some devices have no DMA in their IO
paths, such as some parallel port devices (and, yes, there is a
parallel bus in that you can daisy chain multiple devices off of one
port, each device having a distinct ID). If drivers are explicit
about which DMA mapped device they are referring to, it will simplify
things like adding support for multipath failover.
On the other hand, it might be a convenient shorthand to be able to
say dma_malloc(usb_device,....) instead of
dma_malloc(usb_device->controller, ...). It's just that the number of
callers is small enough so that I don't think that the resulting code
shrink would make up for the size of the extra wrapper routines. So,
I'd rather have more clarity about exactly which device's DMA
constraints are being used.
By the way, one idea that I've mentioned before that might
help catch some memory allocation bugs would be a type scheme so that
the compiler could catch some of the mistakes, like so:
/* PCI, ISA, sbus, etc. devices embed this instead of struct device: */
struct dma_device {
u64 dma_mask;
/* other stuff? */
struct device dev;
};
void *dma_malloc(struct dma_device *dma_dev, size_t nbytes,
dma_addr_t *dma_addr, unsigned int flags);
Also, another separate feature that might be handy would be
a new field in struct device.
struct device {
....
struct dma_device *dma_dev;
}
device.dma_dev would point back to the device in the case of PCI,
ISA and other memory mapped devices, and it would point to the host
controller for USB devices, the SCSI host adapter for SCSI devices, etc.
Devices that do fail over might implement a device-specific spinlock to
guard access to this field so that it could be changed on the fly.
So, for example, the high level networking code could
consolidate mapping of outgoing packets by doing something like:
skbuff->dma_addr = dma_map_single(netdev->dev->dma_dev,
skbuff->data, ...)
...and that would even work for USB or SCSI ethernet adapters.
>You had mentioned SCSI busses; USB, FireWire,
>and others also exist. (Though I confess I still don't like BUG() as a
>way to fail much of anything!)
BUG() is generally the optimal way to fail due to programmer
error, as opposed to program error. You want to catch the bug as
early as possible. If you have a system where you want to do
something other than exiting the current process with a fake SIGSEGV (say
you want to try to invoke a debugger or do a more graceful system call
return), you can redefine BUG() to your liking. Writing lots of code
to carefully unwind programmer error usually leads to so much more
complexity that the overall effect is a reduction in reliability, and,
besides, you get into an infinite development cycle of trying to
recover from all possible programmer errors in the recovery code, and
then in the recovery code for the recovery code and so on.
>Please look at the 2.5.53 tree with my "usbcore dma updates (and doc)"
>patch, which Greg has now merged and submitted to Linus.
This looks great. Notice that you're only doing DMA
operations on usb_device->controller, which is a memory-mapped device
(typically PCI).
Adam J. Richter __ ______________ 575 Oroville Road
adam@yggdrasil.com \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 3:39 Adam J. Richter
@ 2002-12-30 0:45 ` Alan Cox
0 siblings, 0 replies; 32+ messages in thread
From: Alan Cox @ 2002-12-30 0:45 UTC (permalink / raw)
To: Adam J. Richter
Cc: manfred, david-b, James.Bottomley, Linux Kernel Mailing List
On Sat, 2002-12-28 at 03:39, Adam J. Richter wrote:
> I know Documentation/DMA-mapping.txt says that, and I
> understand that wmb() is necessary with consistent memory, but I
> wonder if rmb() really is, at least if you've declared the data
> structures in question as volatile to prevent reordering of reads by
> the compiler.
Compiler ordering != Processor to memory ordering != PCI device view
ordering
volatile may not be enough.
Alan
* Re: [RFT][PATCH] generic device DMA implementation
2002-12-28 22:19 Adam J. Richter
@ 2002-12-30 23:23 ` David Brownell
0 siblings, 0 replies; 32+ messages in thread
From: David Brownell @ 2002-12-30 23:23 UTC (permalink / raw)
To: Adam J. Richter; +Cc: James.bottomley, linux-kernel, manfred
Adam J. Richter wrote:
> Odd number of ">" = David Brownell
> Even number of ">>" = Adam Richter
Toggle that one more time ... ;)
>
> On the other hand, it might be a convenient shorthand be able to
> say dma_malloc(usb_device,....) instead of
> dma_malloc(usb_device->controller, ...). It's just that the number of
> callers is small enough so that I don't think that the resulting code
> shrink would make up for the size of the extra wrapper routines. So,
Since about 2.5.32 that API has been
void *usb_buffer_alloc(usb_device *, size, mem_flags, dma_addr_t *)
Sure -- when dma_alloc() is available, we should be able to make it
inline completely. Done correctly it should be an object code shrink.
> struct device {
> ....
> struct dma_device *dma_dev;
> }
>
> device.dma_dev would point back to the device in the case of PCI,
> ISA and other memory mapped devices, and it would point to the host
> controller for USB devices, the SCSI host adapter for SCSI devices, etc.
With 'dma_device' being pretty much the 'whatsit' I mentioned: some state
(from platforms that need it, like u64 dma_mask and maybe a list of pci
pools to use with dma_malloc), plus methods basically like James' signatures
from 'struct bus_dma_ops'.
Yes, that'd be something that might be the platform implementation (often
pci, if it doesn't vanish like on x86), something customized (choose dma
paths on the fly) or just BUG() out.
> BUG() is generally the optimal way to fail due to programmer
> error, as opposed to program error. You want to catch the bug as
> early as possible.
I can agree to that in scenarios like relying on DMA ops with hardware
known not to support them. If it ever happens, there's deep confusion.
But not in the case of generic dma "map this buffer" operations failing
because of issues like temporary resource starvation; or almost any
other temporary allocation failure that appears after the system booted.
>>Please look at the 2.5.53 tree with my "usbcore dma updates (and doc)"
>>patch, which Greg has now merged and submitted to Linus.
>
> This looks great. Notice that you're only doing DMA
> operations on usb_device->controller, which is a memory-mapped device
> (typically PCI).
Actually it isn't necessarily ... some host controllers talk I/O space
using FIFOs for commands and data, rather than memory mapping registers,
shared memory request schedules, and DMAing to/from the kernel buffers.
Linux would want a small tweak to support those controllers; maybe it'd
be as simple as testing whether there's a dma_whatsit object pointer.
The usb_buffer_*map*() calls could now be inlined, but I thought I'd rather
only leave one copy of all the "don't go through null pointer" checking.
If we ever reduce such checking in USB, those routines would all be
good candidates for turning into inlined calls to dma_*() calls.
- Dave
end of thread, other threads:[~2002-12-30 23:09 UTC | newest]
Thread overview: 32+ messages
-- links below jump to the message on this page --
2002-12-28 2:48 [RFT][PATCH] generic device DMA implementation Adam J. Richter
2002-12-28 15:05 ` David Brownell
-- strict thread matches above, loose matches on Subject: below --
2002-12-28 22:19 Adam J. Richter
2002-12-30 23:23 ` David Brownell
2002-12-28 20:11 Adam J. Richter
2002-12-28 15:41 Adam J. Richter
2002-12-28 16:59 ` David Brownell
2002-12-28 3:39 Adam J. Richter
2002-12-30 0:45 ` Alan Cox
2002-12-27 22:57 Manfred Spraul
2002-12-27 23:55 ` James Bottomley
2002-12-28 0:20 ` Manfred Spraul
2002-12-28 16:26 ` James Bottomley
2002-12-28 17:54 ` Manfred Spraul
2002-12-28 18:13 ` James Bottomley
2002-12-28 18:25 ` Manfred Spraul
2002-12-28 18:40 ` James Bottomley
2002-12-28 20:05 ` Manfred Spraul
2002-12-27 20:21 David Brownell
2002-12-27 21:40 ` James Bottomley
2002-12-28 1:29 ` David Brownell
2002-12-28 16:18 ` James Bottomley
2002-12-28 18:16 ` David Brownell
2002-12-28 1:56 ` David Brownell
2002-12-28 16:13 ` James Bottomley
2002-12-28 17:41 ` David Brownell
2002-12-27 21:47 ` James Bottomley
2002-12-28 2:28 ` David Brownell
2002-12-18 3:01 James Bottomley
2002-12-18 3:13 ` David Mosberger
2002-12-28 18:14 ` Russell King
2002-12-28 18:19 ` James Bottomley